Maintenance of Generalized Association Rules with Multiple Minimum Supports


Maintenance of Generalized Association Rules with Multiple Minimum Supports

Ming-Cheng Tseng and Wen-Yang Lin (corresponding author)
Institute of Information Engineering, I-Shou University, Kaohsiung 84, Taiwan
Dept. of Information Management, I-Shou University, Kaohsiung 84, Taiwan

Abstract

Mining generalized association rules among items in the presence of a taxonomy has been recognized as an important model in data mining. Earlier work on generalized association rules confined the minimum supports to be uniformly specified for all items, or for items within the same taxonomy level. This constraint can prevent an expert from discovering association rules that are more interesting but much less supported. In our previous work, we addressed this problem and proposed two algorithms, MMS_Cumulate and MMS_Stratify. In this paper, we examine the problem of maintaining the discovered multi-supported, generalized association rules when new transactions are added into the original database. We propose two algorithms, _Cumulate and _Stratify, which can incrementally update the discovered generalized association rules under non-uniform support specification and are capable of

effectively reducing the number of candidate sets and the amount of database re-scanning. Empirical evaluation showed that _Cumulate and _Stratify are 2-6 times faster than running MMS_Cumulate or MMS_Stratify on the updated database afresh.

Keywords: Generalized association rules, maintenance, multiple minimum supports, sorted closure, taxonomy.

1. Introduction

Mining association rules from a large database of business data, such as transaction records, has been a popular topic within the area of data mining [,,, 3]. This problem was originally motivated by applications known as market basket analysis, which aim to find relationships among items purchased by customers, that is, the kinds of products that tend to be purchased together. For example, an association rule, Desktop → Ink-jet (Support=3%, Confidence=60%), says that 3% (support) of customers purchase both a Desktop PC and an Ink-jet printer together, and 60% (confidence) of customers who purchase a Desktop PC also purchase an Ink-jet printer. Such information is useful in many aspects of market management, such as store layout planning, target marketing, and understanding the behavior of customers. In many applications, there are taxonomies (hierarchies), explicit or implicit, over the items. In some applications, it may be more useful to find associations at different levels of the taxonomy than only at the primitive concept level [7, 4]. For example, consider Figure 1; the taxonomy of items from which the previous association rule is derived can be represented as

Figure 1. An example of taxonomy T (Printer, PC, and Scanner at the top level; Non-impact and Dot-matrix under Printer; Desktop and Notebook under PC; Laser and Ink-jet under Non-impact).

It may well happen that the association rule Desktop → Ink-jet (Support=3%, Confidence=60%) does not hold when the minimum support is set to 4%, while the association rule PC → Printer remains valid. Besides, note that in reality the frequencies of items are not uniform. Some items occur very frequently in the transactions while others rarely appear. In this case, a uniform minimum support assumption would hinder the discovery of deviations or exceptions that are more interesting but much less supported than the general trends. To meet such situations, we have investigated the problem of mining generalized association rules across different levels of a taxonomy with non-uniform minimum supports [6]. We proposed two efficient algorithms, MMS_Cumulate and MMS_Stratify, which not only can discover associations that span different hierarchy levels but also have a high potential to produce rare but informative item rules. The proposed approaches, however, are not effective in the situation where the large source database is updated frequently. In that case, adopting the mining approach amounts to re-applying the whole mining process to the updated database to correctly reflect the most recent associations among items. This is not cost-effective and is unacceptable in general. To be more realistic and cost-effective, it is better to run the association mining algorithms once to generate the initial association rules, and then, upon updating the source database, apply an incremental maintenance method to re-build the discovered rules. The major challenge here is to deploy an efficient maintenance algorithm to facilitate the whole mining process. This problem is nontrivial because updates may invalidate some of the discovered association rules, turn previously weak rules into strong ones, and introduce new, undiscovered rules. In this paper, we address the issues in developing efficient maintenance methods and propose two algorithms, _Cumulate and _Stratify. Our algorithms can incrementally update the generalized association rules under non-uniform support specification and are capable of effectively reducing the number of candidate sets and the amount of database re-scanning. The performance study showed that _Cumulate and _Stratify are 2-6 times faster than running MMS_Cumulate or MMS_Stratify on the updated database afresh.

The remainder of the paper is organized as follows. The problem of maintaining generalized association rules with multiple minimum supports is formalized in Section 2. In Section 3, we explain the proposed algorithms for updating frequent itemsets with multiple minimum supports. The evaluation of the proposed algorithms on IBM synthetic data is described in Section 4. A review of related work is given in Section 5. Finally, our conclusions are stated in Section 6.

2. Problem statement

2.1 Mining multi-supported, generalized association rules

Let I = {i1, i2, …, im} be a set of items and DB = {t1, t2, …, tn} be a set of transactions, where each transaction ti = ⟨tid, A⟩ has a unique identifier tid and a set of items A (A ⊆ I). We assume that the taxonomy of items, T, is available and is denoted as a directed acyclic graph on I ∪ J, where J = {j1, j2, …, jp} represents the set of generalized items derived from I. An edge in T denotes an is-a relationship. That is, if there is an edge from j to i, we call j a parent (generalization) of i and i a child of j. For example, in Figure 1, I = {Laser printer, Ink-jet printer, Dot matrix printer, Desktop PC, Notebook, Scanner} and J = {Non-impact printer, Printer, Personal computer}.
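The taxonomy T can be modeled as a simple parent map from which each item's generalizations are computed by walking the is-a edges upward. A minimal sketch, using the item names of Figure 1; the `PARENTS`, `ancestors`, and `extend_transaction` names are our own illustrations, not identifiers from the paper:

```python
# Hypothetical sketch: the taxonomy as a child -> parents map (is-a edges).
PARENTS = {
    "Laser": ["Non-impact"],
    "Ink-jet": ["Non-impact"],
    "Non-impact": ["Printer"],
    "Dot-matrix": ["Printer"],
    "Desktop": ["PC"],
    "Notebook": ["PC"],
}

def ancestors(item):
    """Return the set of all generalizations of `item` in the taxonomy."""
    result = set()
    stack = list(PARENTS.get(item, []))
    while stack:
        p = stack.pop()
        if p not in result:
            result.add(p)
            stack.extend(PARENTS.get(p, []))
    return result

def extend_transaction(items):
    """Add every ancestor of every purchased item, removing duplicates."""
    extended = set(items)
    for a in items:
        extended |= ancestors(a)
    return extended
```

With this map, for instance, a transaction {Desktop, Laser} is extended to include PC, Non-impact, and Printer, which is how occurrences of generalized itemsets are counted later.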

Definition 1. Given a transaction t = ⟨tid, A⟩, we say an itemset B is in t if every item in B is in A or is an ancestor of some item in A. An itemset B has support s, denoted as s = sup(B), in the transaction set DB if s% of the transactions in DB contain B.

Definition 2. Given a set of transactions DB and a taxonomy T, a generalized association rule is an implication of the form A → B, where A, B ⊂ I ∪ J, A ∩ B = ∅, and no item in B is an ancestor of any item in A. The support of this rule, sup(A → B), is equal to the support of A ∪ B. The confidence of the rule, conf(A → B), is the ratio of sup(A ∪ B) to sup(A).

The condition in Definition 2 that no item in B is an ancestor of any item in A is essential; otherwise, a rule of the form a → ancestor(a) always has 100% confidence and is trivial.

Definition 3. Let ms(a) denote the minimum support of an item a in I ∪ J. An itemset A = {a1, a2, …, ak}, where ai ∈ I ∪ J, is frequent if the support of A is no less than the lowest minimum support among the items in A, i.e., sup(A) ≥ min_{ai ∈ A} ms(ai).

Definition 4. A multi-supported, generalized association rule A → B is strong if sup(A ∪ B) ≥ min_{ai ∈ A∪B} ms(ai) and conf(A → B) ≥ minconf.

Definition 5. Given a set of transactions DB, a taxonomy T, the user-specified minimum supports for all items in T, {ms(a1), ms(a2), …, ms(an)}, and minconf, the problem of mining multi-supported, generalized association rules is to find all association rules that are strong.

Example 1. Suppose that the shopping transaction database DB in Table 1 consists of the items I = {Laser printer, Ink-jet printer, Dot matrix printer, Desktop PC, Notebook, Scanner} and the taxonomy T shown in Figure 1. Let the minimum confidence (minconf) be 60% and the minimum support (ms) associated with each item in the taxonomy be as follows:

ms(Printer) = 80%    ms(Non-impact) = 65%   ms(Dot matrix) = 70%
ms(Laser) = 25%      ms(Ink-jet) = 60%      ms(Scanner) = 15%
ms(PC) = 35%         ms(Desktop) = 25%      ms(Notebook) = 25%

The following generalized association rule, PC, Laser → Dot matrix (sup = 16.7%, conf = 50%), fails because its support is less than min{ms(PC), ms(Laser), ms(Dot matrix)} = 25%. But another rule, PC → Laser (sup = 33.3%, conf = 66.7%), holds because both its support and its confidence are larger than min{ms(PC), ms(Laser)} = 25% and minconf, respectively. Table 2 lists the frequent itemsets and the resulting strong rules for this example.

Table 1. A transaction database (DB).

TID  Items Purchased
1    Notebook, Laser printer
2    Scanner, Dot matrix printer
3    Dot matrix printer, Ink-jet printer
4    Notebook, Dot matrix printer, Laser printer
5    Scanner
6    Desktop computer

Table 2. Frequent itemsets and association rules generated for Example 1.

Itemsets                min ms (%)  Support (%)
{Scanner}               15          33.3
{PC}                    35          50.0
{Notebook}              25          33.3
{Laser}                 25          33.3
{Scanner, Printer}      15          16.7
{Scanner, Dot matrix}   15          16.7
{Laser, PC}             25          33.3
{Notebook, Printer}     25          33.3
{Notebook, Non-impact}  25          33.3
{Notebook, Laser}       25          33.3

Rules

PC → Laser (sup = 33.3%, conf = 66.7%)
Laser → PC (sup = 33.3%, conf = 100%)
Notebook → Printer (sup = 33.3%, conf = 100%)
Notebook → Non-impact (sup = 33.3%, conf = 100%)
Notebook → Laser (sup = 33.3%, conf = 100%)

As shown in [], the task of mining association rules is usually decomposed into two steps:

1. Itemset generation: find all frequent itemsets, i.e., those whose support exceeds a threshold minimum support.
2. Rule construction: from the set of frequent itemsets, construct all association rules that have a confidence exceeding a threshold minimum confidence.

Since the solution to the second subproblem is straightforward, the problem can be reduced to finding the set of frequent itemsets that satisfy the specified minimum supports.

2.2 Maintaining multi-supported, generalized association rules

In real business applications, the database grows over time. This implies that if the updated database is processed afresh, some previously discovered associations might become invalid and some undiscovered associations should be generated. That is, the discovered association rules must be updated to reflect the new circumstances. Analogous to mining associations, this problem can be reduced to updating the frequent itemsets.

Definition 6. Let DB denote the original database, db the incremental database, UD the updated database containing both db and DB, i.e., UD = DB + db, T the taxonomy of items, and L the set of frequent itemsets in DB. The problem of updating the frequent itemsets with taxonomy and multiple supports is to find L′ = {A | sup_UD(A) ≥ min_{ai ∈ A} ms(ai)},

given the knowledge of DB, T, db, L, and sup_DB(A) for each A ∈ L.

Example 2. Consider Example 1 again, and suppose that the incremental transaction database db is shown in Table 3. Table 4 lists the frequent itemsets and the resulting strong rules in the updated database. Comparing Table 4 to Table 2, we observe that two old frequent 2-itemsets in Table 2, {Scanner, Printer} and {Scanner, Dot matrix}, are discarded, while three new frequent itemsets, {Desktop}, {PC, Printer}, and {PC, Non-impact}, are added to Table 4, and two new rules, PC → Printer and PC → Non-impact, are found.

Table 3. An incremental database (db).

TID  Items Purchased
7    Desktop computer, Laser printer
8    Laser printer

3. Methods for updating frequent itemsets for multi-supported, generalized association rules

3.1 Preliminary

As stated in the pioneering work [4], the primary challenge in devising an effective association rule maintenance algorithm is how to reuse the original frequent itemsets and avoid re-scanning the original database.

Table 4. Frequent itemsets and association rules generated for Example 2.

Itemsets                min ms (%)  Support (%)
{Scanner}               15          25.0
{Desktop}               25          25.0
{Notebook}              25          25.0
{PC}                    35          50.0
{Laser}                 25          50.0
{PC, Printer}           35          37.5
{PC, Non-impact}        35          37.5

{Laser, PC}             25          37.5
{Notebook, Printer}     25          25.0
{Notebook, Non-impact}  25          25.0
{Notebook, Laser}       25          25.0

Rules

PC → Printer (sup = 37.5%, conf = 75%)
PC → Non-impact (sup = 37.5%, conf = 75%)
PC → Laser (sup = 37.5%, conf = 75%)
Laser → PC (sup = 37.5%, conf = 75%)
Notebook → Printer (sup = 25%, conf = 100%)
Notebook → Non-impact (sup = 25%, conf = 100%)
Notebook → Laser (sup = 25%, conf = 100%)

Let |DB| denote the number of transaction records in the original database DB, |db| the number of transaction records in the incremental database db, and |UD| the number of transaction records in the whole updated database UD containing both db and DB. For a sorted k-itemset A = ⟨a1, a2, …, ak⟩ with ms(a1) ≤ ms(a2) ≤ … ≤ ms(ak), we define its support counts in db, DB, and UD as A.count_db, A.count_DB, and A.count_UD, respectively. Note that |UD| = |DB| + |db| and A.count_UD = A.count_db + A.count_DB. After scanning the incremental database db, we have A.count_db and can proceed according to the following conditions.

(1) If A is a frequent itemset in both db and DB, i.e., count_db(A) ≥ ms(a1)·|db| and count_DB(A) ≥ ms(a1)·|DB|, then A is a frequent itemset in UD. There is no need for further computation because count_UD(A) ≥ ms(a1)·|UD|.

(2) If A is not a frequent itemset in db but is frequent in DB, then a simple calculation can determine whether A is frequent or not in UD.

(3) If A is a frequent itemset in db but not frequent in DB, then A is an undetermined itemset in UD. Since count_DB(A) is not available, we must re-scan DB to compute count_DB(A) to decide whether A is frequent or not in UD.

(4) If A is neither a frequent itemset in db nor in DB, then A is not frequent in UD. There is no need for further computation.

These four cases are depicted in Figure 2. Note that only case 3 entails re-scanning the original database.

Figure 2. Four cases in incremental frequent itemset maintenance [8] (each of the original database and the incremental database splits itemsets into infrequent and frequent by the minimum support).

Let k-itemset denote an itemset with k items. The basic process of updating generalized association rules with multiple minimum supports is similar to previous work [4, 6] on updating association rules with uniform support, and follows the level-wise approach widely used in most efficient algorithms to generate all frequent k-itemsets. First, count all 1-itemsets in db, then determine from these itemsets whether to re-scan the original database, and finally create the frequent 1-itemsets L1. From the frequent (k−1)-itemsets, generate the candidate k-itemsets Ck and repeat the above procedure until no frequent k-itemsets Lk are created. This paradigm, however, has to be modified to incorporate taxonomy information and multiple minimum supports. First, note that in the presence of a taxonomy an itemset can be composed of items, primitive or generalized, in the taxonomy. To calculate the occurrence of each itemset, the currently scanned transaction t is extended to include the generalized items of all its component items.
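The four-case analysis above can be written as a small decision function. A hypothetical sketch; the names (`update_status`, `ms_min`, `n_db`, `n_DB`) are ours, and cases 1 and 2 are handled by one direct computation since count_DB(A) is known for itemsets frequent in DB:

```python
# Sketch of the four-case analysis, assuming count_db(A) from scanning db
# and count_DB(A) for itemsets that were frequent in DB are available.
def update_status(count_db, n_db, ms_min, count_DB=None, n_DB=0):
    """Classify itemset A after scanning the increment db.

    count_DB is None when A was not frequent in DB (its DB count is
    unknown). Returns 'frequent', 'infrequent', or 'rescan' (case 3:
    DB must be re-scanned to obtain A's count there).
    """
    freq_in_db = count_db >= ms_min * n_db
    if count_DB is not None:                 # cases 1 and 2: DB count known
        total = count_DB + count_db
        return "frequent" if total >= ms_min * (n_DB + n_db) else "infrequent"
    if freq_in_db:                           # case 3: undetermined in UD
        return "rescan"
    return "infrequent"                      # case 4: infrequent everywhere
```

Only the "rescan" outcome forces a pass over the original database, which is what both maintenance algorithms try to minimize.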

Secondly, note that the well-known apriori pruning technique, based on the concept of downward closure, does not work under multiple support specification. For example, consider four items a, b, c, and d with minimum supports ms(a) = 15%, ms(b) = 20%, ms(c) = 4%, and ms(d) = 6%. Clearly, a 2-itemset {a, b} with 10% support is discarded because 10% < min{ms(a), ms(b)}. According to downward closure, the 3-itemsets {a, b, c} and {a, b, d} will then be pruned even though their supports may be larger than ms(c) and ms(d), respectively. To solve this problem, we adopted the sorted closure property [9] in our previous work on mining generalized association rules with multiple minimum supports. Hereafter, to distinguish it from a traditional itemset, a sorted k-itemset is denoted ⟨a1, a2, …, ak⟩.

Lemma 1. If a sorted k-itemset ⟨a1, a2, …, ak⟩, for k ≥ 2 and ms(a1) ≤ ms(a2) ≤ … ≤ ms(ak), is frequent, then all of its sorted subsets with k−1 items are frequent, except the subset ⟨a2, a3, …, ak⟩.

Again, let Lk and Ck represent the set of frequent k-itemsets and candidate k-itemsets, respectively. We assume that any itemset in Lk or Ck is sorted in increasing order of the minimum item supports. The result in Lemma 1 reveals an obstacle to using apriori-gen for generating frequent itemsets.

Lemma 2. For k = 2, the procedure apriori-gen(L1) fails to generate all candidate 2-itemsets in C2.

For example, consider a sorted candidate 2-itemset ⟨a, b⟩. To generate this itemset from L1, both items a and b would have to be included in L1; that is, each would have to occur more frequently than its corresponding minimum support, ms(a) and ms(b). Clearly, the case ms(a) ≤ sup(b) < ms(b) fails to generate ⟨a, b⟩ in C2 even though sup(⟨a, b⟩) ≥ ms(a) is possible.
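Lemma 1 says that only the (k−1)-subsets retaining a1 (the item with the smallest minimum support) are guaranteed frequent. A minimal sketch of which subsets a sorted-closure pruning step may therefore test; the helper name is our own:

```python
# Sketch of the sorted-closure property (Lemma 1): for a sorted itemset
# <a1,...,ak>, every (k-1)-subset except <a2,...,ak> must be frequent,
# i.e. exactly the subsets that keep a1.
def checkable_subsets(sorted_itemset):
    """Yield the (k-1)-subsets that Lemma 1 requires to be frequent."""
    k = len(sorted_itemset)
    for drop in range(1, k):  # never drop a1 (index 0)
        yield tuple(x for i, x in enumerate(sorted_itemset) if i != drop)
```

The excluded subset ⟨a2, …, ak⟩ may be infrequent with respect to its own (larger) minimum support ms(a2) even when the full itemset meets ms(a1), which is exactly why plain apriori pruning is unsound here.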
For the above reason, all items within an itemset are sorted in increasing order of their minimum supports, and a sorted itemset, called the frontier set, F1 = ⟨a_j1, a_j2, …, a_jl⟩, is used to generate the set of candidate 2-itemsets, where a_j1 is the first item (in sorted order) in I ∪ J with sup(a_j1) ≥ ms(a_j1), ms(a_j1) ≤ ms(a_j2) ≤ … ≤ ms(a_jl), and sup(a_ji) ≥ ms(a_j1) for 2 ≤ i ≤ l.
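The frontier-set construction above can be sketched directly: the head is the first item frequent with respect to its own minimum support, and every later item only needs to reach the head's minimum support. The function and dictionary names are illustrative, not from the paper:

```python
# Sketch of frontier-set construction over items sorted by ascending ms.
def frontier_set(items_sorted_by_ms, sup, ms):
    """Return F1: the head plus all later items with sup >= ms(head)."""
    F = []
    head_ms = None
    for a in items_sorted_by_ms:
        if head_ms is None:
            if sup[a] >= ms[a]:      # head: first item frequent w.r.t. own ms
                head_ms = ms[a]
                F.append(a)
        elif sup[a] >= head_ms:      # later items only need ms of the head
            F.append(a)
    return F
```

Items below the head's minimum support are safely discarded because, by downward closure, any 2-itemset containing them cannot reach that support either.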

Example 3. Continuing with Example 1, we change ms(Scanner) from 15% to 20%. The resulting F1 is shown in Table 5. From Table 5, we observe that ms(Scanner) is the smallest among all items, and Scanner can join with any item whose support is greater than or equal to ms(Scanner) = 20% to become a C2 candidate without losing any 2-itemsets. The 2-itemsets ⟨Scanner, Desktop⟩ and ⟨Scanner, Ink-jet⟩ cannot become candidates because sup(Desktop) and sup(Ink-jet) are less than ms(Scanner), and the supports of ⟨Scanner, Desktop⟩ and ⟨Scanner, Ink-jet⟩ cannot be greater than sup(Desktop) and sup(Ink-jet), respectively, according to the downward closure property. Therefore, we keep in F1 only the items whose support is greater than or equal to ms(Scanner), and discard Desktop and Ink-jet. For Ck, k ≥ 3, the candidate generation is unchanged. Please refer to [6] for details on the mining of multi-supported, generalized association rules.

3.2 Algorithm _Cumulate

The basic process of the _Cumulate algorithm proceeds as follows. First, count all 1-itemsets in db, including generalized items. Second, combine the itemset counts in db and DB, and create the frequent 1-itemsets L1 according to the four cases. Next, create the frontier set F1 and use it to generate the candidate 2-itemsets C2. Then, generate the frequent 2-itemsets L2 following the same procedure as for L1. Finally, for k ≥ 3, repeat the above procedure until no frequent k-itemsets Lk are created, except that the candidate k-itemsets Ck are generated from Lk−1. In each iteration of Ck generation, an additional pruning technique, described below, is performed.

Table 5. Generating frontier set F1.

Item        Sorted ms (%)  Support (%)  F1
Scanner     20             33.3         Scanner
Laser       25             33.3         Laser
Desktop     25             16.7
Notebook    25             33.3         Notebook
PC          35             50.0         PC
Ink-jet     60             16.7
Non-impact  65             50.0         Non-impact
Dot-matrix  70             50.0         Dot-matrix
Printer     80             66.7         Printer

Lemma 3. For any sorted k-itemset A = ⟨a1, a2, …, ak⟩, k ≥ 2, A can be pruned if A is not a frequent itemset in DB and there exists a (k−1)-subset of A, say ⟨a_i1, a_i2, …, a_ik−1⟩, such that sup_db(⟨a_i1, a_i2, …, a_ik−1⟩) < ms(a1) in db, and a_i1 = a1 or ms(a_i1) = ms(a1).

Proof. Note that if A is frequent in neither DB nor db, then A is not frequent in UD. Since we know that A is not frequent in DB, all we have to do is make sure that A is not frequent in db. Under the stated condition, sup_db(A) ≤ sup_db(⟨a_i1, …, a_ik−1⟩) < ms(a1) by downward closure, so A is not frequent in db, and hence not frequent in UD.

Example 4. Assume that ms(Scanner) = 3%, ms(Notebook) = 3%, and ms(Ink-jet) = 5%, and that ⟨Scanner, Notebook, Ink-jet⟩ is not frequent in DB. Consider the three 2-subsets of ⟨Scanner, Notebook, Ink-jet⟩, and assume that ⟨Scanner, Notebook⟩, ⟨Scanner, Ink-jet⟩, or ⟨Notebook, Ink-jet⟩ is not frequent in db. Since ⟨Scanner, Notebook, Ink-jet⟩ is infrequent in DB, count_DB(⟨Scanner, Notebook, Ink-jet⟩) < 3%·|DB|. Now, if ⟨Scanner, Notebook⟩ or ⟨Scanner, Ink-jet⟩ is not frequent in db, then sup_db(⟨Scanner, Notebook, Ink-jet⟩) ≤ sup_db(⟨Scanner, Notebook⟩) < ms(Scanner), or sup_db(⟨Scanner, Notebook, Ink-jet⟩) ≤ sup_db(⟨Scanner, Ink-jet⟩) < ms(Scanner), according to the downward closure property, and so count_db(⟨Scanner, Notebook, Ink-jet⟩) < 3%·|db|. Hence, count_UD(⟨Scanner, Notebook, Ink-jet⟩) = count_db(⟨Scanner, Notebook, Ink-jet⟩) + count_DB(⟨Scanner, Notebook, Ink-jet⟩) < 3%·(|db| + |DB|), and it follows that sup_UD(⟨Scanner, Notebook, Ink-jet⟩) < 3% = ms(Scanner). Therefore, ⟨Scanner, Notebook, Ink-jet⟩ is not a frequent 3-itemset in UD. On the other hand, if ⟨Notebook, Ink-jet⟩ is not frequent in db, then sup_db(⟨Scanner, Notebook, Ink-jet⟩) ≤ sup_db(⟨Notebook, Ink-jet⟩) < ms(Notebook) = 3% = ms(Scanner). Hence, ⟨Scanner, Notebook, Ink-jet⟩ is not frequent in db, and we can again conclude that it is not a frequent 3-itemset in UD.

The procedure for generating F1 is described in Figure 3.

Example 5. Let us continue with Example 2. Tables 6 and 7 show the intermediate results of generating F1. First, scan db to find the counts of all 1-itemsets in db, then scan DB to find the counts of those 1-itemsets not in L1. The result is shown in Table 6. Next, combine the counts obtained in the first two steps and finally generate F1; the result is shown in Table 7.

1.  for each transaction t ∈ db do  /* Scan db and calculate count_db(a) */
2.    for each item a ∈ t do
3.      Add all ancestors of a in IA into t;
4.    Remove any duplicates from t;
5.    for each item a ∈ t do
6.      count_db(a)++;
7.  end for
8.  for each item a ∈ L1 do  /* Cases 1 & 2 */
9.    count_UD(a) = count_DB(a) + count_db(a);
10.   if sup_UD(a) ≥ ms(a) then
11.     L1′ = L1′ ∪ {a};
12. end for
13. for each transaction t ∈ DB do  /* Scan DB and calculate count_DB(a) */
14.                                 /* for a ∉ L1 */
15.   for each item a ∈ t do
16.     Add all ancestors of a in IA into t;
17.   Remove any duplicates from t;
18.   for each item a ∈ t such that a ∉ L1 do
19.     count_DB(a)++;
20. end for
21. for each item a ∉ L1 do  /* Cases 3 & 4 */
22.   count_UD(a) = count_DB(a) + count_db(a);
23.   if sup_UD(a) ≥ ms(a) then
24.     L1′ = L1′ ∪ {a};
25. end for
26. for each item a in SMS in sorted order do
27.   if a ∈ L1′ then
28.     Insert a into F1;
29.     break;
30.   end if
31. for each item b in SMS that is after a in sorted order do
32.   if sup_UD(b) ≥ ms(a) then
33.     Insert b into F1;

Figure 3. Procedure F1-gen(SMS, DB, db, L1, IA).

Table 6. The result of scanning db and DB.

L1                         Scan db                    Scan DB (not in L1)
1-itemset  Count  Sup (%)  1-itemset   Count  Sup (%)  1-itemset   Count  Sup (%)
Scanner    2      33.3     Desktop     1      50.0     Desktop     1      16.7
Laser      2      33.3     PC          1      50.0     Dot-matrix  3      50.0
Notebook   2      33.3     Laser       2      100.0    Non-impact  3      50.0
PC         3      50.0     Non-impact  2      100.0    Ink-jet     1      16.7
                           Printer     2      100.0    Printer     4      66.7

Table 7. Generating frontier set F1.

C1          Count  Sorted ms (%)  Sup (%)  F1
Scanner     2      15             25.0     Scanner
Laser       4      25             50.0     Laser
Desktop     2      25             25.0     Desktop
Notebook    2      25             25.0     Notebook
PC          4      35             50.0     PC
Ink-jet     1      60             12.5
Non-impact  5      65             62.5     Non-impact
Dot-matrix  3      70             37.5     Dot-matrix
Printer     6      80             75.0     Printer

Figure 4 shows an overview of the _Cumulate algorithm. As an illustration, consider the example shown in Figure 5. For convenience, the itemsets and frequent itemsets in DB, db, and UD are first shown in Tables 8 to 13. A simple picture of running the _Cumulate algorithm on the example in Figure 5 is shown in Figure 6. First, scan db and sort all the items in db. Second, load L1 and check whether the items in db are in L1; if so, we can calculate their supports straightforwardly. Otherwise, we must re-scan the original database to determine whether those candidate 1-itemsets are frequent or not. At the same time, we generate the frontier set F1 and use it to generate the candidate 2-itemsets C2. After that, the process is almost the same as that for generating L1, except that, if candidate k-itemsets are not frequent in DB, we can use count_db(Ck) to prune Ck, i.e., to determine whether the candidate k-itemsets in Ck are frequent or not in db.

1.  Create IMS;  /* The table of user-defined minimum supports */
2.  Create IA;  /* The table of each item and its ancestors from taxonomy T */
3.  SMS = sort(IMS);  /* Ascending sort according to ms(a) stored in IMS */
4.  F1 = F1-gen(SMS, DB, db, L1, IA);
5.  L1′ = {a ∈ F1 | sup_UD(a) ≥ ms(a)};
6.  for (k = 2; Lk−1′ ≠ ∅; k++) do
7.    if k = 2 then Ck = C2-gen(F1);
8.    else Ck = apriori-gen(Lk−1′);
9.    Delete any candidate in Ck that consists of an item and its ancestor;
10.   Delete any candidate in Ck that satisfies Lemma 3;
11.   Delete any ancestor in IA that is not present in any of the candidates in
12.     Ck;
13.   Delete any item in F1 that is not present in any of the candidates in Ck;
14.   Cal-count(db, F1, Ck);  /* Scan db and calculate counts of Ck */
15.   for each candidate A ∈ Lk do  /* Cases 1 & 2 */
16.     count_UD(A) = count_DB(A) + count_db(A);
17.     if count_UD(A) ≥ ms(A[1])·|UD| then  /* A[1] denotes the first item, with the smallest minimum support, in sorted A */
18.       Lk′ = Lk′ ∪ {A};
19.   end for
20.   for each candidate A ∈ Ck − Lk do  /* Cases 3 & 4 */
21.     if count_db(A) < ms(A[1])·|db| then
22.       Delete the candidate from Ck − Lk;
23.   Cal-count(DB, F1, Ck − Lk);  /* Scan DB and calculate counts of Ck − Lk */
24.   for each candidate A ∈ Ck − Lk do
25.     count_UD(A) = count_DB(A) + count_db(A);
26.     if count_UD(A) ≥ ms(A[1])·|UD| then
27.       Lk′ = Lk′ ∪ {A};
28.   end for
29. Result = ∪k Lk′;

Figure 4. Algorithm _Cumulate.

Figure 5. An example of updating generalized association rules.

Taxonomy T: A, F, and I are top-level items; B and D are children of A; C and E are children of B; G and H are children of F.

Original Database (DB):
TID  Items Purchased
1    H, C
2    I, D
3    D, E
4    H, D, C
5    I
6    G

Incremental Database (db):
TID  Items Purchased
7    G, C
8    C

Minimum Item Support Table (MIS):
Item  minsup (%)    Item  minsup (%)    Item  minsup (%)
A     80            D     70            G     25
B     65            E     60            H     25
C     25            F     35            I     15

Item Ancestor Table (IA):
Item  Ancestors
B     A
C     A, B
D     A
E     A, B
G     F
H     F

Hierarchy Table (HI): each item's level number, group, and sublevel in T (contents omitted).

Figure 6. An example of running _Cumulate (flow of scanning db and sorting its items, loading L1, re-scanning DB for candidates not in L1, generating F1 and L1, generating C2, pruning C2 with count_db, counting the survivors in db and DB to produce L2, and generating C3).
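The level-wise loop that Figure 4 specifies can be condensed into a short sketch. This is an assumption-laden simplification, not the paper's algorithm: it ignores the taxonomy extension, the frontier set, Lemma 3 pruning, and the db/DB case analysis, and simply mines one transaction list with per-item minimum supports to show the loop skeleton; all names are ours:

```python
# Simplified level-wise mining with multiple minimum supports (skeleton
# only; not the paper's _Cumulate, which also splits counting over db/DB).
def count_in(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        ts = set(t)
        for c in candidates:
            if set(c) <= ts:
                counts[c] += 1
    return counts

def level_wise(transactions, items, ms):
    n = len(transactions)
    # sort items by ascending ms so each tuple's first item has the min ms
    rank = {a: i for i, a in enumerate(sorted(items, key=lambda a: ms[a]))}
    level = [(a,) for a in sorted(items, key=lambda a: rank[a])]
    frequent = {}
    while level:
        counts = count_in(transactions, level)
        Lk = [c for c in level if counts[c] >= ms[c[0]] * n]
        frequent.update({c: counts[c] for c in Lk})
        # join frequent k-itemsets sharing a (k-1)-prefix, keeping ms order
        level = [a + (b[-1],) for a in Lk for b in Lk
                 if a[:-1] == b[:-1] and rank[a[-1]] < rank[b[-1]]]
    return frequent
```

Note that, per Lemma 2, a faithful implementation must build C2 from the frontier set rather than from L1 as this skeleton does; the sketch only illustrates the counting-and-join rhythm of each pass.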

Table 8. Summary of candidates, counts, and supports in DB for the example in Figure 5.

Table 9. Summary of candidates, counts, and supports in db for the example in Figure 5.

Table 10. Frontier set F1 for the example in Figure 5: F1 = ⟨I, C, G, H, F, B, D, A⟩ (E is excluded because sup_UD(E) = 12.5% < ms(I) = 15%).

Table 11. Summary of candidates, counts, and supports in UD for the example in Figure 5.

Table 12. Frequent itemsets of DB in Figure 5: L1 = {I, C, H, F}; L2 = {⟨I, D⟩, ⟨I, A⟩, ⟨C, H⟩, ⟨C, F⟩, ⟨H, B⟩, ⟨H, A⟩}; L3 = ∅.

Table 13. Frequent itemsets of UD in Figure 5: L1 = {I, C, G, H, F}; L2 = {⟨C, H⟩, ⟨C, F⟩, ⟨H, B⟩, ⟨H, A⟩, ⟨F, B⟩, ⟨F, A⟩}; L3 = ∅.

3.3 Algorithm _Stratify

The _Stratify algorithm is based on the concept of stratification introduced in [4], which refers to a level-wise counting strategy from the top level of the taxonomy down to the lowest level, in the hope that candidates containing items at higher levels will fail to reach minimum support, so that candidates containing items at lower levels need not be counted. However, this counting strategy may fail in the case of non-uniform minimum supports.

Example 6. Let {Printer, PC}, {Printer, Desktop}, and {Printer, Notebook} be the candidate itemsets to be counted, with the taxonomy and minimum supports defined as in Example 1. Using the level-wise strategy, we first count {Printer, PC} and assume that it is not frequent, i.e., sup({Printer, PC}) < 0.35. Since the minimum supports of {Printer, Desktop}, 0.25, and of {Printer, Notebook}, also 0.25, are less than that of {Printer, PC}, we cannot be sure that the occurrences of {Printer, Desktop} and {Printer, Notebook}, though no greater than that of {Printer, PC}, are also less than their minimum supports. In this case, we still have to count {Printer, Desktop} and {Printer, Notebook} even though {Printer, PC} does not have minimum support.

The following observation inspires the deployment of the _Stratify algorithm.

Lemma 4. Consider two k-itemsets ⟨a1, a2, …, ak⟩ and ⟨a1, a2, …, âk⟩, where âk is an ancestor of ak. If ⟨a1, a2, …, âk⟩ is not frequent, then neither is ⟨a1, a2, …, ak⟩.

Lemma 4 implies that if a sorted candidate itemset at a higher level of the taxonomy is not frequent, then neither is any of its descendants that differ from it only in the last item. We thus first divide Ck, according to the ancestor-descendant relationship claimed in Lemma 4, into two disjoint subsets, called the top candidate set TCk and the residual candidate set RCk, as defined below.

Definition 7. Consider the set, S, of candidates in Ck induced by the schema ⟨a1, a2, …, ak−1, *⟩, where '*' denotes don't care. A candidate k-itemset A = ⟨a1, a2, …, ak−1, ak⟩ is a top candidate if no candidate in S has a last item that is an ancestor of A[k]. That is, TCk = {A | A ∈ Ck, ¬∃A′ (A′ ∈ Ck, A′[i] = A[i] for 1 ≤ i ≤ k−1, and A′[k] is an ancestor of A[k])}, and RCk = Ck − TCk.

Example 7. Assume that the candidate 2-itemsets C2 in Example 1 consist of ⟨Scanner, PC⟩, ⟨Scanner, Desktop⟩, ⟨Scanner, Notebook⟩, ⟨Notebook, Laser⟩, ⟨Notebook, Non-impact⟩, and ⟨Notebook, Dot matrix⟩, and that the supports of items at higher levels are larger than those at lower levels. Then TC2 = {⟨Scanner, PC⟩, ⟨Notebook, Non-impact⟩, ⟨Notebook, Dot matrix⟩} and RC2 = {⟨Scanner, Desktop⟩, ⟨Scanner, Notebook⟩, ⟨Notebook, Laser⟩}. If the support of the top candidate ⟨Scanner, PC⟩ does not pass its minimum support ms(Scanner), we do not need to count the remaining descendant candidates ⟨Scanner, Desktop⟩ and ⟨Scanner, Notebook⟩. On the contrary, if ⟨Scanner, PC⟩ is a frequent itemset, we should perform another pass over the database to count ⟨Scanner, Desktop⟩ and ⟨Scanner, Notebook⟩ and determine whether they are frequent or not.

Our approach is that, for k ≥ 2, rather than counting all candidates in Ck as the _Cumulate algorithm does, we count the supports of the candidates in TCk, deleting the candidates that do not have minimum support as well as their descendants in RCk. If RCk is not empty, we then perform an extra scanning over the transaction database to

count the remaining candidates in RCk. Again, the infrequent candidates are eliminated. The resulting TCk and RCk, called TLk and RLk, respectively, form the set of frequent k-itemsets Lk. An overview of the _Stratify algorithm is described in Figure 7. The procedures for generating TCk and RCk are given in Figure 8 and Figure 9, respectively. Figure 10 shows a picture of running the _Stratify algorithm on the example in Figure 5. The procedure for generating the frontier set F1 and L1 is the same as in the _Cumulate algorithm. After that, we first sort Ck to find the top candidate set TCk, as the _Cumulate algorithm does, and generate TLk. We then use TLk to prune Ck and generate RCk. If RCk is not empty, we follow the same procedure as for generating TLk to generate RLk, and finally obtain Lk.
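The TCk/RCk split of Definition 7 can be sketched by grouping candidates on their shared (k−1)-prefix and comparing last items under the taxonomy. A minimal sketch; `split_tc_rc` and the `is_ancestor` predicate are our own names:

```python
# Sketch of the TC/RC split (Definition 7): within a group of candidates
# sharing the same first k-1 items, a candidate is "top" if no sibling's
# last item is an ancestor of its own last item.
from collections import defaultdict

def split_tc_rc(candidates, is_ancestor):
    groups = defaultdict(list)
    for c in candidates:
        groups[c[:-1]].append(c)       # group by the (k-1)-prefix schema
    tc, rc = [], []
    for group in groups.values():
        lasts = [c[-1] for c in group]
        for c in group:
            if any(is_ancestor(x, c[-1]) for x in lasts if x != c[-1]):
                rc.append(c)           # a sibling generalizes c's last item
            else:
                tc.append(c)
    return tc, rc
```

Counting TCk first then lets the algorithm drop every RCk descendant of a top candidate that misses its minimum support, and scan for the rest only when needed.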

25 . Create IMS; /* The table of user-defined minimum support */. Create HI; /* The table of each item with Hierarchy Level, Sublevel, and Group */ 3. Create IA; /* The table of each item and its ancestors from taxonomy T */ 4. SMS sort(ims); /* Ascending sort according to ms(a) stored in IMS */ 5. F F -gen(sms,, db, L, IA); 6. L {a F sup(a) ms(a)}; 7. for ( ; L ; ++) do 8. if then C C -gen(f) 9. else C C -gen( L );. Delete any candidate in C that consists of an item and its ancestor;. Delete any candidate in C that satisfies Lemma 3;. TC TC -gen(c, HI); /* Using C, HI to find top C */ 3. Delete any ancestor in IA that is not present in any of the candidates in TC ; 4. Delete any item in F that is not present in any of the candidates in C ; 5. Cal-count(db, F, TC ); /* Scan db and calculate counts of TC */ 6. for each candidate A L do /* Cases & */ 7. count (A) count (A) + count db (A); 8. if count (A) ms(a[]) then /* A[] denotes the first item with the smallest minimum support in sorted A */ 9. TL = TL {A};. end for. for each candidate A TC. if count db (A) ms(a[]) db then 3. Delete any candidate in TC L do /* Cases 3 & 4 */ 5 L ; 4. Cal-count(, F, TC L ); /* Scan and calculate counts of TC L */ 5. for each candidate A TC L do 6. count (A) count (A) + count db (A); 7. if count (A) ms(a[]) then 8. TL = TL {A}; 9. end for 3. RC RC -gen(c, TC, TL ); /* Use C, TL to find residual C */ 3. If RC then 3. Delete any ancestor in IA that is not present in any of the candidates in RC ; 33. Delete any item in F that is not present in any of the candidates in RC ; 34. Cal-count(db, F, RC ); /* Scan db and calculate counts of RC */ 35. for each candidate A L do /* Cases & */ 36. count (A) count (A) + count db (A); 37. if count (A) ms(a[]) then 38. RL = RL {A}; 39. end for

40.         for each candidate A ∈ RC_k − L_k do  /* Cases 3 & 4 */
41.             if count_db(A) < ms(A[1]) × |db| then
42.                 Delete candidate A from RC_k − L_k;
43.         Cal-count(DB, F, RC_k − L_k);  /* scan DB and calculate the
44.             counts of RC_k − L_k */
45.         for each candidate A ∈ RC_k − L_k do
46.             count_UD(A) ← count_DB(A) + count_db(A);
47.             if count_UD(A) ≥ ms(A[1]) × |UD| then
48.                 RL_k ← RL_k ∪ {A};
49.         end for
50.     end if
51.     L_k ← TL_k ∪ RL_k;
52. end for
53. Result ← ∪_k L_k;

Figure 7. Algorithm _Stratify.

for each itemset A ∈ C_k do
    Sort C_k according to the HI entry of A[k], preserving the ordering of A[1]A[2]…A[k−1];
for each itemset A ∈ C_k, in the same order, do
    S ← {A′ | A′ ∈ C_k and A′[i] = A[i], 1 ≤ i ≤ k−1};
    if A is not marked and no candidate A′ in S has A′[k] an ancestor of A[k] then
        insert A into TC_k;
        mark A and all of its descendants in S;
    end if
end for

Figure 8. Procedure TC_k-gen(C_k, HI).

for each itemset A ∈ TC_k do
    if A ∈ TL_k then
        S ← {A′ | A′ ∈ C_k and A′[i] = A[i], 1 ≤ i ≤ k−1};
        insert all descendants of A in S into RC_k;
    end if
end for

Figure 9. Procedure RC_k-gen(C_k, TC_k, TL_k).
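As a concrete illustration of Figures 8 and 9, the following Python sketch implements the two procedures for a toy taxonomy. The names (tc_gen, rc_gen, ancestors, depth) are ours, not the paper's; candidate itemsets are tuples whose items are already sorted by increasing minimum support, and depth(item) stands in for the HI table (smaller depth means a more general item).

```python
# A toy rendering of Figures 8 and 9. `ancestors` maps an item to the set
# of its generalizations in the taxonomy.

def same_prefix(A, B):
    """True if A and B agree on all but their last item."""
    return A[:-1] == B[:-1]

def tc_gen(C, depth):
    """Figure 8: per (k-1)-prefix group, keep the candidates whose last
    item is not a specialization of another candidate's last item."""
    C = sorted(C, key=lambda A: (A[:-1], depth(A[-1])))
    marked, TC = set(), []
    for A in C:
        if A in marked:
            continue
        group = [B for B in C if B != A and same_prefix(A, B)]
        if not any(B[-1] in ancestors[A[-1]] for B in group):
            TC.append(A)                      # A is a top candidate
            marked.update(B for B in group    # its descendants are deferred
                          if A[-1] in ancestors[B[-1]])
    return TC

def rc_gen(C, TC, TL):
    """Figure 9: residual candidates are the deferred descendants of the
    top candidates that turned out to be frequent."""
    RC = []
    for A in TC:
        if A in TL:
            RC.extend(B for B in C if B != A and same_prefix(A, B)
                      and A[-1] in ancestors[B[-1]])
    return RC

# toy taxonomy: item P generalizes p1 and p2
ancestors = {"x": set(), "P": set(), "p1": {"P"}, "p2": {"P"}}
C2 = [("x", "p1"), ("x", "P"), ("x", "p2")]
tops = tc_gen(C2, depth=lambda i: 0 if i == "P" else 1)
resid = rc_gen(C2, tops, TL={("x", "P")})
```

Here the only top candidate for the prefix ("x",) is ("x", "P"); since it is assumed frequent (it is in TL), its two specializations become the residual candidates to be counted in the second pass.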

[Figure 10, a running example of the _Stratify algorithm on the data of Figure 5, appears here; its diagram did not survive text extraction. It traces the generation and sorting of C_2, the selection of the top candidates TC_2, the scans of db and DB that produce TL_2, the residual candidates RC_2 and their counting to produce RL_2, and finally L_2 = TL_2 ∪ RL_2 and the generation of C_3.]

Figure 10. An example of _Stratify.
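The counting steps of Figure 7 (lines 16-29 and 35-48) follow the usual FUP-style case analysis: an itemset already frequent in DB (cases 1 and 2) only needs its increment count added, while an itemset not previously frequent (cases 3 and 4) is worth counting against DB only if it is frequent within the increment db. A simplified Python sketch, ignoring the taxonomy and using hypothetical names:

```python
# Simplified sketch of the case analysis in Figure 7. DB is the original
# database, db the increment; old_counts holds the DB-counts of the
# itemsets found frequent in the previous mining run.

def count_in(transactions, itemset):
    """Number of transactions containing every item of itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s.issubset(t))

def update_frequent(candidates, old_counts, DB, db, ms_of):
    """Return {itemset: count in UD} for itemsets frequent in UD = DB + db.

    ms_of(A) is the minimum support of A's first (smallest-ms) item,
    mirroring the role of ms(A[1]) in Figure 7."""
    n_UD, n_db = len(DB) + len(db), len(db)
    frequent = {}
    for A in candidates:
        c_db = count_in(db, A)
        if A in old_counts:                  # cases 1 & 2: DB-count is known
            total = old_counts[A] + c_db
        elif c_db >= ms_of(A) * n_db:        # cases 3 & 4: only a candidate
            total = count_in(DB, A) + c_db   # frequent in db justifies a DB scan
        else:
            continue                         # cannot be frequent in UD
        if total >= ms_of(A) * n_UD:
            frequent[A] = total
    return frequent

DB = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
db = [{"A", "B"}, {"A", "B", "C"}]
old = {("A", "B"): 2}                        # DB-count from the previous run
res = update_frequent([("A", "B"), ("B", "C")], old, DB, db,
                      lambda A: 0.5)         # uniform 50% ms for the demo
```

In this demo, ("A", "B") falls under cases 1 and 2 (its DB-count is reused), while ("B", "C") falls under cases 3 and 4 and triggers one scan of DB because it is frequent in db.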

4. Experiments

In this section, we evaluate the performance of the two algorithms, _Cumulate and _Stratify, using synthetic datasets generated by the IBM data generator [2]. We performed four experiments, varying a different parameter in each. All parameters except the one being varied were set to the default values shown in Table 4. All experiments were performed on an Intel Pentium II PC with 64 MB RAM, running Windows.

Table 4. Default parameter settings for synthetic data generation.

    Parameter                                   Default value
    |DB|  Number of original transactions
    |db|  Number of incremental transactions
    |t|   Average size of transactions          5
    N     Number of items
    R     Number of groups                      3
    L     Number of levels                      3
    F     Fanout                                5

Minimum Support: We use the following formula, adapted from [9], to generate a minimum support for each item a:

    ms(a) = α × sup(a),  if α × sup(a) > θ,
            θ,           otherwise,

where α (0 < α ≤ 1) controls how closely ms(a) follows the item's actual support, θ is a support floor, and sup(a) denotes the support of item a in the original database. The result is depicted in Figure 11. As shown in the figure, _Cumulate and _Stratify perform

significantly better than MMS_Stratify and MMS_Cumulate; the improvement ranges from 3 to 6 times and increases as α decreases. In addition, MMS_Stratify performs better than MMS_Cumulate for some settings of α, and _Cumulate performs slightly better than _Stratify.

[Figure 11, plotting execution time against α for the four algorithms, appears here.]

Figure 11. Performance comparison of MMS_Cumulate, MMS_Stratify, _Cumulate, and _Stratify for different values of α.

Number of Incremental Transactions: We then compared the efficiency of the four algorithms under various sizes of the incremental database. The number of incremental transactions was varied up to 85,000, with the minimum supports specified using α = 0.6. As shown in Figure 12, _Cumulate and _Stratify outperform MMS_Stratify and MMS_Cumulate, and _Cumulate performs better than _Stratify. The performance gap between _Stratify and MMS_Cumulate decreases as the incremental size increases. The reason is that at high α values, i.e., high minimum supports, the percentage of top candidates with generalized items becomes very small, so _Stratify cannot prune enough descendant itemsets to compensate for the cost of its extra database scan.
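The per-item support assignment used throughout these experiments (the formula above) amounts to taking max(α × sup(a), θ) for each item. A small sketch with illustrative numbers; the α and θ values here are arbitrary, not the paper's defaults:

```python
# Sketch of the support-assignment scheme adapted from [9]: each item's
# minimum support follows its actual support in the original database,
# scaled by alpha, but never drops below the floor theta.

def assign_ms(supports, alpha, theta):
    """supports: {item: support of the item in the original database}."""
    return {a: max(alpha * s, theta) for a, s in supports.items()}

# a frequent and a rare item: the rare item gets the floor value
ms = assign_ms({"Desktop": 0.30, "Scanner": 0.02}, alpha=0.6, theta=0.05)
```

Smaller α lowers every item's threshold, which enlarges the set of frequent itemsets and explains why the speedup of the incremental algorithms grows as α decreases.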

[Figure 12, plotting execution time against the number of incremental transactions (× 1,000), appears here.]

Figure 12. Performance comparison of MMS_Cumulate, MMS_Stratify, _Cumulate, and _Stratify for various sizes of incremental transactions.

Fanout: We varied the fanout from 3 to 9. Note that, with the number of items fixed, this corresponds to decreasing the number of levels in the taxonomy. The results are shown in Figure 13. All algorithms run faster as the fanout increases, because the number of generalized items, and hence the number of generalized itemsets, decreases along with the number of levels. Also note that the performance gap between _Cumulate and _Stratify on the one hand and MMS_Cumulate and MMS_Stratify on the other narrows at high fanout, since there are fewer candidate itemsets.

Number of Groups: We varied the number of groups upward from 5. As shown in Figure 14, the effect of increasing the number of groups is similar to that of increasing the fanout. The reason is that the number of items within a specific group decreases as the number of groups increases, so the probability of forming a generalized item decreases.

[Figure 13, plotting execution time against fanout, appears here.]

Figure 13. Performance comparison of MMS_Cumulate, MMS_Stratify, _Cumulate, and _Stratify for varying fanout.

[Figure 14, plotting execution time against the number of groups, appears here.]

Figure 14. Performance comparison of MMS_Cumulate, MMS_Stratify, _Cumulate, and _Stratify for varying number of groups.

5. Related Work

The problem of incrementally updating association rules was first addressed by Cheung et al. [4]. They formulated the problem of updating the discovered association rules when new transaction records are added to the database over time, and proposed an algorithm called FUP (Fast UPdate). By making use of the previously discovered frequent itemsets, FUP can dramatically reduce the effort required to compute the frequent itemsets in the updated database. They further examined the maintenance of multi-level association rules [5], and extended the model to cover deletion and modification of transactions [6]. None of these approaches, however, recognizes the varied support requirements inherent in items at different hierarchy levels.

Since then, a number of techniques have been proposed to improve the efficiency of incremental mining [8, 10, 12, 15]. In [12, 15], the authors proposed incremental updating techniques based mainly on the concept of negative borders. Empirical results showed that this approach can significantly reduce the I/O cost of maintaining association rules. In [10], Ng and Lam proposed an alternative incremental updating algorithm that incorporates a dynamic counting technique. Similar to its batch counterpart DIC [3], this algorithm can significantly reduce the number of database scans. In [8], Hong et al. developed an incremental mining algorithm based on the novel concept of pre-large itemsets. Given two user-specified support thresholds, an upper and a lower one, their approach uses the pre-large itemsets (those itemsets whose support exceeds the lower threshold) to avoid rescanning the original database until the accumulated number of inserted transactions exceeds a safety bound derived from the pre-large concept. Their work, however, does not exploit association rules with generalized items, nor does it consider multiple minimum supports.
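For reference, the safety bound of the pre-large approach in [8] limits how many transactions may be inserted before the original database must be rescanned. The formula below is the one usually given in the pre-large literature (it does not appear in this paper's text), with Su and Sl the upper and lower support thresholds:

```python
# Sketch of the pre-large safety bound: with an original database of
# n_original transactions, at most f insertions can occur before any
# itemset below the lower threshold Sl could possibly reach the upper
# threshold Su in the updated database, so no rescan is needed until then.
from math import floor

def safety_bound(n_original, Su, Sl):
    """Max number of insertions before the original database is rescanned."""
    return floor((Su - Sl) * n_original / (1 - Su))

f = safety_bound(100_000, Su=0.04, Sl=0.02)
```

With these illustrative thresholds the bound is 2,083 transactions; widening the gap between Su and Sl (or growing the original database) defers the rescan further.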
To sum up, all previous works on the incremental maintenance of association rules address only some of the aspects discussed in this paper; to our knowledge, no work has considered simultaneously both issues of taxonomy and non-uniform support specification. A summary of these works is given in Table 5.

Table 5. A summary of related work on incremental maintenance of association rules.

    Contributors              Support specification   Exploiting taxonomy   Type of update
    Cheung et al. [4]         uniform                 no                    insertion
    Cheung et al. [5]         uniform                 yes                   insertion
    Cheung et al. [6]         uniform                 no                    insertion, deletion, and modification
    Hong et al. [8]           uniform                 no                    insertion
    Ng and Lam [10]           uniform                 no                    insertion
    Sarda and Srinivas [12]   uniform                 no                    insertion
    Thomas et al. [15]        uniform                 no                    insertion

6. Conclusions

In this paper, we have investigated the problem of maintaining association rules in the presence of a taxonomy and multiple minimum supports. We presented two novel algorithms, _Cumulate and _Stratify, for maintaining multi-supported, generalized frequent itemsets. The proposed algorithms incrementally update the discovered generalized association rules under a non-uniform support specification, and effectively reduce the number of candidate itemsets and the amount of database re-scanning. Empirical evaluation showed that the two algorithms are not only up to 6 times faster than running MMS_Cumulate or MMS_Stratify on the updated database afresh, but also exhibit good linear scale-up. In the future, we will extend the maintenance of generalized association rules to a more general model that covers deletion, modification, and fuzzy taxonomic structures. We will also investigate the problem of on-line discovery and maintenance of multidimensional association rules from data warehouse data.

References

[1] R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases, in: Proc. of the 1993 ACM-SIGMOD International Conference on Management of Data, 1993.

[2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in: Proc. of the 20th International Conference on Very Large Data Bases, 1994.

[3] S. Brin, R. Motwani, J.D. Ullman, and S. Tsur, Dynamic itemset counting and implication rules for market-basket data, in: Proc. of the 1997 ACM-SIGMOD International Conference on Management of Data, 1997.

[4] D.W. Cheung, J. Han, V.T. Ng, and C.Y. Wong, Maintenance of discovered association rules in large databases: an incremental update technique, in: Proc. of the 12th International Conference on Data Engineering, 1996.

[5] D.W. Cheung, V.T. Ng, and B.W. Tam, Maintenance of discovered knowledge: a case in multi-level association rules, in: Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining, 1996.

[6] D.W. Cheung, S.D. Lee, and B. Kao, A general incremental technique for maintaining discovered association rules, in: Proc. of the 5th International Conference on Database Systems for Advanced Applications (DASFAA'97), 1997.

[7] J. Han and Y. Fu, Discovery of multiple-level association rules from large databases, in: Proc. of the 21st International Conference on Very Large Data Bases, 1995.

[8] T.P. Hong, C.Y. Wang, and Y.H. Tao, Incremental data mining based on two support thresholds, in: Proc. of the 4th International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, 2000.

[9] B. Liu, W. Hsu, and Y. Ma, Mining association rules with multiple minimum supports, in: Proc. of the 5th International Conference on Knowledge Discovery and Data Mining, 1999.

[10] K.K. Ng and W. Lam, Updating of association rules dynamically, in: Proc. of the 1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE'99), 1999.

[11] J.S. Park, M.S. Chen, and P.S. Yu, An effective hash-based algorithm for mining association rules, in: Proc. of the 1995 ACM-SIGMOD International Conference on Management of Data, 1995.

[12] N.L. Sarda and N.V. Srinivas, An adaptive algorithm for incremental mining of association rules, in: Proc. of the 9th International Workshop on Database and Expert Systems Applications (DEXA'98), 1998.

[13] A. Savasere, E. Omiecinski, and S. Navathe, An efficient algorithm for mining association rules in large databases, in: Proc. of the 21st International Conference on Very Large Data Bases, 1995.

[14] R. Srikant and R. Agrawal, Mining generalized association rules, in: Proc. of the 21st International Conference on Very Large Data Bases, 1995.

[15] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka, An efficient algorithm for the incremental updation of association rules in large databases, in: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, 1997.

[16] M.C. Tseng and W.Y. Lin, Mining generalized association rules with multiple minimum supports, in: Proc. of the 3rd International Conference on Data Warehousing and Knowledge Discovery, 2001.


More information

On Multiple Query Optimization in Data Mining

On Multiple Query Optimization in Data Mining On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl

More information

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 68 CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL 5.1 INTRODUCTION During recent years, one of the vibrant research topics is Association rule discovery. This

More information

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the

More information

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find

More information

A Comparative Study of Association Rules Mining Algorithms

A Comparative Study of Association Rules Mining Algorithms A Comparative Study of Association Rules Mining Algorithms Cornelia Győrödi *, Robert Győrödi *, prof. dr. ing. Stefan Holban ** * Department of Computer Science, University of Oradea, Str. Armatei Romane

More information

Algorithm for Efficient Multilevel Association Rule Mining

Algorithm for Efficient Multilevel Association Rule Mining Algorithm for Efficient Multilevel Association Rule Mining Pratima Gautam Department of computer Applications MANIT, Bhopal Abstract over the years, a variety of algorithms for finding frequent item sets

More information

Improving Efficiency of Apriori Algorithm using Cache Database

Improving Efficiency of Apriori Algorithm using Cache Database Improving Efficiency of Apriori Algorithm using Cache Database Priyanka Asthana VIth Sem, BUIT, Bhopal Computer Science Deptt. Divakar Singh Computer Science Deptt. BUIT, Bhopal ABSTRACT One of the most

More information

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India

More information

Frequent Itemsets Melange

Frequent Itemsets Melange Frequent Itemsets Melange Sebastien Siva Data Mining Motivation and objectives Finding all frequent itemsets in a dataset using the traditional Apriori approach is too computationally expensive for datasets

More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Association mining rules

Association mining rules Association mining rules Given a data set, find the items in data that are associated with each other. Association is measured as frequency of occurrence in the same context. Purchasing one product when

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases 149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati

Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering Amravati Analytical Representation on Secure Mining in Horizontally Distributed Database Raunak Rathi 1, Prof. A.V.Deorankar 2 1,2 Department of Computer Science and Engineering, Government College of Engineering

More information

Mining Frequent Itemsets from Uncertain Databases using probabilistic support

Mining Frequent Itemsets from Uncertain Databases using probabilistic support Mining Frequent Itemsets from Uncertain Databases using probabilistic support Radhika Ramesh Naik 1, Prof. J.R.Mankar 2 1 K. K.Wagh Institute of Engg.Education and Research, Nasik Abstract: Mining of frequent

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set

Pincer-Search: An Efficient Algorithm. for Discovering the Maximum Frequent Set Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set Dao-I Lin Telcordia Technologies, Inc. Zvi M. Kedem New York University July 15, 1999 Abstract Discovering frequent itemsets

More information

Chapter 4: Association analysis:

Chapter 4: Association analysis: Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

More information

FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases Sanguthevar Rajasekaran

FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases Sanguthevar Rajasekaran FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases Jun Luo Sanguthevar Rajasekaran Dept. of Computer Science Ohio Northern University Ada, OH 4581 Email: j-luo@onu.edu Dept. of

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

SETM*-MaxK: An Efficient SET-Based Approach to Find the Largest Itemset

SETM*-MaxK: An Efficient SET-Based Approach to Find the Largest Itemset SETM*-MaxK: An Efficient SET-Based Approach to Find the Largest Itemset Ye-In Chang and Yu-Ming Hsieh Dept. of Computer Science and Engineering National Sun Yat-Sen University Kaohsiung, Taiwan, Republic

More information

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree

RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree International Journal for Research in Engineering Application & Management (IJREAM) ISSN : 2454-915 Vol-4, Issue-3, June 218 RHUIET : Discovery of Rare High Utility Itemsets using Enumeration Tree Mrs.

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan 1,Qinzhao Wang 2

A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan 1,Qinzhao Wang 2 International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2015) A Software Testing Optimization Method Based on Negative Association Analysis Lin Wan 1, Qiuling Fan

More information

A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail

A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail A Mining Algorithm to Generate the Candidate Pattern for Authorship Attribution for Filtering Spam Mail Khongbantabam Susila Devi #1, Dr. R. Ravi *2 1 Research Scholar, Department of Information & Communication

More information

2 CONTENTS

2 CONTENTS Contents 5 Mining Frequent Patterns, Associations, and Correlations 3 5.1 Basic Concepts and a Road Map..................................... 3 5.1.1 Market Basket Analysis: A Motivating Example........................

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Aggregation and maintenance for database mining

Aggregation and maintenance for database mining Intelligent Data Analysis 3 (1999) 475±490 www.elsevier.com/locate/ida Aggregation and maintenance for database mining Shichao Zhang School of Computing, National University of Singapore, Lower Kent Ridge,

More information

Mining of association rules is a research topic that has received much attention among the various data mining problems. Many interesting wors have be

Mining of association rules is a research topic that has received much attention among the various data mining problems. Many interesting wors have be Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules. S.D. Lee David W. Cheung Ben Kao Department of Computer Science, The University of Hong Kong, Hong Kong. fsdlee,dcheung,aog@cs.hu.h

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES

AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES AN IMPROVED GRAPH BASED METHOD FOR EXTRACTING ASSOCIATION RULES ABSTRACT Wael AlZoubi Ajloun University College, Balqa Applied University PO Box: Al-Salt 19117, Jordan This paper proposes an improved approach

More information

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS D.SUJATHA 1, PROF.B.L.DEEKSHATULU 2 1 HOD, Department of IT, Aurora s Technological and Research Institute, Hyderabad 2 Visiting Professor, Department

More information

Data Structure for Association Rule Mining: T-Trees and P-Trees

Data Structure for Association Rule Mining: T-Trees and P-Trees IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 6, JUNE 2004 1 Data Structure for Association Rule Mining: T-Trees and P-Trees Frans Coenen, Paul Leng, and Shakil Ahmed Abstract Two new

More information

A Modern Search Technique for Frequent Itemset using FP Tree

A Modern Search Technique for Frequent Itemset using FP Tree A Modern Search Technique for Frequent Itemset using FP Tree Megha Garg Research Scholar, Department of Computer Science & Engineering J.C.D.I.T.M, Sirsa, Haryana, India Krishan Kumar Department of Computer

More information

Ecient Parallel Data Mining for Association Rules. Jong Soo Park, Ming-Syan Chen and Philip S. Yu. IBM Thomas J. Watson Research Center

Ecient Parallel Data Mining for Association Rules. Jong Soo Park, Ming-Syan Chen and Philip S. Yu. IBM Thomas J. Watson Research Center Ecient Parallel Data Mining for Association Rules Jong Soo Park, Ming-Syan Chen and Philip S. Yu IBM Thomas J. Watson Research Center Yorktown Heights, New York 10598 jpark@cs.sungshin.ac.kr, fmschen,

More information

A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining

A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining A Further Study in the Data Partitioning Approach for Frequent Itemsets Mining Son N. Nguyen, Maria E. Orlowska School of Information Technology and Electrical Engineering The University of Queensland,

More information

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES

EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES EVALUATING GENERALIZED ASSOCIATION RULES THROUGH OBJECTIVE MEASURES Veronica Oliveira de Carvalho Professor of Centro Universitário de Araraquara Araraquara, São Paulo, Brazil Student of São Paulo University

More information