
Appears in the 1st IEEE Conference on Data Mining (2001)

LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint

Masakazu Seno and George Karypis
Department of Computer Science and Engineering, Army HPC Research Center
University of Minnesota, 4-192 EE/CS Building, 200 Union Street SE, Minneapolis, MN 55455
Fax: (612)
Technical Report #01-026
{seno, karypis}@cs.umn.edu
August 7, 2001

Abstract

Over the years, a variety of algorithms for finding frequent itemsets in very large transaction databases have been developed. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of the problem. In general, itemsets that contain only a few items will tend to be interesting if they have a high support, whereas long itemsets can still be interesting even if their support is relatively small. Ideally, we desire to have an algorithm that finds all the frequent itemsets whose support decreases as a function of their length. In this paper we present an algorithm called LPMiner that finds all itemsets satisfying a length-decreasing support constraint. Our experimental evaluation shows that LPMiner is up to two orders of magnitude faster than the FP-growth algorithm for finding itemsets at a constant support constraint, and that its runtime increases gradually as the average length of the transactions (and the discovered itemsets) increases.

1 Introduction

Data mining research during the last eight years has led to the development of a variety of algorithms for finding frequent itemsets in very large transaction databases [1, 2, 4, 9]. These itemsets can be used to find association rules or to extract prevalent patterns that exist in the transactions, and they have been effectively used in many different domains and applications. The key feature of most of these algorithms is that they control the inherently exponential complexity of the problem by finding only the itemsets that occur in a sufficiently large fraction of the transactions, called the support. A limitation of this paradigm for generating frequent itemsets is that it uses a constant value of support, irrespective of the length of the discovered itemsets. In general, itemsets that contain only a few items will tend to be interesting if they have a high support, whereas long itemsets can still be interesting even if their support is relatively small. Unfortunately, if constant-support-based frequent itemset discovery algorithms are used to find some of the longer but infrequent itemsets, they will end up generating an exponentially large number of short itemsets. Maximal frequent itemset discovery algorithms [9] can potentially be used to find some of these longer itemsets, but these algorithms can still generate a very large number of short frequent itemsets if those itemsets are maximal. Ideally, we desire to have an algorithm that finds all the frequent itemsets whose support decreases as a function of their length.

(This work was supported by NSF CCR, EIA, and ACI grants, by Army Research Office contract DA/DAAG, by the DOE ASCI program, and by Army High Performance Computing Research Center contract number DAAH. Access to computing facilities was provided by the Minnesota Supercomputing Institute.)

Developing such an algorithm is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short frequent itemsets. In this paper we present another property, called the smallest valid extension (SVE), that can be used to prune the search space of potential itemsets in the case where the support decreases as a function of the itemset length. Using this property, we developed an algorithm called LPMiner that finds all itemsets satisfying a length-decreasing support constraint. LPMiner uses the recently proposed FP-tree [4] data structure to compactly store the database transactions in main memory, and the SVE property to prune certain portions of the conditional FP-trees that are being generated during itemset discovery. Our experimental evaluation shows that LPMiner is up to two orders of magnitude faster than the FP-growth algorithm for finding itemsets at a constant support constraint, and that its runtime increases gradually as the average length of the transactions (and the discovered itemsets) increases.

The rest of this paper is organized as follows. Section 2 provides some background information and related research work. Section 3 describes the FP-growth algorithm [4], on which LPMiner is based. In Section 4, we describe how the length-decreasing support constraint can be exploited to prune the search space of frequent itemsets. The experimental results of our algorithm are shown in Section 5, followed by the conclusion in Section 6.

2 Background and Related Works

The problem of finding frequent itemsets is formally defined as follows: given a set of transactions T, each containing a set of items from the set I, and a support σ, we want to find all subsets of items that occur in at least σ|T| transactions. These subsets are called frequent itemsets. Over the years a number of efficient algorithms have been developed for finding all frequent itemsets. The first computationally efficient algorithm for finding itemsets in large databases was Apriori [1], which finds frequent itemsets of length l based on previously generated (l−1)-length frequent itemsets. The key idea of Apriori is to use the downward closure property of the support constraint to prune the space of frequent itemsets. The FP-growth algorithm [4] finds frequent itemsets by using a data structure called the FP-tree, which can compactly store in memory the transactions of the original database, thus eliminating the need to access the disks more than twice. Another efficient way to represent a transaction database is the vertical tid-list format, which associates each item with the set of all transactions that include it. Eclat [7] uses this data format to find all frequent itemsets.

Even though, to our knowledge, no work has been published on finding frequent itemsets in which the support decreases as a function of the length of the itemset, there has been some work on developing itemset discovery algorithms that use multiple support constraints. Liu et al. [5] presented an algorithm in which each item has its own minimum item support (or MIS). The minimum support of an itemset is the lowest MIS among the items in that itemset. By sorting items in ascending order of their MIS values, the minimum support of an itemset never decreases as the length of the itemset grows, making the support of itemsets downward closed, so an Apriori-based algorithm can be applied. Wang et al. [6] allow a set of more general support constraints; in particular, they associate a support constraint with each one of the itemsets. By introducing a new function called Pmsup that has an "Apriori-like" property, they proposed an Apriori-based algorithm for finding the frequent itemsets. Finally, Cohen et al. [3] adopt a different approach in that they do not use any support constraint. Instead, they search for similar itemsets using probabilistic algorithms that do not guarantee that all frequent itemsets will be found.
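To make the MIS idea of Liu et al. [5] concrete, here is a tiny Python illustration. The items, their MIS values, and the helper name mis_of_itemset are hypothetical and only sketch the rule described above; they are not code from [5].

```python
# Hypothetical per-item minimum item supports (MIS), as in Liu et al. [5].
MIS = {"bread": 0.02, "milk": 0.03, "caviar": 0.001}

def mis_of_itemset(itemset):
    """The minimum support of an itemset is the lowest MIS among its items."""
    return min(MIS[item] for item in itemset)

# Processing items in ascending MIS order means an itemset's threshold is set
# by its first (lowest-MIS) item and never decreases as the itemset is
# extended, so Apriori-style downward-closure pruning still applies.
items_in_mis_order = sorted(MIS, key=MIS.get)
print(items_in_mis_order)                   # ['caviar', 'bread', 'milk']
print(mis_of_itemset({"bread", "caviar"}))  # 0.001
```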
3 FP-growth Algorithm

In this section we describe how the FP-growth algorithm works, because our approach is based on this algorithm. The description here is based on [4]. The key idea behind FP-growth is to use a data structure called the FP-tree to obtain a compact representation of the original transactions so that they fit into main memory. As a result, any subsequent operations that are required to find the frequent itemsets can be performed quickly, without having to access the disks. The FP-growth algorithm achieves that by performing just two passes over the transactions. Figure 1 shows how the FP-tree generation algorithm works, given an input transaction database that has five transactions over a total of six different items.

[Figure 1: Flow of FP-tree generation. (a) A first scan of the Transaction Database produces the Item Support Table; (b) the items are sorted by support into the Node-Link header table NL; (c) a second scan sorts the items of each transaction into the same order as in NL; (d) the sorted transactions are inserted one by one into the FP-tree.]

First, it scans the transaction database to count how many times each item occurs in the database, obtaining an "Item Support Table" (step (a)). The Item Support Table is a set of (item-name, support) pairs. For example, item A occurs twice in the database, namely in the transaction with tid 1 and in the one with tid 5; therefore its support is 2/5 = 40%. In step (b), the items in the Item Support Table are sorted according to their support. The result is stored in the item-name field of the Node-Link header table NL. Notice that item F is not included in NL because the support of item F is less than the minimum support constraint of 40%. In step (c), the items of each transaction in the input transaction database are sorted in the same order as the items in the Node-Link header table NL. While transaction tid 5 is sorted, item F is discarded because the item is infrequent and needs no further consideration. In step (d), the FP-tree is generated by inserting the sorted transactions one by one. The initial FP-tree has only its root. When the first transaction is inserted, nodes that represent items B, C, E, A, and D are generated, forming a path from the root in this order. The count of each node is set to 1 because each node represents only one transaction (tid 1) so far. Next, when the second transaction is inserted, a node representing item B is not generated; instead, the node already generated is reused. Because the root node has a child that represents item B, the count of that node is incremented by 1. As for the next item of the transaction, since there is no child representing that item under the current node, a new node with that item-name is generated as a child of the current node. Similar processes are repeated until all the sorted transactions are inserted into the FP-tree.

Once an FP-tree is generated from the input transaction database, the algorithm mines frequent itemsets from the FP-tree. The algorithm generates itemsets from shorter ones to longer ones by adding items one by one to the itemsets already generated. It divides mining the FP-tree into mining smaller FP-trees, each of which is based on an item in the Node-Link header table of Figure 1. Let us choose item D as an example. For item D, we generate a new transaction database called the conditional pattern base. Each transaction in the conditional pattern base consists of the items on the path from the parent of a node with item-name D up to the root node. The conditional pattern base for item D is shown in Figure 2. Each transaction in the conditional pattern base also has a count of occurrence corresponding to the count of the node with item-name D in the original FP-tree. Note that item D itself is a frequent itemset consisting of one item. Let us call this frequent itemset "D" a conditional pattern. A conditional pattern base is a set of transactions each of which includes the conditional pattern. What we do next is to set the original FP-tree of Figure 1 aside for a while and focus on the conditional pattern base we just obtained, in order to generate frequent itemsets that include this conditional pattern "D". For this purpose, we generate a smaller FP-tree than the original one, based on the conditional pattern "D". This new FP-tree, called the conditional FP-tree, is generated from the conditional pattern base using the FP-tree generation algorithm again.
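Since the same two-pass generation procedure is reused for conditional FP-trees, here is a minimal Python sketch of it along the lines described above. The class and function names (FPNode, build_fp_tree), the tie-breaking by item name, and the small example database are illustrative assumptions rather than the paper's code or exact data.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Two-pass FP-tree construction as described in Section 3 (a sketch)."""
    n = len(transactions)

    # Pass 1 (step (a)): count how often each item occurs.
    support = defaultdict(int)
    for t in transactions:
        for item in set(t):
            support[item] += 1

    # Step (b): keep only frequent items, sorted by decreasing support
    # (this ordering plays the role of the Node-Link header table NL).
    header = [item for item, c in sorted(support.items(),
                                         key=lambda kv: (-kv[1], kv[0]))
              if c / n >= min_support]
    order = {item: rank for rank, item in enumerate(header)}

    # Pass 2 (steps (c) and (d)): sort each transaction in NL order,
    # drop infrequent items, and insert it into the tree.
    root = FPNode()
    for t in transactions:
        sorted_items = sorted((i for i in set(t) if i in order),
                              key=lambda i: order[i])
        node = root
        for item in sorted_items:
            child = node.children.get(item)
            if child is None:                      # no child for this item yet,
                child = FPNode(item, parent=node)  # so create a new node ...
                node.children[item] = child
            child.count += 1                       # ... or reuse it and count up
            node = child
    return root, header

# A small hypothetical five-transaction database in the spirit of Figure 1.
db = [list("ABCED"), list("BCE"), list("BCE"), list("BCED"), list("AF")]
tree, header = build_fp_tree(db, 0.40)
print(header)  # items ordered by decreasing support, e.g. ['B', 'C', 'E', 'A', 'D']
```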

[Figure 2: Conditional FP-tree. The conditional pattern base of conditional pattern "D" (each transaction with its count, and the support of each item) and the resulting conditional FP-tree of conditional pattern "D", a single-path FP-tree with its own Node-Link header table NL.]

If the conditional FP-tree is not a single-path tree, we divide mining this conditional FP-tree into mining even smaller conditional FP-trees recursively. This is repeated until we obtain a conditional FP-tree with only a single path. During those recursively repeated processes, all selected items are added to the conditional pattern. Once we obtain a single-path conditional FP-tree like the one in Figure 2, we generate all possible combinations of the items along the path and combine each of these sets of items with the conditional pattern. For example, from the three nodes in the conditional FP-tree of Figure 2, we have 2^3 = 8 combinations of items B, C, and E: "" (no item), "B", "C", "E", "BC", "CE", "BE", and "BCE". We then obtain the frequent itemsets based on conditional pattern base "D": "D", "DB", "DC", "DE", "DBC", "DCE", "DBE", and "DBCE".

4 LPMiner Algorithm

LPMiner is an itemset discovery algorithm, based on the FP-growth algorithm, that finds all the itemsets satisfying a particular length-decreasing support constraint f(l); here l is the length of the itemset. More precisely, f(l) satisfies f(l_a) ≥ f(l_b) for any l_a, l_b such that l_a < l_b. The idea behind introducing this kind of support constraint is that, by using a support that decreases with the length of the itemset, we may be able to find long itemsets that may be of interest without generating an exponentially large number of shorter itemsets. Figure 3 shows a typical length-decreasing support constraint. In this example, the support constraint decreases linearly down to its minimum value and then stays the same for itemsets of longer length. Our problem is restated as finding those itemsets located above the curve determined by the length-decreasing support constraint f(l). A simple way of finding such itemsets is to use any of the traditional constant-support frequent itemset discovery algorithms with the support set to min_{l>0} f(l), and then discard the itemsets that do not satisfy the length-decreasing support constraint. This approach, however, does not reduce the number of frequent itemsets being discovered and, as our experiments will show, requires a large amount of time.

As discussed in the introduction, finding the complete set of itemsets that satisfy a length-decreasing support function is particularly challenging because we cannot use the downward closure property of constant-support frequent itemsets. This property states that in order for an itemset of length l to be frequent, all of its subsets have to be frequent as well. As a result, once we find that an itemset of length l is infrequent, we know that any longer itemset that includes this particular itemset cannot be frequent, and we can thus eliminate such itemsets from further consideration. However, because in our problem the support of an itemset decreases as its length increases, an itemset can be frequent even if its subsets are infrequent.
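As a concrete illustration of the constraint and of the naive post-filtering approach it is meant to replace, the following Python sketch defines a linearly decreasing support function and the membership test against the curve. The function names, the particular slope and minimum, and the mine_constant_support placeholder are assumptions for illustration, not taken from the paper.

```python
def make_length_decreasing_support(initial, minimum, max_len):
    """Return f(l): decreases linearly from `initial` at l = 1 down to
    `minimum` at l = max_len, then stays at `minimum` for longer itemsets."""
    def f(l):
        if l >= max_len:
            return minimum
        return initial - (initial - minimum) * (l - 1) / (max_len - 1)
    return f

# Hypothetical curve: 0.5% support for 1-itemsets, 0.1% for itemsets of
# length 20 or more (the numbers are illustrative, not the paper's).
f = make_length_decreasing_support(initial=0.005, minimum=0.001, max_len=20)

def satisfies_constraint(itemset, support, f):
    """An itemset is frequent under the constraint iff it lies on or above
    the curve of Figure 3, i.e. support >= f(length of itemset)."""
    return support >= f(len(itemset))

# Naive approach described above: mine everything at min_l f(l) with a
# constant-support algorithm, then post-filter.  `mine_constant_support` is a
# stand-in for any such miner (e.g. FP-growth) and is not defined here.
# patterns  = mine_constant_support(db, sigma=f(10 ** 9))
# frequent  = [(s, sup) for s, sup in patterns if satisfies_constraint(s, sup, f)]
```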

[Figure 3: An example of a typical length-decreasing support constraint; the support (%) decreases with the length of the itemset and then stays constant at its minimum value.]

A key property, regarding itemsets whose support decreases as a function of their length, is the following. Given a particular itemset I with a support of σ_I such that σ_I < f(|I|), f^{-1}(σ_I) = min({l | f(l) ≤ σ_I}) is the minimum length that an itemset I' with I' ⊇ I must have before it can potentially become frequent. Figure 4 illustrates this relation graphically. The length of I' is nothing more than the point at which a line parallel to the x-axis at y = σ_I intersects the support curve; here, we essentially assume the best case, in which I' exists and is supported by the same set of transactions as its subset I. We will refer to this property as the smallest valid extension property, or SVE for short. LPMiner uses this property as much as it can to prune the conditional FP-trees that are generated during the itemset discovery phase. In particular, it uses three different pruning methods that, when combined, substantially reduce the search space and the overall runtime. These methods are described in the rest of this section.

4.1 Transaction Pruning, TP

The first pruning scheme implemented in LPMiner uses the smallest valid extension property to eliminate entire candidate transactions of a conditional pattern base. Recall from Section 3 that, during frequent itemset generation, the FP-growth algorithm builds a separate FP-tree for all the transactions that contain the conditional pattern currently under consideration. Let P be that conditional pattern, |P| be its length, and σ(P) be its support. If P is infrequent, we know from the SVE property that in order for this conditional pattern to grow into something indeed frequent, it must reach a length of at least f^{-1}(σ(P)). Using this requirement, before building the FP-tree corresponding to this conditional pattern, we can eliminate any transactions whose length is shorter than f^{-1}(σ(P)) − |P|, as these transactions cannot contribute to a valid frequent itemset of which P is a part. We will refer to this as the transaction pruning method and denote it by TP.

We evaluated the complexity of this method in comparison with the complexity of inserting a transaction into a conditional pattern base. There are three quantities we have to know to prune a transaction: the length of each transaction being inserted, f^{-1}(σ(P)), and |P|. The length of each transaction is calculated in constant time added to the original FP-growth algorithm, because we can count each item when the transaction is actually being generated. As f^{-1}(σ(P)) and |P| are common values for all transactions of a conditional pattern base, these values need to be calculated only once per conditional pattern base. It takes constant time added to the original FP-growth algorithm to calculate |P|. As for f^{-1}(σ(P)), evaluating f^{-1} takes O(log |I|) time to execute a binary search on the support table determined by f(l). Let cpb be the conditional pattern base and m = Σ_{tran ∈ cpb} |tran|. The complexity per inserted transaction is then O(log(|I|)/m). Under the assumption that all items of I are contained in cpb, this value is nothing more than O(1). Thus, the complexity of this method is just a constant time per inserted transaction.
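The SVE bound and the TP check can be captured in a few lines. The sketch below is illustrative (the names f_inverse and keep_transaction and the example support table are assumptions); it assumes a precomputed non-increasing support table with f_table[l-1] = f(l) and uses binary search as described above.

```python
import bisect

def f_inverse(sigma, f_table):
    """Smallest valid extension: the minimum length l with f(l) <= sigma.

    f_table[l-1] = f(l) is non-increasing, so its negation is non-decreasing
    and can be binary-searched with bisect (the negated copy is rebuilt here
    for brevity; in practice it would be precomputed once).
    Returns len(f_table) + 1 if even the smallest f value exceeds sigma.
    """
    neg = [-x for x in f_table]
    idx = bisect.bisect_left(neg, -sigma)
    return idx + 1  # lengths are 1-based

def keep_transaction(trans_len, pattern_len, pattern_support, f_table):
    """Transaction pruning (TP): a transaction of the conditional pattern
    base is kept only if it is long enough for the conditional pattern P to
    reach the SVE length f^{-1}(sigma(P))."""
    return trans_len >= f_inverse(pattern_support, f_table) - pattern_len

# Assumed support table: f(1..5) = 0.5, 0.4, 0.3, 0.2, 0.1 (fractions).
f_table = [0.5, 0.4, 0.3, 0.2, 0.1]
# A conditional pattern P with |P| = 2 and support 0.15 must grow to length
# at least 5, so transactions shorter than 5 - 2 = 3 items are eliminated.
print(f_inverse(0.15, f_table))               # -> 5
print(keep_transaction(2, 2, 0.15, f_table))  # -> False (pruned)
print(keep_transaction(3, 2, 0.15, f_table))  # -> True  (kept)
```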

[Figure 4: Smallest valid extension (SVE). The SVE of an itemset I with support σ_I is the length at which a horizontal line at y = σ_I intersects the length-decreasing support curve (axes: support (%) versus length of itemset).]

4.2 Node Pruning, NP

The second pruning method focuses on pruning certain nodes of a conditional FP-tree on which the next conditional pattern base is about to be generated. Let us consider a node v of the FP-tree. Let I(v) be the item stored at this node, σ(I(v)) be the support of that item in the conditional pattern base, and h(v) be the height of the longest path from the root through v to a leaf node. From the SVE property we know that node v will contribute to a valid frequent itemset if and only if

h(v) + |P| ≥ f^{-1}(σ(I(v))),    (1)

where |P| is the length of the conditional pattern of the current conditional FP-tree. The reason that equation (1) is correct is that, among the transactions that go through node v, the longest itemset that I(v) can participate in has a length of h(v). Now, if the support of I(v) is so small that it requires an itemset whose length f^{-1}(σ(I(v))) is greater than h(v) + |P|, then that itemset cannot be supported by any of the transactions that go through node v. Thus, if equation (1) does not hold, node v can be pruned from the FP-tree. Once node v is pruned, σ(I(v)) decreases, as do the heights of the nodes whose longest paths go through v, possibly allowing further pruning. We will refer to this as the node pruning method, or NP for short.

A key observation is that the TP and NP methods can be used together, as each of them prunes portions of the FP-tree that the other one does not. In particular, the NP method can prune a node in a path that is longer than f^{-1}(σ(P)) − |P| because the item of that node has a lower support than P. On the other hand, TP reduces the frequency of some itemsets in the FP-tree by removing entire short transactions. For example, consider two transactions, (A, B, C, D) and (A, B). Assume that f^{-1}(σ(P)) − |P| = 4 and that each of the items A, B, C, D has a support equal to that of P. In that case, NP will not remove any nodes, whereas TP will eliminate the second transaction.

In order to perform the node pruning, we need to compute the height of each node and then traverse each node v to see if it violates equation (1). If it does, then node v can be pruned, the height of all the nodes whose longest path goes through v must be decremented by 1, and the support of I(v) needs to be decremented to take account of the removal of v. Every time we make such changes in the tree, nodes that could not have been pruned before may become eligible for pruning. In particular, all the remaining nodes that carry the same item I(v) need to be rechecked, as well as all the nodes whose height was decremented upon the removal of v. Our initial experiments with such an implementation showed that the cost of performing the pruning was quite often higher than the saving it achieved when used in conjunction with the TP scheme. For this reason we implemented an approximate but fast version of this scheme that achieves a comparable degree of pruning.
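Before turning to the approximate variant described next, here is a minimal sketch of the exact NP test of equation (1). It reuses the hypothetical FPNode class and an f^{-1} function in the style of the earlier sketches, and it assumes an item_support mapping from each item to its support in the conditional pattern base; the repeated rechecking after each removal is not implemented here.

```python
def height(node):
    """h(v): number of items on the longest root-to-leaf path through `node`
    (depth of the node plus the depth of its deepest descendant)."""
    depth = 0
    up = node
    while up.parent is not None:   # walk up to the root to get the depth
        depth += 1
        up = up.parent

    def deepest(n):                # longest chain of nodes below (and including) n
        return 1 + max((deepest(c) for c in n.children.values()), default=0)

    return depth + deepest(node) - 1

def np_prunable(node, item_support, pattern_len, f_inv):
    """Node pruning (NP): node v violates equation (1), and can be pruned,
    if h(v) + |P| < f^{-1}(sigma(I(v)))."""
    return height(node) + pattern_len < f_inv(item_support[node.item])
```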

Our approximate NP algorithm initially sorts the transactions of the conditional pattern base in decreasing transaction length, then traverses the transactions in that order and tries to insert them into the FP-tree. Let t be one such transaction and l(t) be its length. When t is inserted into the FP-tree, it may share a prefix with some transactions already in the FP-tree. However, as soon as the insertion of t results in a new node being created, we check to see if we can prune it using equation (1). In particular, if v is that newly created node, then h(v) = l(t), because the transactions are inserted into the FP-tree in decreasing length. Thus v can be pruned if

l(t) + |P| < f^{-1}(σ(I(v))).    (2)

If that can be done, the new node is eliminated and the insertion of t continues with the next item. Now if one of the next items inserts a new node u, that node may be pruned using equation (2). In equation (2), we use the original length of the transaction, l(t), not the length after the removal of the items previously pruned. The reason is that l(t) is the correct upper bound on h(u), because one of the transactions inserted later may have a length of at most l(t), the same as the length of the current transaction, and can modify its height. The above approach is approximate because (i) the elimination of a node affects only the nodes that can be eliminated in the subsequent transactions, not the nodes already in the tree, and (ii) we use pessimistic bounds on the height of a node (as discussed in the previous paragraph). This approximate approach, however, does not increase the complexity of generating the conditional FP-tree beyond the sorting of the transactions of the conditional pattern base. Since the lengths of the transactions fall within a small range, they can be sorted using bucket sort in linear time.
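The following sketch shows where the equation (2) test happens in this insertion-time variant. The function insert_with_approximate_np is illustrative: it assumes the FPNode class from the earlier sketch, a per-item support mapping for the conditional pattern base, and an f^{-1} function passed in as f_inv; it does not show the full conditional FP-tree construction.

```python
def insert_with_approximate_np(root, transactions, pattern_len,
                               item_support, f_inv):
    """Insert conditional-pattern-base transactions in decreasing length
    order; whenever a brand-new node would be created, apply the test of
    equation (2), l(t) + |P| < f^{-1}(sigma(item)), and skip the item if it
    cannot lead to a frequent itemset (approximate node pruning)."""
    for t, count in sorted(transactions, key=lambda tc: -len(tc[0])):
        lt = len(t)          # original length l(t), used as the height bound
        node = root
        for item in t:
            child = node.children.get(item)
            if child is None:
                # A new node would be created here: check equation (2).
                if lt + pattern_len < f_inv(item_support[item]):
                    continue  # prune this node, keep inserting the rest of t
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += count
            node = child
    return root
```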
4.3 Path Pruning, PP

Once the tree becomes a single path, the original FP-growth algorithm generates all possible combinations of the items along the path and concatenates each of those combinations with its conditional pattern. If the path contains k items, there exist a total of 2^k such combinations. However, using the SVE property we can limit the number of combinations that we need to consider. Let {i_1, i_2, ..., i_k} be the k items, ordered such that σ(i_j) ≤ σ(i_{j+1}). One way of generating all possible 2^k combinations is to grow them incrementally as follows. First, we create two sets, one that contains i_1 and one that does not. Next, for each of these sets, we generate two new sets such that, in each pair, one contains i_2 and the other does not, leading to four different sets. By continuing this process a total of k times, we obtain all possible 2^k combinations of items. This approach essentially builds a binary tree with k levels of edges, in which the nodes correspond to the possible combinations. One such binary tree for k = 4 is shown in Figure 5.

To see how the SVE property can be used to prune certain subgraphs of this tree (and hence combinations to be explored), consider a particular internal node v of that tree. Let h(v) be the height of the node (the root has a height of zero), and let c(v) be the number of edges labeled 1 on the path from the root to v; in other words, c(v) is the number of items that have been included so far in the set. Using the SVE property, we can stop expanding the tree under node v if and only if

c(v) + (k − h(v)) + |P| < f^{-1}(σ(i_{h(v)})).

Essentially, the above formula states that, based on the frequency of the current item, the set must be able to reach a sufficiently large number of items before it can be frequent. If the number of items already inserted in the set (c(v)) plus the number of items still available for insertion (k − h(v)) is not sufficiently large, then no frequent itemsets can be generated from this branch of the tree, and hence it can be pruned. We will refer to this method as path pruning, or PP for short. The complexity of PP per binary tree is k log |I|, because we need to evaluate f^{-1} for k items. On the other hand, the original FP-growth algorithm has a complexity of O(2^k) per binary tree. The former is much smaller for large k. For small k, this analysis suggests that PP may cost more than it saves. Our experimental results, however, show that the effect of the pruning pays the price.
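The recursive sketch below enumerates the 2^k combinations over a single path while applying the PP cut. The name enumerate_with_pp and its arguments are illustrative; items and supports are assumed to be the single path's items sorted by non-decreasing support, and f_inv is an f^{-1} function such as the binary-search sketch shown earlier.

```python
def enumerate_with_pp(items, supports, pattern, pattern_len, f_inv):
    """Path pruning (PP): walk the binary enumeration tree over the single
    path's items and stop expanding a branch as soon as
    c(v) + (k - h(v)) + |P| can no longer reach the SVE length
    f^{-1}(sigma(i_{h(v)})) of the current item."""
    k = len(items)
    results = []

    def expand(depth, chosen):
        if depth == k:
            if chosen:                         # skip the empty extension
                results.append(tuple(pattern) + tuple(chosen))
            return
        # PP cut: items chosen so far + items still available + |P|
        # must be able to reach the SVE length of the current item.
        if len(chosen) + (k - depth) + pattern_len < f_inv(supports[depth]):
            return
        expand(depth + 1, chosen)                   # left edge: exclude the item
        expand(depth + 1, chosen + [items[depth]])  # right edge: include the item

    expand(0, [])
    return results
```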

[Figure 5: Binary tree when k = 4. Each level corresponds to one of the items i_1, ..., i_4; an edge to the left child (0) excludes the item and an edge to the right child (1) includes it, so the nodes correspond to the possible combinations.]

5 Experimental Results

We experimentally evaluated the various search space pruning methods of LPMiner using a variety of datasets generated by the synthetic transaction generator that is provided by the IBM Quest group and was used in evaluating the Apriori algorithm [1]. All of our experiments were performed on Intel-based Linux workstations with a Pentium III at 600MHz and 1GB of main memory. All reported runtimes are in seconds.

We used two classes of datasets, DS1 and DS2. Both classes of datasets contained 100K transactions. For each of the two classes we generated different problem instances, in which we varied the average size of the transactions from 3 items to 35 items for DS1, obtaining a total of 33 different datasets, DS1.3, ..., DS1.35, and from 3 items to 30 items for DS2, obtaining DS2.3, ..., DS2.30. For each problem instance of both DS1.x and DS2.x, we set the average size of the maximal long itemsets to x/2, so that as x increases, the dataset contains longer frequent itemsets. The difference between DS1.x and DS2.x is that each problem instance DS1.x consists of 1K items, whereas each problem instance DS2.x consists of 5K items. The characteristics of these datasets are summarized in Table 1.

Table 1: Parameters for the datasets used in our tests
  parameter                                                   DS1        DS2
  |D|  Number of transactions                                 100K       100K
  |T|  Average size of the transactions                       3 to 35    3 to 30
  |I|  Average size of the maximal potentially long itemsets  |T|/2      |T|/2
  |L|  Number of maximal potentially large itemsets
  N    Number of items                                        1K         5K

In all of our experiments, we used a minimum support constraint that decreases linearly with the length of the frequent itemsets. In particular, for each of the DS1.x datasets, the initial value of the support was set to 0.5 and was decreased linearly down to its minimum value for itemsets up to length x; for the rest of the itemsets, the support was kept fixed at that minimum. The left graph of Figure 6 shows the shape of the support curve for DS1.20. In the case of the DS2 class of datasets, we used a similar approach to generate the constraint, but with a different minimum support. The right graph of Figure 6 shows the shape of the support curve for DS2.20.

[Table 2: Comparison of pruning methods using DS1. For each dataset DS1.3 through DS1.35, the table reports the runtime in seconds of FP-growth and of the LPMiner variants NP, TP, PP, NP+TP, NP+PP, TP+PP, and NP+TP+PP.]

[Table 3: Comparison of pruning methods using DS2. The same comparison for datasets DS2.3 through DS2.30.]

[Figure 6: Support curves for DS1.20 (left) and DS2.20 (right); support (%) versus length of patterns.]

5.1 Results

Tables 2 and 3 show the experimental results that we obtained for the DS1 and DS2 datasets, respectively. Each row of the tables shows the results obtained for a different DS1.x or DS2.x dataset, specified in the first column. The remaining columns show the amount of time required by the different itemset discovery algorithms. The column labeled "FP-growth" shows the amount of time taken by the original FP-growth algorithm using a constant support constraint that corresponds to the smallest support of the support curve of DS1 and DS2, respectively. The columns under the heading "LPMiner" show the amount of time required by the proposed itemset discovery algorithm, which uses the decreasing support curve to prune the search space. A total of seven different varieties of the LPMiner algorithm are presented, corresponding to different combinations of the pruning methods described in Section 4. For example, the column labeled "NP" corresponds to the scheme that uses only node pruning (Section 4.2), whereas the column labeled "NP+TP+PP" corresponds to the scheme that uses all three of the schemes described in Section 4. Note that values with a "-" correspond to experiments that were aborted because they were taking too long.

A number of interesting observations can be made from the results in these tables. First, every one of the LPMiner variants performs better than the FP-growth algorithm. In particular, the LPMiner that uses all three pruning methods does the best, requiring substantially less time than the FP-growth algorithm. For DS1, it is about 2.2 times faster for DS1.10, 8.2 times faster for DS1.20, 33.4 times faster for DS1.30, and 5 times faster for DS1.35. Similar trends can be observed for DS2, for which LPMiner is 4.2 times faster for DS2.10, 2.0 times faster for DS2.20, and 55.6 times faster for DS2.27. Second, the performance gap between FP-growth and LPMiner grows as the length of the discovered frequent itemsets increases (recall that, for both DS1.x and DS2.x, the length of the frequent itemsets increases with x). This is due to the fact that the overall itemset space that LPMiner can prune becomes larger, leading to improved relative performance. Third, comparing the different pruning methods in isolation, we can see that NP and TP lead to the largest runtime reductions and PP achieves the smallest reduction. This is not surprising, as PP can only prune itemsets during the late stages of itemset generation. Finally, the runtime with the three pruning methods increases gradually as the average length of the transactions (and the discovered itemsets) increases, whereas the runtime of the original FP-growth algorithm increases exponentially.

6 Conclusion

In this paper we presented an algorithm that can efficiently find all frequent itemsets that satisfy a length-decreasing support constraint. The key insight that enabled us to achieve high performance was the smallest valid extension property of the length-decreasing support curve.

References

[1] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. VLDB 1994.
[2] R. C. Agarwal, C. C. Aggarwal, V. V. V. Prasad, and V. Crestana. A Tree Projection Algorithm for Generation of Large Itemsets for Association Rules. IBM Research Report RC21341, Nov. 1998.
[3] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding Interesting Associations without Support Pruning. ICDE 2000.
[4] J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD 2000, 1-12.
[5] B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple minimum supports. SIGKDD 1999.
[6] K. Wang, Y. He, and J. Han. Mining Frequent Itemsets Using Support Constraints. VLDB 2000.
[7] M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 2000.
[8] M. J. Zaki and C. Hsiao. CHARM: An efficient algorithm for closed association rule mining. RPI Technical Report 99-10, 1999.
[9] M. J. Zaki and K. Gouda. Fast Vertical Mining Using Diffsets. RPI Technical Report 01-1, 2001.


Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 Uppsala University Department of Information Technology Kjell Orsborn DATA MINING II - 1DL460 Assignment 2 - Implementation of algorithm for frequent itemset and association rule mining 1 Algorithms for

More information

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams *

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams * An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams * Jia-Ling Koh and Shu-Ning Shin Department of Computer Science and Information Engineering National Taiwan Normal University

More information

Striped Grid Files: An Alternative for Highdimensional

Striped Grid Files: An Alternative for Highdimensional Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,

More information

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree

Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania

More information

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS

ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS D.SUJATHA 1, PROF.B.L.DEEKSHATULU 2 1 HOD, Department of IT, Aurora s Technological and Research Institute, Hyderabad 2 Visiting Professor, Department

More information

and maximal itemset mining. We show that our approach with the new set of algorithms is efficient to mine extremely large datasets. The rest of this p

and maximal itemset mining. We show that our approach with the new set of algorithms is efficient to mine extremely large datasets. The rest of this p YAFIMA: Yet Another Frequent Itemset Mining Algorithm Mohammad El-Hajj, Osmar R. Zaïane Department of Computing Science University of Alberta, Edmonton, AB, Canada {mohammad, zaiane}@cs.ualberta.ca ABSTRACT:

More information

arxiv: v1 [cs.db] 11 Jul 2012

arxiv: v1 [cs.db] 11 Jul 2012 Minimally Infrequent Itemset Mining using Pattern-Growth Paradigm and Residual Trees arxiv:1207.4958v1 [cs.db] 11 Jul 2012 Abstract Ashish Gupta Akshay Mittal Arnab Bhattacharya ashgupta@cse.iitk.ac.in

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Performance Analysis of Frequent Closed Itemset Mining: PEPP Scalability over CHARM, CLOSET+ and BIDE

Performance Analysis of Frequent Closed Itemset Mining: PEPP Scalability over CHARM, CLOSET+ and BIDE Volume 3, No. 1, Jan-Feb 2012 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info ISSN No. 0976-5697 Performance Analysis of Frequent Closed

More information

Frequent Itemsets Melange

Frequent Itemsets Melange Frequent Itemsets Melange Sebastien Siva Data Mining Motivation and objectives Finding all frequent itemsets in a dataset using the traditional Apriori approach is too computationally expensive for datasets

More information

Comparative Study of Subspace Clustering Algorithms

Comparative Study of Subspace Clustering Algorithms Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that

More information

Closed Pattern Mining from n-ary Relations

Closed Pattern Mining from n-ary Relations Closed Pattern Mining from n-ary Relations R V Nataraj Department of Information Technology PSG College of Technology Coimbatore, India S Selvan Department of Computer Science Francis Xavier Engineering

More information

On Secure Distributed Data Storage Under Repair Dynamics

On Secure Distributed Data Storage Under Repair Dynamics On Secure Distributed Data Storage Under Repair Dynamics Sameer Pawar, Salim El Rouayheb, Kannan Ramchandran University of California, Bereley Emails: {spawar,salim,annanr}@eecs.bereley.edu. Abstract We

More information

Optimization of Test/Diagnosis/Rework Location(s) and Characteristics in Electronic Systems Assembly Using Real-Coded Genetic Algorithms

Optimization of Test/Diagnosis/Rework Location(s) and Characteristics in Electronic Systems Assembly Using Real-Coded Genetic Algorithms Proceedgs of the International Test Conference, October 3 Optimization of Test/Diagnosis/Rework Location(s) and Characteristics Electronic Systems Assembly Usg Real-Coded Genetic Algorithms Zhen Shi and

More information

Fully Persistent Graphs Which One To Choose?

Fully Persistent Graphs Which One To Choose? Fully Persistent Graphs Which One To Choose? Mart Erwig FernUniversität Hagen, Praktische Informatik IV D-58084 Hagen, Germany erwig@fernuni-hagen.de Abstract. Functional programs, by nature, operate on

More information

Association Rule Mining. Introduction 46. Study core 46

Association Rule Mining. Introduction 46. Study core 46 Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

More information

Customized Content Dissemination in Peer-to-Peer Publish/Subscribe

Customized Content Dissemination in Peer-to-Peer Publish/Subscribe Customized Content Dissemation Peer-to-Peer Publish/Subscribe Hojjat Jafarpour, Mirko Montanari, Sharad Mehrotra and Nali Venkatasubramanian Department of Computer Science University of California, Irve

More information

This article was origally published a journal published by Elsevier, and the attached copy is provided by Elsevier for the author s benefit and for the benefit of the author s stitution, for non-commercial

More information

CARPENTER Find Closed Patterns in Long Biological Datasets. Biological Datasets. Overview. Biological Datasets. Zhiyu Wang

CARPENTER Find Closed Patterns in Long Biological Datasets. Biological Datasets. Overview. Biological Datasets. Zhiyu Wang CARPENTER Find Closed Patterns in Long Biological Datasets Zhiyu Wang Biological Datasets Gene expression Consists of large number of genes Knowledge Discovery and Data Mining Dr. Osmar Zaiane Department

More information