Association Rule Learning


Association Rule Learning
16s1: COMP9417 Machine Learning and Data Mining
School of Computer Science and Engineering, University of New South Wales
March 15, 2016
COMP9417 ML & DM (CSE, UNSW), Association Rule Learning, March 15, 2016

Acknowledgements

- Material derived from slides for the book Machine Learning by T. Mitchell, McGraw-Hill (1997)
- Material derived from slides by Eibe Frank
- Material derived from slides for the book Machine Learning by P. Flach, Cambridge University Press (2012)

1. Aims

This lecture will enable you to describe machine learning approaches to the problem of discovering association rules from data. Following it you should be able to:
- contrast supervised vs. unsupervised learning
- define frequent itemsets and association rules
- reproduce the basic algorithms for discovering frequent itemsets and association rules
- describe the basic measures used in association rule mining, and some refinements such as closed itemsets

Relevant WEKA programs: Apriori
Relevant R programs: arules

2. Introduction: Data Mining for Associations

Consider a supermarket, where a database records the items bought by a customer as a transaction. You are interested in finding associations between sets of items in all customer transactions, e.g. 90% of all transactions that purchase bread and butter also purchase milk. This can be expressed as the rule:

purchase(bread) ∧ purchase(butter) → purchase(milk)

The antecedent is the conjunction of {purchase(bread), purchase(butter)} and the consequent is purchase(milk). 90% is the confidence factor of the rule.

Usually, we are interested in discovering, or mining, from the data sets of rules which satisfy some initial specifications.

2. Introduction: More Associations

Some mining tasks:
- Find all rules that have Diet Coke as a consequent
- Find all rules that have bagels in the antecedent
- Find all rules that have sausage in the antecedent and mustard in the consequent
- Find all rules relating items located on aisles 9 and 10
- Find all rules relating items low in stock in the last 3 days
- Find the best k rules that have bagels in the consequent

Note that "best" can be defined in terms of support, confidence or some other measure on rules.

2. Introduction: Items and transactions

Transaction  Items
1            nappies
2            beer, crisps
3            apples, nappies
4            beer, crisps, nappies
5            apples
6            apples, beer, crisps, nappies
7            apples, crisps
8            crisps

Each transaction in this table involves a set of items; conversely, for each item we can list the transactions in which it was involved: transactions 1, 3, 4 and 6 for nappies, transactions 3, 5, 6 and 7 for apples, and so on. We can also do this for sets of items: e.g., beer and crisps were bought together in transactions 2, 4 and 6; we say that item set {beer, crisps} covers transaction set {2, 4, 6}.
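The notion of an item set covering a set of transactions can be made concrete with a few lines of Python. This is only a sketch: the dictionary name `TRANSACTIONS` and the function name `cover` are illustrative choices, not part of the lecture; the data is the eight-transaction table above.

```python
# The eight transactions from the table, keyed by transaction id.
TRANSACTIONS = {
    1: {"nappies"},
    2: {"beer", "crisps"},
    3: {"apples", "nappies"},
    4: {"beer", "crisps", "nappies"},
    5: {"apples"},
    6: {"apples", "beer", "crisps", "nappies"},
    7: {"apples", "crisps"},
    8: {"crisps"},
}


def cover(itemset, transactions):
    """Return the ids of all transactions that contain every item in itemset."""
    items = set(itemset)
    return {tid for tid, bought in transactions.items() if items <= bought}


print(sorted(cover({"beer", "crisps"}, TRANSACTIONS)))  # [2, 4, 6]
print(sorted(cover({"nappies"}, TRANSACTIONS)))         # [1, 3, 4, 6]
```

The size of the returned set is the support count of the item set.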

2. Introduction: An item set lattice (Figure 6.17, p.183)

[Figure: the lattice of all 16 item sets over Crisps, Beer, Nappies and Apples, ordered by set inclusion from {} to {Nappies, Beer, Crisps, Apples}.]

Item sets in dotted ovals cover a single transaction; in dashed ovals, two transactions; in triangles, three transactions; and in polygons with n sides, n transactions. The maximal item sets with support 3 or more are indicated in green.

2. Introduction: Supervised vs. Unsupervised Learning

Supervised learning
- pre-defined class
- learn to predict class (a classifier)
- concept learning, perceptrons, decision trees, etc.

Unsupervised learning
- no pre-defined class
- find attributes which can be grouped together in characteristic patterns
- association rules, clustering, etc.

2. Introduction: Association Rules

Often rules in machine learning are for classification. Association rules are like classification rules except:
- no pre-defined class, any attribute can appear in the consequent
- consequent can contain ≥ 1 attributes

A highly combinatoric process that can generate very many association rules...

However, we can use restrictions on the coverage and accuracy of rules, plus syntactic restrictions, to dramatically reduce the number of interesting and potentially useful rules generated.

3. Some definitions: Association Rules

Suppose we are given a set of items $\mathcal{I}$, such as the set of items for sale at a retail outlet, the set of attribute-value pairs used to describe a dataset, etc.

Definition (Itemset). An itemset $I \subseteq \mathcal{I}$ is some subset of items.

Definition (Transaction). A transaction is a pair $T = (tid, I)$, where $tid$ is the transaction identifier and $I$ is an itemset.

A transaction $T = (tid, I)$ is said to support an itemset $X$ if $X \subseteq I$.

3. Some definitions: Association Rules

Definition (Transaction database). A transaction database $D$ is a set of transactions such that each transaction has a unique identifier.

Definition (Cover). The cover of an itemset $X$ in $D$ consists of the set of transaction identifiers of transactions in $D$ that support $X$:

$cover(X, D) := \{ tid \mid (tid, I) \in D,\ X \subseteq I \}$

3. Some definitions: Association Rules

Definition (Support). The support of an itemset $X$ in $D$ is the number of transactions in the cover of $X$ in $D$:

$support(X, D) := |cover(X, D)|$

An itemset is called frequent (or large) in $D$ if its support in $D$ exceeds some minimum support threshold.

Note that support is often defined as a fraction, namely the ratio of transactions in the cover of $X$ to the total number of transactions in $D$. For example, see:

Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining association rules between sets of items in large databases. In: Proc. ACM SIGMOD Conference 1993.

Either can be used; which one is meant is usually clear from the context.

3. Some definitions: Association Rules

Note: support is monotonic: when moving down a path in the item set lattice it can never increase.

Therefore, the set of frequent item sets is convex and is fully determined by the lower boundary of the largest frequent item sets.

These are the maximal frequent item sets: no superset of these item sets is frequent.

These properties are used in algorithms.

3. Some definitions: Association Rules

Definition (Confidence). The confidence of a rule of the form $X \rightarrow Y$ in $D$ is the proportion of transactions in the cover of $X$ in $D$ which are also in the cover of $X \cup Y$ in $D$:

$confidence(X \rightarrow Y, D) := \frac{|cover(X \cup Y, D)|}{|cover(X, D)|}$

We can relate these terms from database mining to the machine learning terms coverage and accuracy as follows:
- support (coverage): number of transactions (instances) for which the rule predicts correctly
- confidence (accuracy): number of transactions (instances) predicted correctly, as a proportion of all instances the rule applies to

3. Some definitions: Confidence measure for Association Rules

Confidence of an association rule $X \rightarrow Y$:

$P(Y \mid X) = \frac{support(X \cup Y)}{support(X)}$

i.e., the probability of finding the consequent in a transaction given that the transaction also contains the antecedent.

N.B. the above uses support as a frequency (see later slide).
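As a small illustration of the support and confidence definitions, the following Python sketch evaluates them on the eight supermarket transactions from the introduction. The names `support` and `confidence` and the list `BASKETS` are hypothetical helpers chosen for this example; exact fractions are used so the results can be compared with the ratios quoted in the lecture.

```python
from fractions import Fraction

# Item sets of the eight supermarket transactions (ids 1..8 in order).
BASKETS = [
    {"nappies"},
    {"beer", "crisps"},
    {"apples", "nappies"},
    {"beer", "crisps", "nappies"},
    {"apples"},
    {"apples", "beer", "crisps", "nappies"},
    {"apples", "crisps"},
    {"crisps"},
]


def support(itemset, baskets):
    """Support as a count: number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)


def confidence(body, head, baskets):
    """Confidence of 'if body then head': support(body | head) / support(body).

    Raises ZeroDivisionError if no transaction supports the body.
    """
    both = set(body) | set(head)
    return Fraction(support(both, baskets), support(body, baskets))


print(support({"beer"}, BASKETS))                  # 3
print(confidence({"beer"}, {"nappies"}, BASKETS))  # 2/3
print(confidence({"beer"}, {"crisps"}, BASKETS))   # 1
```

Dividing `support` by `len(BASKETS)` gives the fractional form of support mentioned earlier.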

4. Mining large (frequent) itemsets: Item sets

Back to the supermarket for a moment: in market basket analysis each transaction (instance) contains the set of items purchased by the customer.

An item is therefore a (Boolean) attribute, true if the customer bought the article in the transaction, false otherwise.

Now think of the number of possible items you could buy in your local supermarket... then think of the number of possible item sets we can form from those items... and then think of the number of possible association rules we can form between items in each item set...

4. Mining large (frequent) itemsets: Item sets with minimum support 2 for play

one-item sets      two-item sets      three-item sets    four-item sets
outl = sunny (5)   outl = sunny       outl = sunny       outl = sunny
                   temp = mild (2)    temp = hot         temp = hot
                                      humi = high (2)    humi = high
                                                         play = no (2)
temp = mild (6)    outl = sunny       outl = sunny       outl = rainy
                   wind = true (2)    humi = high        humi = normal
                                      play = no (3)      wind = false
                                                         play = yes (2)
                   humi = high
                   wind = true (3)

4. Mining large (frequent) itemsets: Algorithm schema for generating item sets

Key idea: a k-item set can only have minimum support if all of its (k − 1)-item subsets have minimum support (are "large").

- First, generate all 1-item sets by making a pass through the data set and storing all those above min. support in a hash table
- Next, generate all 2-item sets from all pairs of 1-item sets from the hash table
- Then pass through the data set counting the support of the 2-item sets, discarding those below min. support
- The same process is iterated to generate k-item sets from (k − 1)-item sets, until no more large item sets are found

Note: such algorithms are called "Apriori"-type algorithms after:

R. Agrawal & R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 1994.

4. Mining large (frequent) itemsets: The Apriori algorithm

Agrawal and Srikant (1994)

Algorithm APRIORI
    L_1 = {large 1-itemsets}
    for (k = 2; L_{k-1} ≠ ∅; k++) do
        C_k = apriori-gen(L_{k-1})         // Get new candidates
        forall transactions t ∈ D do
            C_t = subset(C_k, t)           // Candidates in transaction
            forall candidates c ∈ C_t do
                c.count++;
        end
        L_k = {c ∈ C_k | c.count ≥ minsup}
    end
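The level-wise search can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `apriori`, the use of `frozenset` keys and the simple pairwise candidate generation are choices made for this example, and support is treated as a count compared with `>= minsup`.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support count} for all item sets with count >= minsup."""
    baskets = [frozenset(t) for t in transactions]

    # L_1: count single items in one pass over the data.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= minsup}
    frequent = dict(level)

    k = 2
    while level:
        # Candidate generation: join pairs of large (k-1)-sets into k-sets ...
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # ... and prune candidates that have a (k-1)-subset which is not large.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
        # Count the surviving candidates in one pass over the data.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(level)
        k += 1
    return frequent


BASKETS = [
    {"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
    {"beer", "crisps", "nappies"}, {"apples"},
    {"apples", "beer", "crisps", "nappies"}, {"apples", "crisps"}, {"crisps"},
]
for itemset, count in apriori(BASKETS, minsup=3).items():
    print(sorted(itemset), count)
```

On the eight supermarket transactions with `minsup=3` this yields the four single items plus {beer, crisps}.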

4. Mining large (frequent) itemsets: Apriori-style algorithms

Apriori's anti-monotone heuristic: if any size k pattern is not frequent in the database, its size k + 1 super-pattern can never be frequent.

The Apriori algorithm uses the anti-monotone heuristic:
- generates all size k + 1 patterns from frequent size k patterns;
- tests them for frequency against the database

Advantage: a possibly significant reduction in the number of candidate patterns generated.

4. Mining large (frequent) itemsets: Example: generating item sets

Suppose we have five three-item sets: (ABC), (ABD), (ACD), (ACE) and (BCD), where A is a feature like outlook = sunny.

Let (ABCD) be the union of the first two three-item sets. It is a candidate four-item set since all of its three-item subsets are large. What are they? (ABC), (ABD), (ACD) and (BCD).

Assuming the three-item sets are sorted in lexical order, we only have to consider pairs with the same two first items; otherwise the resulting item set would have more than four items! These are (ABC) and (ABD) plus (ACD) and (ACE), and we can forget about (BCD). The second pair leads to (ACDE).

4. Mining large (frequent) itemsets: Example: generating item sets

BUT not all of the three-item subsets of (ACDE) are large... What are they? (ACD) and (ACE) are large, but (ADE) and (CDE) are not large.

Therefore, we only have one candidate four-item set, (ABCD), which must be checked for actual minimum support on the data set.
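The join-and-prune step of this example can be sketched in Python. The function name `apriori_gen` mirrors the pseudocode name apriori-gen, but the body below is only an illustrative assumption of how such a step might be written: item sets are represented as lexically sorted tuples, pairs are joined when they share all but their last item, and a candidate is kept only if every subset one item smaller is large.

```python
from itertools import combinations


def apriori_gen(large_sets):
    """Generate candidate k-item sets from large (k-1)-item sets.

    large_sets: iterable of lexically sorted tuples, all of the same length.
    """
    large = sorted(set(large_sets))
    known = set(large)
    candidates = []
    for i, first in enumerate(large):
        for second in large[i + 1:]:
            if first[:-1] != second[:-1]:
                break  # sorted input: no later tuple shares this prefix
            candidate = first + (second[-1],)
            # Prune: every (k-1)-subset of the candidate must itself be large.
            if all(sub in known
                   for sub in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
    return candidates


three_item_sets = [
    ("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
    ("A", "C", "E"), ("B", "C", "D"),
]
print(apriori_gen(three_item_sets))  # [('A', 'B', 'C', 'D')]
```

(ACDE) is produced by the join but discarded by the prune, because (ADE) and (CDE) are not among the large three-item sets.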

4. Mining large (frequent) itemsets: Maximal item sets (Algorithm 6.6, p.184)

Algorithm FrequentItems(D, f_0): find all maximal item sets exceeding a given support threshold.
Input: data D ⊆ X; support threshold f_0.
Output: set of maximal frequent item sets M.

    M ← ∅;
    initialise priority queue Q to contain the empty item set;
    while Q is not empty do
        I ← next item set deleted from front of Q;
        max ← true;                  // flag to indicate whether I is maximal
        for each possible extension I' of I do
            if Supp(I') ≥ f_0 then
                max ← false;         // frequent extension found, so I is not maximal
                add I' to back of Q;
            end
        end
        if max = true then M ← M ∪ {I};
    end
    return M
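A direct Python transcription of this queue-based search might look as follows. It is a sketch under assumptions: an "extension" is taken to mean adding a single new item, a `seen` set (not in the pseudocode) prevents the same item set from being queued twice, and the threshold is assumed to be no larger than the number of transactions so that the empty item set is frequent.

```python
from collections import deque


def frequent_items(transactions, f0):
    """Return the maximal item sets whose support count is at least f0."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def supp(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    maximal = []
    queue = deque([frozenset()])
    seen = {frozenset()}  # assumption: avoid queueing an item set twice
    while queue:
        current = queue.popleft()
        is_maximal = True
        for item in items:
            if item in current:
                continue
            extension = current | {item}
            if supp(extension) >= f0:
                is_maximal = False  # frequent extension found
                if extension not in seen:
                    seen.add(extension)
                    queue.append(extension)
        if is_maximal:
            maximal.append(current)
    return maximal


BASKETS = [
    {"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
    {"beer", "crisps", "nappies"}, {"apples"},
    {"apples", "beer", "crisps", "nappies"}, {"apples", "crisps"}, {"crisps"},
]
print([sorted(m) for m in frequent_items(BASKETS, 3)])
# [['apples'], ['nappies'], ['beer', 'crisps']]
```

Each support evaluation rescans the data, so this is meant for reading rather than for large databases.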

4. Mining large (frequent) itemsets: Maximal item sets

If a large (frequent) item set I is maximal it means that no superset of I is large.

Why would we want only maximal item sets?
- Reduces redundancy, resulting in fewer item sets

Can we characterise this in terms of inductive bias?
- Prefer the most-specific large item sets

4. Mining large (frequent) itemsets: Closed item sets (Figure 6.18, p.185)

[Figure: lattice of the closed item sets {}, {Apples}, {Nappies}, {Crisps}, {Nappies, Apples}, {Crisps, Apples}, {Beer, Crisps}, {Nappies, Beer, Crisps} and {Nappies, Beer, Crisps, Apples}.]

Closed item set lattice corresponding to the item set lattice shown earlier. This lattice has the property that no two adjacent item sets have the same coverage.
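One way to check which item sets are closed is to compare each item set with the intersection of all transactions that contain it: if the two coincide, no proper superset has the same coverage. The Python sketch below uses that characterisation; the function name `closed_item_sets` is hypothetical, item sets that occur in no transaction are skipped, and the brute-force enumeration over all subsets is only practical for a handful of items.

```python
from itertools import combinations


def closed_item_sets(transactions):
    """Return all item sets (with non-empty cover) that equal their closure."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    closed = []
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            covering = [basket for basket in baskets if candidate <= basket]
            # Closure = items shared by every transaction covering the candidate.
            if covering and frozenset.intersection(*covering) == candidate:
                closed.append(candidate)
    return closed


BASKETS = [
    {"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
    {"beer", "crisps", "nappies"}, {"apples"},
    {"apples", "beer", "crisps", "nappies"}, {"apples", "crisps"}, {"crisps"},
]
for itemset in closed_item_sets(BASKETS):
    print(sorted(itemset))
```

On the eight supermarket transactions this prints nine item sets; {beer} is absent because every transaction with beer also contains crisps.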

5. Mining association rules: From item sets to association rules

- individual items (attribute-value constraints) form one-item sets
- two-item sets are formed from pairs of one-item sets (except pairs of the same attribute with different values), and so on...
- all item sets have a support (coverage) on the data set
- any item sets below minimum support are discarded, leaving so-called "large" item sets
- generate association rules with a certain minimum confidence from each item set

5. Mining association rules: From item sets to association rules

For example, a 3-item set with support 4:

humidity = normal, windy = false, play = yes

leads to seven potential rules:

Rule                                                                     Conf
If humidity = normal and windy = false Then play = yes                   4/4
If humidity = normal and play = yes Then windy = false                   4/6
If windy = false and play = yes Then humidity = normal                   4/6
If humidity = normal Then windy = false and play = yes                   4/7
If windy = false Then humidity = normal and play = yes                   4/8
If play = yes Then humidity = normal and windy = false                   4/9
If true Then humidity = normal and windy = false and play = yes          4/14
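The seven rules can be enumerated mechanically: every proper subset of the item set serves as an antecedent, and the remaining items form the consequent. The Python sketch below does this using only the support counts that appear as numerators and denominators in the table; the table name `SUPPORT` and the function name `rules_from_item_set` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Support counts read off the confidence column (14 instances in total).
SUPPORT = {
    frozenset(): 14,
    frozenset({"humidity=normal"}): 7,
    frozenset({"windy=false"}): 8,
    frozenset({"play=yes"}): 9,
    frozenset({"humidity=normal", "windy=false"}): 4,
    frozenset({"humidity=normal", "play=yes"}): 6,
    frozenset({"windy=false", "play=yes"}): 6,
    frozenset({"humidity=normal", "windy=false", "play=yes"}): 4,
}


def rules_from_item_set(item_set, support):
    """List (antecedent, consequent, confidence) for every split of item_set."""
    item_set = frozenset(item_set)
    rules = []
    for size in range(len(item_set) - 1, -1, -1):  # largest antecedents first
        for combo in combinations(sorted(item_set), size):
            body = frozenset(combo)
            head = item_set - body
            rules.append((body, head, Fraction(support[item_set], support[body])))
    return rules


three_items = {"humidity=normal", "windy=false", "play=yes"}
for body, head, conf in rules_from_item_set(three_items, SUPPORT):
    print(sorted(body), "->", sorted(head), conf)
```

Note that `Fraction` prints values in lowest terms, so 4/6 appears as 2/3 and 4/14 as 2/7.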

5. Mining association rules: Generating association rules efficiently

Two stage algorithm:
- generate item sets with specified minimum support (coverage)
- generate association rules with specified minimum confidence (accuracy) from each item set

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining. Menlo Park: AAAI Press.

5. Mining association rules: From item-sets to association rules

Key idea: say we have a rule A ∧ B → C ∧ D with greater than minimum support and minimum confidence c1. This is a double-consequent rule. Now consider the two single-consequent rules

A ∧ B ∧ C → D
A ∧ B ∧ D → C

These must also hold with greater than minimum support and confidence.

5. Mining association rules: From item-sets to association rules

To see why, first consider support: this must be greater than minimum support since it is the same for each single-consequent rule as for the double-consequent rule.

Let the rule A ∧ B ∧ C → D have confidence c2. Then c2 ≥ c1, because the denominator (coverage) for c1 must be ≥ that for c2: the antecedent of the double-consequent rule has fewer conditions than that of the single-consequent rule, and therefore must cover at least as many instances.

What is another way of explaining this? Antecedent A ∧ B is more general than antecedent A ∧ B ∧ C.

5. Mining association rules: From item-sets to association rules

Now consider the converse, for a given item-set:

If any of the single-consequent rules which can form a double-consequent rule do not have better than minimum confidence, then do not consider the double-consequent rule, since it cannot have better than minimum confidence.

This gives a basis for an efficient algorithm for constructing rules by building up from candidate single-consequent rules to candidate double-consequent rules, etc.

5. Mining association rules: Algorithm schema for generating association rules

- First, generate confidences for all single-consequent rules
- Discard any below minimum confidence
- Build up candidate double-consequent rules from retained single-consequent rules
- Iterate as for construction of large item sets
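The schema above can be sketched in Python for a single large item set. This is an illustration rather than the published procedure: the function name `rules_by_growing_consequents` is hypothetical, a rule is retained when its confidence is at least `min_conf`, and the support counts are those of the humidity/windy/play example from the earlier slide.

```python
from fractions import Fraction
from itertools import combinations

# Support counts for the 3-item set example (14 instances in total).
SUPPORT = {
    frozenset(): 14,
    frozenset({"humidity=normal"}): 7,
    frozenset({"windy=false"}): 8,
    frozenset({"play=yes"}): 9,
    frozenset({"humidity=normal", "windy=false"}): 4,
    frozenset({"humidity=normal", "play=yes"}): 6,
    frozenset({"windy=false", "play=yes"}): 6,
    frozenset({"humidity=normal", "windy=false", "play=yes"}): 4,
}


def rules_by_growing_consequents(item_set, support, min_conf):
    """Build rules level by level, starting from single-item consequents."""
    item_set = frozenset(item_set)
    rules = []
    heads = [frozenset([item]) for item in sorted(item_set)]
    while heads:
        kept = []
        for head in heads:
            body = item_set - head
            conf = Fraction(support[item_set], support[body])
            if conf >= min_conf:
                rules.append((body, head, conf))
                kept.append(head)
        # Larger consequents are only built from retained smaller ones.
        size = len(heads[0]) + 1
        kept_set = set(kept)
        joined = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        heads = sorted((h for h in joined
                        if all(h - {item} in kept_set for item in h)), key=sorted)
    return rules


three_items = {"humidity=normal", "windy=false", "play=yes"}
for body, head, conf in rules_by_growing_consequents(three_items, SUPPORT, Fraction(1, 2)):
    print(sorted(body), "->", sorted(head), conf)
```

With a threshold of 1/2 the three single-consequent rules and two of the double-consequent rules survive; the triple-consequent rule is never evaluated because one of its double-consequent sub-rules was discarded.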

5. Mining association rules: Association rules I

Frequent item sets can be used to build association rules, which are rules of the form "if B then H" where both body B and head H are item sets that frequently appear in transactions together.

Pick any edge in Figure 6.17, say the edge between {beer} and {nappies, beer}. We know that the support of the former is 3 and of the latter, 2: that is, three transactions involve beer and two of those involve nappies as well. We say that the confidence of the association rule "if beer then nappies" is 2/3.

Likewise, the edge between {nappies} and {nappies, beer} demonstrates that the confidence of the rule "if nappies then beer" is 2/4.

There are also rules with confidence 1, such as "if beer then crisps"; and rules with empty bodies, such as "if true then crisps", which has confidence 5/8 (i.e., five out of eight transactions involve crisps).

5. Mining association rules: Association rules II

But we only want to construct association rules that involve frequent items.

The rule "if beer ∧ apples then crisps" has confidence 1, but there is only one transaction involving all three and so this rule is not strongly supported by the data.

So we first use Algorithm 6.6 to mine for frequent item sets; we then select bodies B and heads H from each frequent set m, discarding rules whose confidence is below a given confidence threshold.

Notice that we are free to discard some of the items in the maximal frequent sets (i.e., H ∪ B may be a proper subset of m), because any subset of a frequent item set is frequent as well.

5. Mining association rules: Association rule mining (Algorithm 6.7, p.185)

Algorithm AssociationRules(D, f_0, c_0): find all association rules exceeding given support and confidence thresholds.
Input: data D ⊆ X; support threshold f_0; confidence threshold c_0.
Output: set of association rules R.

    R ← ∅;
    M ← FrequentItems(D, f_0);       // FrequentItems: see Algorithm 6.6
    for each m ∈ M do
        for each H ⊆ m and B ⊆ m such that H ∩ B = ∅ do
            if Supp(B ∪ H)/Supp(B) ≥ c_0 then R ← R ∪ {"if B then H"};
        end
    end
    return R

5. Mining association rules: Association rule example

A run of the algorithm with support threshold 3 and confidence threshold 0.6 gives the following association rules:

if beer then crisps     support 3, confidence 3/3
if crisps then beer     support 3, confidence 3/5
if true then crisps     support 5, confidence 5/8

Association rule mining often includes a post-processing stage in which superfluous rules are filtered out, e.g., special cases which don't have higher confidence than the general case.
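The run described above can be reproduced with a short Python sketch. Several details are assumptions made for this illustration: the function names are hypothetical, frequent item sets are found by brute-force enumeration rather than by the queue-based FrequentItems procedure, and rules with an empty head are skipped because they trivially have confidence 1.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a frozenset, smallest first."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def association_rules(transactions, f0, c0):
    """Return {(body, head): (support, confidence)} for rules meeting both thresholds."""
    baskets = [frozenset(t) for t in transactions]

    def supp(itemset):
        return sum(1 for basket in baskets if itemset <= basket)

    frequent = [s for s in subsets(set().union(*baskets)) if supp(s) >= f0]
    maximal = [m for m in frequent if not any(m < other for other in frequent)]

    rules = {}
    for m in maximal:
        for head in subsets(m):
            if not head:
                continue  # assumption: ignore trivial rules with an empty head
            for body in subsets(m - head):
                confidence = supp(body | head) / supp(body)
                if confidence >= c0:
                    rules[(body, head)] = (supp(body | head), confidence)
    return rules


BASKETS = [
    {"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
    {"beer", "crisps", "nappies"}, {"apples"},
    {"apples", "beer", "crisps", "nappies"}, {"apples", "crisps"}, {"crisps"},
]
for (body, head), (support, confidence) in association_rules(BASKETS, 3, 0.6).items():
    print("if", sorted(body) or "true", "then", sorted(head), support, confidence)
```

With thresholds 3 and 0.6 the sketch returns the same three rules as listed in the example.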

5. Mining association rules: Post-processing

One quantity that is often used in post-processing is lift, defined as

$Lift(\text{if } B \text{ then } H) = \frac{n \cdot Supp(B \cup H)}{Supp(B) \cdot Supp(H)}$

where $n$ is the number of transactions.

For example, for the first two association rules above we would have lifts of $\frac{8 \cdot 3}{3 \cdot 5} = 1.6$, as $Lift(\text{if } B \text{ then } H) = Lift(\text{if } H \text{ then } B)$.

For the third rule we have $Lift(\text{if true then crisps}) = \frac{8 \cdot 5}{8 \cdot 5} = 1$. This holds for any rule with $B = \emptyset$, as

$Lift(\text{if } \emptyset \text{ then } H) = \frac{n \cdot Supp(\emptyset \cup H)}{Supp(\emptyset) \cdot Supp(H)} = \frac{n \cdot Supp(H)}{n \cdot Supp(H)} = 1$

More generally, a lift of 1 means that $Supp(B \cup H)$ is entirely determined by the marginal frequencies $Supp(B)$ and $Supp(H)$ and is not the result of any meaningful interaction between B and H. Only association rules with lift larger than 1 are of interest.
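The lift computation is a one-liner once the support counts are known. The Python sketch below (the function name `lift` and its argument names are illustrative) applies the definition to the three rules from the example, using n = 8 transactions.

```python
def lift(n, supp_body, supp_head, supp_both):
    """Lift of 'if B then H' from support counts and the number of transactions n."""
    return n * supp_both / (supp_body * supp_head)


# if beer then crisps: Supp(beer) = 3, Supp(crisps) = 5, Supp(both) = 3
print(lift(8, 3, 5, 3))  # 1.6
# if crisps then beer: same value, since the formula is symmetric in B and H
print(lift(8, 5, 3, 3))  # 1.6
# if true then crisps: the empty body is supported by all 8 transactions
print(lift(8, 8, 5, 5))  # 1.0
```

A post-processing filter would keep only rules whose lift is greater than 1.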

5. Mining association rules: Item sets and dolphins (Figure 6.19, p.187)

[Figure: item set lattice over the literals Length=3, Length=4, Length=5, Gills=no, Beak=yes, Teeth=many and Teeth=few, from "true" down to four-literal conjunctions such as Length=3 & Gills=no & Beak=yes & Teeth=many.]

The item set lattice corresponding to the positive examples of the dolphin example in Example 4.4. Each "item" is a literal Feature = Value; each feature can occur at most once in an item set. The resulting structure is exactly the same as what was called the hypothesis space in Chapter 4.

5. Mining association rules: Closed item sets and dolphins (Figure 6.20, p.188)

[Figure: lattice of the following closed item sets.]
Gills=no & Beak=yes
Gills=no & Beak=yes & Teeth=many
Length=3 & Gills=no & Beak=yes
Length=5 & Gills=no & Beak=yes
Gills=no & Beak=yes & Teeth=few
Length=4 & Gills=no & Beak=yes & Teeth=many
Length=3 & Gills=no & Beak=yes & Teeth=many
Length=5 & Gills=no & Beak=yes & Teeth=many
Length=3 & Gills=no & Beak=yes & Teeth=few
Length=5 & Gills=no & Beak=yes & Teeth=few

Closed item set lattice corresponding to the item sets in the previous figure (Item sets and dolphins).

5. Mining association rules: Implementations of Association Rule mining

- Many implementations, including Weka
- More optimised versions exist

5. Mining association rules: Apriori in Weka

Relation:     weather.symbolic
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play

Apriori
=======

Minimum support: 0.15 (2 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6

5. Mining association rules: Apriori in Weka

Best rules found:

 1. outlook=overcast 4 ==> play=yes 4    <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
 2. temperature=cool 4 ==> humidity=normal 4    <conf:(1)> lift:(2) lev:(0.14) [2] conv:(2)
 3. humidity=normal windy=false 4 ==> play=yes 4    <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
 4. outlook=sunny play=no 3 ==> humidity=high 3    <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
 5. outlook=sunny humidity=high 3 ==> play=no 3    <conf:(1)> lift:(2.8) lev:(0.14) [1] conv:(1.93)
 6. outlook=rainy play=yes 3 ==> windy=false 3    <conf:(1)> lift:(1.75) lev:(0.09) [1] conv:(1.29)
 7. outlook=rainy windy=false 3 ==> play=yes 3    <conf:(1)> lift:(1.56) lev:(0.08) [1] conv:(1.07)
 8. temperature=cool play=yes 3 ==> humidity=normal 3    <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
 9. outlook=sunny temperature=hot 2 ==> humidity=high 2    <conf:(1)> lift:(2) lev:(0.07) [1] conv:(1)
10. temperature=hot play=no 2 ==> outlook=sunny 2    <conf:(1)> lift:(2.8) lev:(0.09) [1] conv:(1.29)

5. Mining association rules: Interpreting association rules

Interpretation of association rules is not always obvious; for example:

If windy = false and play = no then outlook = sunny and humidity = high

is not the same as:

If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high

However, it means that the following also holds:

If humidity = high and windy = false and play = no then outlook = sunny

6. Summary

- a form of unsupervised learning
- the main contribution from database mining
- concerned with scalability
- simple basic algorithm
- learning = search through a generalization lattice
- many extensions: numerical, temporal, spatial, negation, closed & free itemsets, ...
- sparse instance representation
- rule "interestingness" measures (confidence, lift, conviction, etc.)

6. Summary: Association rules summary

Other algorithms:
- ECLAT, FP-growth: use more complex data structures to reduce complexity of search
- ECLAT (Zaki et al.) uses set intersections, etc.
- FP-Growth (Han et al.) uses a prefix tree
- for maximal frequent item-set mining: CHARM (Zaki et al.)
- for closed frequent item-set mining: CLOSET+, CHARM, FIMI, etc.


Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged. Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights

More information

Association Rule Discovery

Association Rule Discovery Association Rule Discovery Association Rules describe frequent co-occurences in sets an itemset is a subset A of all possible items I Example Problems: Which products are frequently bought together by

More information

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

More information

Sequential Data. COMP 527 Data Mining Danushka Bollegala

Sequential Data. COMP 527 Data Mining Danushka Bollegala Sequential Data COMP 527 Data Mining Danushka Bollegala Types of Sequential Data Natural Language Texts Lexical or POS patterns that represent semantic relations between entities Tim Cook is the CEO of

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013-2017 Han, Kamber & Pei. All

More information


CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/19/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

Chapter 4: Association analysis:

Chapter 4: Association analysis: Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

More information

Performance and Scalability: Apriori Implementa6on

Performance and Scalability: Apriori Implementa6on Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:

More information

Rule induction. Dr Beatriz de la Iglesia

Rule induction. Dr Beatriz de la Iglesia Rule induction Dr Beatriz de la Iglesia email: Outline What are rules? Rule Evaluation Classification rules Association rules 2 Rule induction (RI) As their name suggests, RI algorithms

More information

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques Chapter 5 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road

More information

Association Rule Discovery

Association Rule Discovery Association Rule Discovery Association Rules describe frequent co-occurences in sets an item set is a subset A of all possible items I Example Problems: Which products are frequently bought together by

More information

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB) Association rules Marco Saerens (UCL), with Christine Decaestecker (ULB) 1 Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004),

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Reverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) 2 Burnt or Burned? E. Aiden and J-B

More information

Association Rules Apriori Algorithm

Association Rules Apriori Algorithm Association Rules Apriori Algorithm Market basket analysis n Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner n Putting both items on promotion at the

More information

Association rule mining

Association rule mining Association rule mining Association rule induction: Originally designed for market basket analysis. Aims at finding patterns in the shopping behavior of customers of supermarkets, mail-order companies,

More information

A STUDY ON ASSOCIATION RULES MINING ALGORITHMS S.Saveetha, M.Phil Scholar, Sengunthar Arts & Science College, Tiruchengode, Tamilnadu, India.

A STUDY ON ASSOCIATION RULES MINING ALGORITHMS S.Saveetha, M.Phil Scholar, Sengunthar Arts & Science College, Tiruchengode, Tamilnadu, India. A STUDY ON ASSOCIATION RULES MINING ALGORITHMS S.Saveetha, M.Phil Scholar, Sengunthar Arts & Science College, Tiruchengode, Tamilnadu, India. S.Saravanan, Assistant Professor, Sengunthar Arts & Science

More information

Mining Association Rules in Large Databases

Mining Association Rules in Large Databases Mining Association Rules in Large Databases Vladimir Estivill-Castro School of Computing and Information Technology With contributions fromj. Han 1 Association Rule Mining A typical example is market basket

More information

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10) CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

What Is Data Mining? CMPT 354: Database I -- Data Mining 2 Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT

More information

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 20 Data Mining and Mining Association Rules Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Reading Material Optional Reading:

More information


COMPARISON OF K-MEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS ABSTRACT International Journal On Engineering Technology and Sciences IJETS COMPARISON OF K-MEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS Dr.C.Kumar Charliepaul 1 G.Immanual Gnanadurai 2 Principal Assistant

More information


OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING ES200 Peterhouse College, Cambridge Frans Coenen, Paul Leng and Graham Goulbourne The Department of Computer Science The University of Liverpool

More information

2. Discovery of Association Rules

2. Discovery of Association Rules 2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining

More information

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study

A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study A Data Mining Framework for Extracting Product Sales Patterns in Retail Store Transactions Using Association Rules: A Case Study Mirzaei.Afshin 1, Sheikh.Reza 2 1 Department of Industrial Engineering and

More information

Chapter 6: Association Rules

Chapter 6: Association Rules Chapter 6: Association Rules Association rule mining Proposed by Agrawal et al in 1993. It is an important data mining model. Transaction data (no time-dependent) Assume all data are categorical. No good

More information

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application Data Structures Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali 2009-2010 Association Rules: Basic Concepts and Application 1. Association rules: Given a set of transactions, find

More information

Association Rules and

Association Rules and Association Rules and Sequential Patterns Road Map Frequent itemsets and rules Apriori algorithm FP-Growth Data formats Class association rules Sequential patterns. GSP algorithm 2 Objectives Association

More information

Road Map. Objectives. Objectives. Frequent itemsets and rules. Items and transactions. Association Rules and Sequential Patterns

Road Map. Objectives. Objectives. Frequent itemsets and rules. Items and transactions. Association Rules and Sequential Patterns Road Map Association Rules and Sequential Patterns Frequent itemsets and rules Apriori algorithm FP-Growth Data formats Class association rules Sequential patterns. GSP algorithm 2 Objectives Association

More information

Chapter 7: Frequent Itemsets and Association Rules

Chapter 7: Frequent Itemsets and Association Rules Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 VII.1&2 1 Motivational Example Assume you run an on-line

More information

Association mining rules

Association mining rules Association mining rules Given a data set, find the items in data that are associated with each other. Association is measured as frequency of occurrence in the same context. Purchasing one product when

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

Lecture 2 Wednesday, August 22, 2007

Lecture 2 Wednesday, August 22, 2007 CS 6604: Data Mining Fall 2007 Lecture 2 Wednesday, August 22, 2007 Lecture: Naren Ramakrishnan Scribe: Clifford Owens 1 Searching for Sets The canonical data mining problem is to search for frequent subsets

More information

Effectiveness of Freq Pat Mining

Effectiveness of Freq Pat Mining Effectiveness of Freq Pat Mining Too many patterns! A pattern a 1 a 2 a n contains 2 n -1 subpatterns Understanding many patterns is difficult or even impossible for human users Non-focused mining A manager

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Fundamental Data Mining Algorithms

Fundamental Data Mining Algorithms 2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University REVIEW What is Data

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Knowledge Discovery in Databases

Knowledge Discovery in Databases Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Lecture notes Knowledge Discovery in Databases Summer Semester 2012 Lecture 3: Frequent Itemsets

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Production rule is an important element in the expert system. By interview with

Production rule is an important element in the expert system. By interview with 2 Literature review Production rule is an important element in the expert system By interview with the domain experts, we can induce the rules and store them in a truth maintenance system An assumption-based

More information

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center

Mining Association Rules with Item Constraints. Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal. IBM Almaden Research Center Mining Association Rules with Item Constraints Ramakrishnan Srikant and Quoc Vu and Rakesh Agrawal IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120, U.S.A. fsrikant,qvu,

More information

Optimized Frequent Pattern Mining for Classified Data Sets

Optimized Frequent Pattern Mining for Classified Data Sets Optimized Frequent Pattern Mining for Classified Data Sets A Raghunathan Deputy General Manager-IT, Bharat Heavy Electricals Ltd, Tiruchirappalli, India K Murugesan Assistant Professor of Mathematics,

More information

Association Rules. A. Bellaachia Page: 1

Association Rules. A. Bellaachia Page: 1 Association Rules 1. Objectives... 2 2. Definitions... 2 3. Type of Association Rules... 7 4. Frequent Itemset generation... 9 5. Apriori Algorithm: Mining Single-Dimension Boolean AR 13 5.1. Join Step:...

More information

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home

More information

High dim. data. Graph data. Infinite data. Machine learning. Apps. Locality sensitive hashing. Filtering data streams.

High dim. data. Graph data. Infinite data. Machine learning. Apps. Locality sensitive hashing. Filtering data streams. High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Network Analysis

More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

Supervised and Unsupervised Learning (II)

Supervised and Unsupervised Learning (II) Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised

More information

Product presentations can be more intelligently planned

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA ( Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

Practical Data Mining COMP-321B. Tutorial 6: Association Rules

Practical Data Mining COMP-321B. Tutorial 6: Association Rules Practical Data Mining COMP-321B Tutorial 6: Association Rules Gabi Schmidberger Mark Hall September 11, 2006 c 2006 University of Waikato 1 Introduction This tutorial is about association rule learners.

More information


ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE Mustapha Muhammad Abubakar Dept. of computer Science & Engineering, Sharda University,Greater Noida, UP, (India) ABSTRACT Apriori algorithm

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University 1/8/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, 2 Supermarket shelf

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj},

More information


2 CONTENTS Contents 5 Mining Frequent Patterns, Associations, and Correlations 3 5.1 Basic Concepts and a Road Map..................................... 3 5.1.1 Market Basket Analysis: A Motivating Example........................

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov

WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov WEKA: Practical Machine Learning Tools and Techniques in Java Seminar A.I. Tools WS 2006/07 Rossen Dimov Overview Basic introduction to Machine Learning Weka Tool Conclusion Document classification Demo

More information

Chapter 7: Frequent Itemsets and Association Rules

Chapter 7: Frequent Itemsets and Association Rules Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2011/12 VII.1-1 Chapter VII: Frequent Itemsets and Association

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent

More information

Association Rules. Berlin Chen References:

Association Rules. Berlin Chen References: Association Rules Berlin Chen 2005 References: 1. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 8 2. Data Mining: Concepts and Techniques, Chapter 6 Association Rules: Basic Concepts A

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University Ye-In Chang Dept. of Computer Science and

More information

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and

More information

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on

More information

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm

Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Transforming Quantitative Transactional Databases into Binary Tables for Association Rule Mining Using the Apriori Algorithm Expert Systems: Final (Research Paper) Project Daniel Josiah-Akintonde December

More information

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Topics to be covered Market Basket Analysis, Frequent Itemsets, Closed Itemsets, and Association Rules; Frequent Pattern Mining, Efficient

More information

Association Rule Mining. Introduction 46. Study core 46

Association Rule Mining. Introduction 46. Study core 46 Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

More information

Association Rules Apriori Algorithm

Association Rules Apriori Algorithm Association Rules Apriori Algorithm Market basket analysis n Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner n Putting both items on promotion at the

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Association Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.

Association Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model. Comp 135 Machine Learning Computer Science Tufts University Fall 2017 Roni Khardon Unsupervised learning but complementary to data exploration in clustering. The goal is to find weak implications in the

More information

Advance Association Analysis

Advance Association Analysis Advance Association Analysis 1 Minimum Support Threshold 3 Effect of Support Distribution Many real data sets have skewed support distribution Support distribution of a retail data set 4 Effect of Support

More information

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on   to remove this watermark. 119 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 120 CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM 5.1. INTRODUCTION Association rule mining, one of the most important and well researched

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets

PTclose: A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets : A novel algorithm for generation of closed frequent itemsets from dense and sparse datasets J. Tahmores Nezhad ℵ, M.H.Sadreddini Abstract In recent years, various algorithms for mining closed frequent

More information

Chapter 4: Algorithms CS 795

Chapter 4: Algorithms CS 795 Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that

More information

We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long

We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long 1/21/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets, 1 We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long Requires proving theorems

More information


CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS 23 CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS This chapter introduces the concepts of association rule mining. It also proposes two algorithms based on, to calculate

More information

Parallel Mining Association Rules in Calculation Grids

Parallel Mining Association Rules in Calculation Grids ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. ISSN:

More information

FP-Growth algorithm in Data Compression frequent patterns

FP-Growth algorithm in Data Compression frequent patterns FP-Growth algorithm in Data Compression frequent patterns Mr. Nagesh V Lecturer, Dept. of CSE Atria Institute of Technology,AIKBS Hebbal, Bangalore,Karnataka Email : Abstract-The transmission

More information

Data Mining Clustering

Data Mining Clustering Data Mining Clustering Jingpeng Li 1 of 34 Supervised Learning F(x): true function (usually not known) D: training sample (x, F(x)) 57,M,195,0,125,95,39,25,0,1,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0 0

More information