Association Rule Learning


Association Rule Learning 16s1: COMP9417 Machine Learning and Data Mining School of Computer Science and Engineering, University of New South Wales March 15, 2016

Acknowledgements Material derived from slides for the book Machine Learning by T. Mitchell McGraw-Hill (1997) http://www-2.cs.cmu.edu/~tom/mlbook.html Material derived from slides by Eibe Frank http://www.cs.waikato.ac.nz/ml/weka Material derived from slides for the book Machine Learning by P. Flach Cambridge University Press (2012) http://cs.bris.ac.uk/~flach/mlbook

1. Aims Aims This lecture will enable you to describe machine learning approaches to the problem of discovering association rules from data. Following it you should be able to: contrast supervised vs. unsupervised learning define frequent itemsets and association rules reproduce the basic algorithms for discovering frequent itemsets and association rules describe the basic measures used in association rule mining, and some refinements such as closed itemsets Relevant WEKA programs: Apriori Relevant R programs: arules

2. Introduction Data Mining for Associations Consider a supermarket, where a database records the items bought by a customer as a transaction. You are interested in finding associations between sets of items in all customer transactions, e.g. 90% of all transactions that purchase bread and butter also purchase milk. This can be expressed as the rule: purchase(bread) ∧ purchase(butter) → purchase(milk) The antecedent is the conjunction of {purchase(bread), purchase(butter)} and the consequent is purchase(milk). 90% is the confidence factor of the rule. Usually, we are interested in discovering or mining, from the data, sets of rules which satisfy some initial specifications.

2. Introduction More Associations Some mining tasks: Find all rules that have Diet Coke as a consequent Find all rules that have bagels in the antecedent Find all rules that have sausage in the antecedent and mustard in the consequent Find all rules relating items located on aisles 9 and 10 Find all rules relating items low in stock in the last 3 days Find best k rules that have bagels in the consequent Note that best can be defined in terms of support, confidence or some other measure on rules.

2. Introduction Items and transactions

Transaction  Items
1            nappies
2            beer, crisps
3            apples, nappies
4            beer, crisps, nappies
5            apples
6            apples, beer, crisps, nappies
7            apples, crisps
8            crisps

Each transaction in this table involves a set of items; conversely, for each item we can list the transactions in which it was involved: transactions 1, 3, 4 and 6 for nappies, transactions 3, 5, 6 and 7 for apples, and so on. We can also do this for sets of items: e.g., beer and crisps were bought together in transactions 2, 4 and 6; we say that item set {beer,crisps} covers transaction set {2,4,6}.
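
To make cover and support concrete, here is a small Python sketch (my own illustration, not part of the slides) that recomputes the covers quoted above from this transaction table.

```python
# Sketch: covers for the nappies/beer/crisps/apples example.
# The transaction database follows the table above: tid -> set of items.
D = {
    1: {"nappies"},
    2: {"beer", "crisps"},
    3: {"apples", "nappies"},
    4: {"beer", "crisps", "nappies"},
    5: {"apples"},
    6: {"apples", "beer", "crisps", "nappies"},
    7: {"apples", "crisps"},
    8: {"crisps"},
}

def cover(itemset, D):
    """Transaction ids whose item sets contain every item in `itemset`."""
    return {tid for tid, items in D.items() if set(itemset) <= items}

print(cover({"nappies"}, D))              # {1, 3, 4, 6}
print(cover({"beer", "crisps"}, D))       # {2, 4, 6}
print(len(cover({"beer", "crisps"}, D)))  # 3 transactions, i.e. support 3
```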

2. Introduction Figure 6.17, p.183 An item set lattice. [Figure: the lattice of all item sets over {Nappies, Beer, Crisps, Apples}, from the empty set down to the full set.] Item sets in dotted ovals cover a single transaction; in dashed ovals, two transactions; in triangles, three transactions; and in polygons with n sides, n transactions. The maximal item sets with support 3 or more are indicated in green.

2. Introduction Supervised vs. Unsupervised Learning Supervised learning: pre-defined class; learn to predict the class (a classifier); concept learning, perceptrons, decision trees, etc. Unsupervised learning: no pre-defined class; find attributes which can be grouped together in characteristic patterns; association rules, clustering, etc.

2. Introduction Association Rules Often rules in machine learning are for classification. Association rules are like classification rules except: no pre-defined class; any attribute can appear in the consequent; the consequent can contain more than one attribute. A highly combinatoric process that can generate very many association rules... However, we can use restrictions on the coverage and accuracy of rules, plus syntactic restrictions, to dramatically reduce the number of interesting and potentially useful rules generated.

3. Some definitions Association Rules Suppose we are given a set of items I, such as the set of items for sale at a retail outlet, the set of attribute-value pairs used to describe a dataset, etc. Definition Itemset. An itemset X ⊆ I is some subset of items. Definition Transaction. A transaction is a pair T = (tid, I), where tid is the transaction identifier and I is an itemset. A transaction T = (tid, I) is said to support an itemset X if X ⊆ I.

3. Some definitions Association Rules Definition Transaction database. A transaction database D is a set of transactions such that each transaction has a unique identifier. Definition Cover. The cover of an itemset X in D consists of the set of transaction identifiers of transactions in D that support X: cover(X, D) := {tid | (tid, I) ∈ D, X ⊆ I}

3. Some definitions Association Rules Definition Support. The support of an itemset X in D is the number of transactions in the cover of X in D: support(X, D) := |cover(X, D)| An itemset is called frequent (or large) in D if its support in D exceeds some minimum support threshold. Note that support is often defined as a fraction, namely the ratio of transactions in the cover of X to the total number of transactions in D. For example, see: Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining association rules between sets of items in large databases. In: Proc. ACM SIGMOD Conference 1993, pp. 26-28. Either can be used, usually clear from the context.

3. Some definitions Association Rules Note: support is monotonic: when moving down a path in the item set lattice it can never increase. Therefore, the set of frequent item sets is convex and is fully determined by its lower boundary, the largest frequent item sets. These are the maximal frequent item sets: no superset of these item sets is frequent. These properties are used in algorithms.

3. Some definitions Association Rules Definition Confidence. The confidence of a rule of the form X → Y in D is the proportion of transactions in the cover of X in D which are also in the cover of X ∪ Y in D: confidence(X → Y, D) := |cover(X ∪ Y, D)| / |cover(X, D)| We can relate these terms from database mining to the machine learning terms coverage and accuracy as follows: support (coverage) is the number of transactions (instances) for which the rule predicts correctly; confidence (accuracy) is the number of transactions (instances) predicted correctly, as a proportion of all instances the rule applies to.

3. Some definitions Confidence measure for Association Rules Confidence of an association rule X → Y: P(Y | X) = support(X ∪ Y) / support(X) i.e., the probability of finding the consequent in a transaction given that the transaction also contains the antecedent. N.B. the above uses support as a frequency (see later slide).
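
Continuing the illustrative sketch from earlier (reusing its cover helper and transaction dictionary D), support and confidence follow directly from the definitions, with support taken here as an absolute count:

```python
def support(itemset, D):
    """Absolute support: number of transactions covering the itemset."""
    return len(cover(itemset, D))

def confidence(X, Y, D):
    """Confidence of the rule X -> Y: support(X u Y) / support(X)."""
    return support(set(X) | set(Y), D) / support(X, D)

# For the example database: confidence of 'if beer then nappies' is 2/3.
print(confidence({"beer"}, {"nappies"}, D))
```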

4. Mining large (frequent) itemsets Item sets Back to the supermarket for a moment...... in market basket analysis each transaction (instance) contains the set of items purchased by the customer. An item is therefore a (Boolean) attribute, true if the customer bought the article in the transaction, false otherwise. Now think of the number of possible items you could buy in your local supermarket...... then think of the number of possible item sets we can form from those items...... and then think of the number of possible association rules we can form between items in each item set...

4. Mining large (frequent) itemsets Item sets with minimum support 2 (weather data). Examples, by size: one-item sets such as outl = sunny (5) and temp = mild (6); two-item sets such as outl = sunny & temp = mild (2) and humi = high & wind = true (3); three-item sets such as outl = sunny & temp = hot & humi = high (2); and four-item sets such as outl = sunny & temp = hot & humi = high & play = no (2); the full table continues with many more entries.

4. Mining large (frequent) itemsets Algorithm schema for generating item sets Key idea: a k-item set can only have minimum support if all of its (k-1)-item subsets have minimum support (are large). First, generate all 1-item sets by making a pass through the data set and storing all those above min. support in a hash table. Next, generate all 2-item sets from all pairs of 1-item sets from the hash table. Then pass through the data set counting the support of the 2-item sets, discarding those below min. support. The same process is iterated to generate k-item sets from (k-1)-item sets, until no more large item sets are found. Note: such algorithms are called Apriori-type algorithms after: R. Agrawal & R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, pp. 478-499, 1994.

4. Mining large (frequent) itemsets The Apriori algorithm Agrawal and Srikant (1994) Algorithm APRIORI
1  L_1 = {large 1-itemsets}
2  for (k = 2; L_{k-1} ≠ ∅; k++) do
3      C_k = apriori-gen(L_{k-1})    // Get new candidates
4      forall transactions t ∈ D do
5          C_t = subset(C_k, t)      // Candidates in transaction
6          forall candidates c ∈ C_t do
7              c.count++;
8          end
9      L_k = {c ∈ C_k | c.count ≥ minsup}
10 end
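
The pseudocode translates fairly directly into runnable code. The sketch below is my own illustration (not the authors' implementation): item sets are frozensets, apriori-gen's join and prune steps are folded into the loop, and minsup is an absolute count.

```python
from itertools import combinations

def apriori(data, minsup):
    """Return all frequent item sets (frozensets) with absolute support >= minsup.
    `data` is an iterable of transactions, each a set of items."""
    transactions = [frozenset(t) for t in data]

    def support_count(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return counts

    # L1: frequent 1-item sets.
    items = {frozenset([i]) for t in transactions for i in t}
    counts = support_count(items)
    L = {c for c, n in counts.items() if n >= minsup}
    frequent = {c: counts[c] for c in L}

    k = 2
    while L:
        # apriori-gen: join L_{k-1} with itself, then prune candidates
        # that have an infrequent (k-1)-subset.
        candidates = {a | b for a in L for b in L if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        counts = support_count(candidates)
        L = {c for c, n in counts.items() if n >= minsup}
        frequent.update((c, counts[c]) for c in L)
        k += 1
    return frequent

# Example: the supermarket transactions with minimum support 3.
freq = apriori([{"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
                {"beer", "crisps", "nappies"}, {"apples"},
                {"apples", "beer", "crisps", "nappies"},
                {"apples", "crisps"}, {"crisps"}], minsup=3)
for itemset, supp in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), supp)
```

On the supermarket transactions this reports {nappies}, {apples}, {beer}, {crisps} and {beer, crisps} as the frequent item sets for a support threshold of 3, consistent with Figure 6.17.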

4. Mining large (frequent) itemsets Apriori-style algorithms Apriori's anti-monotone heuristic: if any size k pattern is not frequent in the database, its size k + 1 super-pattern can never be frequent. The Apriori algorithm uses the anti-monotone heuristic: it generates all size k + 1 patterns from frequent size k patterns, then tests them for frequency against the database. Advantage: a possibly significant reduction in the number of candidate patterns generated.

4. Mining large (frequent) itemsets Example: generating item sets Suppose we have five three-item sets: (ABC), (ABD), (ACD), (ACE) and (BCD), where A is a feature like outlook = sunny. Let (ABCD) be the union of the first two three-item sets. It is a candidate four-item set since all of its three-item subsets are large. What are they? (ABC), (ABD), (ACD) and (BCD). Assuming the three-item sets are sorted in lexical order, we only have to consider pairs with the same two first items; otherwise the resulting item set would have more than four items! These are (ABC) and (ABD), plus (ACD) and (ACE), and we can forget about (BCD). The second pair leads to (ACDE).

4. Mining large (frequent) itemsets Example: generating item sets BUT not all of the three-item subsets of (ACDE) are large... What are they? (ACD) and (ACE) are large, but (ADE) and (CDE) are not large. Therefore, we only have one candidate four-item set, (ABCD), which must be checked for actual minimum support on the data set.
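
The same example can be replayed in a few lines (again my own illustration, not lecture code): join three-item sets that agree on their first two items, then prune any candidate that has a three-item subset which is not large.

```python
from itertools import combinations

# The five large three-item sets from the example, in lexical order.
L3 = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("A", "C", "E"), ("B", "C", "D")]

# Join step: only pairs agreeing on their first two items need be considered.
candidates = []
for i, x in enumerate(L3):
    for y in L3[i + 1:]:
        if x[:2] == y[:2]:
            candidates.append(tuple(sorted(set(x) | set(y))))
print(candidates)  # [('A', 'B', 'C', 'D'), ('A', 'C', 'D', 'E')]

# Prune step: keep a candidate only if all of its three-item subsets are large.
large3 = set(L3)
pruned = [c for c in candidates
          if all(s in large3 for s in combinations(c, 3))]
print(pruned)      # [('A', 'B', 'C', 'D')] -- (ACDE) is dropped: (ADE), (CDE) not large
```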

4. Mining large (frequent) itemsets Algorithm 6.6, p.184 Maximal item sets Algorithm FrequentItems(D, f_0): find all maximal item sets exceeding a given support threshold. Input: data D ⊆ X; support threshold f_0. Output: set of maximal frequent item sets M.
1  M ← ∅;
2  initialise priority queue Q to contain the empty item set;
3  while Q is not empty do
4      I ← next item set deleted from front of Q;
5      max ← true;    // flag to indicate whether I is maximal
6      for each possible extension I' of I do
7          if Supp(I') ≥ f_0 then
8              max ← false;    // frequent extension found, so I is not maximal
9              add I' to back of Q;
10         end
11     end
12     if max = true then M ← M ∪ {I};
13 end
14 return M

4. Mining large (frequent) itemsets Maximal item sets If a large (frequent) item set I is maximal it means that no superset of I is large. Why would we want only maximal item sets? Reduces redundancy, resulting in fewer item sets. Can we characterise this in terms of inductive bias? Prefer the most-specific large item-sets.
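
As a sketch of the definition (assuming the set of all frequent item sets has already been computed, e.g. by the Apriori sketch above), an item set is maximal exactly when it has no frequent proper superset:

```python
def maximal(frequent_itemsets):
    """Keep only those frequent item sets with no frequent proper superset."""
    F = set(frequent_itemsets)
    return {x for x in F if not any(x < y for y in F)}

# With support threshold 3 on the supermarket data, the frequent item sets are
# {nappies}, {apples}, {beer}, {crisps} and {beer, crisps}; the maximal ones
# are {nappies}, {apples} and {beer, crisps}.
F = {frozenset(s) for s in [{"nappies"}, {"apples"}, {"beer"}, {"crisps"}, {"beer", "crisps"}]}
print(maximal(F))
```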

4. Mining large (frequent) itemsets Figure 6.18, p.185 Closed item sets. [Figure: the closed item set lattice, with nodes {}, {Apples}, {Nappies}, {Crisps}, {Nappies, Apples}, {Crisps, Apples}, {Beer, Crisps}, {Nappies, Beer, Crisps} and {Nappies, Beer, Crisps, Apples}.] Closed item set lattice corresponding to the item sets in Figure 6.17. This lattice has the property that no two adjacent item sets have the same coverage.

5. Mining association rules From item sets to association rules individual items (attribute-value constraints) form one-item sets; two-item sets are formed from pairs of one-item sets (except pairs of the same attribute with different values), and so on... all item sets have a support (coverage) on the data set; any item sets below minimum support are discarded, leaving the so-called large item sets; generate association rules with a certain minimum confidence from each item set

5. Mining association rules From item sets to association rules For example, a 3-item set with support 4: humidity = normal, windy = false, play = yes leads to seven potential rules:
Rule  Conf
If humidity = normal and windy = false Then play = yes  4/4
If humidity = normal and play = yes Then windy = false  4/6
If windy = false and play = yes Then humidity = normal  4/6
If humidity = normal Then windy = false and play = yes  4/7
If windy = false Then humidity = normal and play = yes  4/8
If play = yes Then humidity = normal and windy = false  4/9
If true Then humidity = normal and windy = false and play = yes  4/14

5. Mining association rules Generating association rules efficiently Two-stage algorithm: (1) generate item sets with specified minimum support (coverage); (2) generate association rules with specified minimum confidence (accuracy) from each item set. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining (pp. 307-328). Menlo Park: AAAI Press.

5. Mining association rules From item-sets to association rules Key idea: say we have a rule A ∧ B → C ∧ D with greater than minimum support and minimum confidence c_1. This is a double-consequent rule. Now consider the two single-consequent rules A ∧ B ∧ C → D and A ∧ B ∧ D → C. These must also hold with greater than minimum support and confidence.

5. Mining association rules From item-sets to association rules To see why, first consider support: this must be greater than minimum support since it is the same for each single-consequent rule as for the double-consequent rule. Let the rule A ∧ B ∧ C → D have confidence c_2. This confidence c_2 ≥ c_1, because the denominator (coverage) for c_1 must be at least that for c_2: the antecedent of the double-consequent rule has fewer conditions than that of the single-consequent rule, and therefore must cover at least as many instances as the antecedent of the second rule. What is another way of explaining this? Antecedent A ∧ B is more general than antecedent A ∧ B ∧ C.

5. Mining association rules From item-sets to association rules Now consider the converse, for a given item-set: If any of the single-consequent rules which can form a double-consequent rule do not have better than minimum confidence, then do not consider the double-consequent rule, since it cannot have better than minimum confidence. This gives a basis for an efficient algorithm for constructing rules by building up from candidate single-consequent rules to candidate double-consequent rules, etc.

5. Mining association rules Algorithm schema for generating association rules First, generate confidences for all single-consequent rules; discard any below minimum confidence; build up candidate double-consequent rules from retained single-consequent rules; iterate as for the construction of large item sets.
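
An illustrative sketch of this schema (not the lecture's code): consequents are grown levelwise, and a larger consequent is only formed by merging consequents that already passed the confidence threshold. The supp argument is any function returning the absolute support of an item set; below it is a small hand-filled table of supports taken from the confidences quoted earlier for the weather-data 3-item set.

```python
def rules_from_itemset(itemset, supp, minconf):
    """Grow consequents levelwise for one frequent item set (illustrative sketch):
    a larger consequent is only built from consequents that already gave a rule
    meeting the confidence threshold."""
    itemset = frozenset(itemset)
    # Single-consequent rules first.
    heads = [frozenset([i]) for i in itemset
             if supp(itemset) / supp(itemset - {i}) >= minconf]
    rules = [(itemset - h, h) for h in heads]
    k = 2
    while heads and k <= len(itemset):
        # Merge retained consequents to form candidate k-item consequents,
        # then keep those whose rule still meets the confidence threshold.
        candidates = {a | b for a in heads for b in heads if len(a | b) == k}
        heads = [h for h in candidates
                 if supp(itemset) / supp(itemset - h) >= minconf]
        rules += [(itemset - h, h) for h in heads]
        k += 1
    return rules

# Absolute supports read off the seven-rule table shown earlier (14 instances).
supports = {
    frozenset(): 14,
    frozenset({"humidity=normal"}): 7,
    frozenset({"windy=false"}): 8,
    frozenset({"play=yes"}): 9,
    frozenset({"humidity=normal", "windy=false"}): 4,
    frozenset({"humidity=normal", "play=yes"}): 6,
    frozenset({"windy=false", "play=yes"}): 6,
    frozenset({"humidity=normal", "windy=false", "play=yes"}): 4,
}
rules = rules_from_itemset({"humidity=normal", "windy=false", "play=yes"},
                           supports.__getitem__, minconf=0.9)
for body, head in rules:
    print(f"If {' and '.join(sorted(body)) or 'true'} Then {' and '.join(sorted(head))}")
# With a 0.9 threshold only 'If humidity=normal and windy=false Then play=yes'
# (confidence 4/4) survives, so no larger consequents are attempted.
```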

5. Mining association rules Association rules I Frequent item sets can be used to build association rules, which are rules of the form if B then H where both body B and head H are item sets that frequently appear in transactions together. Pick any edge in Figure 6.17, say the edge between {beer} and {nappies,beer}. We know that the support of the former is 3 and of the latter, 2: that is, three transactions involve beer and two of those involve nappies as well. We say that the confidence of the association rule if beer then nappies is 2/3. Likewise, the edge between {nappies} and {nappies,beer} demonstrates that the confidence of the rule if nappies then beer is 2/4. There are also rules with confidence 1, such as if beer then crisps; and rules with empty bodies, such as if true then crisps, which has confidence 5/8 (i.e., five out of eight transactions involve crisps).

5. Mining association rules Association rules II But we only want to construct association rules that involve frequent items. The rule if beer ∧ apples then crisps has confidence 1, but there is only one transaction involving all three and so this rule is not strongly supported by the data. So we first use Algorithm 6.6 to mine for frequent item sets; we then select bodies B and heads H from each frequent set m, discarding rules whose confidence is below a given confidence threshold. Notice that we are free to discard some of the items in the maximal frequent sets (i.e., H ∪ B may be a proper subset of m), because any subset of a frequent item set is frequent as well.

5. Mining association rules Algorithm 6.7, p.185 Association rule mining Algorithm AssociationRules(D, f_0, c_0): find all association rules exceeding given support and confidence thresholds. Input: data D ⊆ X; support threshold f_0; confidence threshold c_0. Output: set of association rules R.
1  R ← ∅;
2  M ← FrequentItems(D, f_0);    // FrequentItems: see Algorithm 6.6
3  for each m ∈ M do
4      for each H ⊆ m and B ⊆ m such that H ∩ B = ∅ do
5          if Supp(B ∪ H)/Supp(B) ≥ c_0 then R ← R ∪ {if B then H};
6      end
7  end
8  return R
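
A direct, illustrative translation of Algorithm 6.7 (reusing the apriori and maximal helpers sketched earlier); support is again an absolute count, and heads and bodies are drawn from each maximal frequent item set.

```python
from itertools import combinations

def association_rules(data, minsup, minconf):
    """Sketch of Algorithm 6.7: mine all rules 'if B then H' whose support and
    confidence exceed the given thresholds (support as an absolute count)."""
    transactions = [frozenset(t) for t in data]

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = set()
    # Frequent and maximal frequent item sets via the earlier sketches.
    for m in maximal(apriori(data, minsup)):
        subsets = [frozenset(s) for r in range(len(m) + 1)
                   for s in combinations(sorted(m), r)]
        for H in subsets:
            if not H:
                continue                      # the head must be non-empty
            for B in subsets:
                if H & B:
                    continue                  # head and body must be disjoint
                if supp(B | H) >= minsup and supp(B | H) / supp(B) >= minconf:
                    rules.add((B, H))
    return rules

data = [{"nappies"}, {"beer", "crisps"}, {"apples", "nappies"},
        {"beer", "crisps", "nappies"}, {"apples"},
        {"apples", "beer", "crisps", "nappies"}, {"apples", "crisps"}, {"crisps"}]
for B, H in association_rules(data, minsup=3, minconf=0.6):
    print(f"if {set(B) or 'true'} then {set(H)}")
```

Run on the eight supermarket transactions with support threshold 3 and confidence threshold 0.6, this prints the same three rules reported on the next slide.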

5. Mining association rules Association rule example A run of the algorithm with support threshold 3 and confidence threshold 0.6 gives the following association rules:
if beer then crisps  (support 3, confidence 3/3)
if crisps then beer  (support 3, confidence 3/5)
if true then crisps  (support 5, confidence 5/8)
Association rule mining often includes a post-processing stage in which superfluous rules are filtered out, e.g., special cases which don't have higher confidence than the general case.

5. Mining association rules Post-processing One quantity that is often used in post-processing is lift, defined as Lift(if B then H) = n · Supp(B ∪ H) / (Supp(B) · Supp(H)) where n is the number of transactions. For example, for the first two association rules above we would have lifts of (8 · 3)/(3 · 5) = 1.6, as Lift(if B then H) = Lift(if H then B). For the third rule we have Lift(if true then crisps) = (8 · 5)/(8 · 5) = 1. This holds for any rule with B = ∅, as Lift(if ∅ then H) = n · Supp(∅ ∪ H) / (Supp(∅) · Supp(H)) = n · Supp(H) / (n · Supp(H)) = 1. More generally, a lift of 1 means that Supp(B ∪ H) is entirely determined by the marginal frequencies Supp(B) and Supp(H) and is not the result of any meaningful interaction between B and H. Only association rules with lift larger than 1 are of interest.
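
As a small illustrative helper (reusing the support function and transaction dictionary D from the earlier sketches), lift can be computed directly from the definition:

```python
def lift(B, H, D):
    """Lift of the rule 'if B then H': n * Supp(B u H) / (Supp(B) * Supp(H))."""
    n = len(D)
    return n * support(set(B) | set(H), D) / (support(B, D) * support(H, D))

print(lift({"beer"}, {"crisps"}, D))   # 8*3 / (3*5) = 1.6
print(lift(set(), {"crisps"}, D))      # 8*5 / (8*5) = 1.0
```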

5. Mining association rules Figure 6.19, p.187 Item sets and dolphins. [Figure: the item set lattice built from the literals Length, Gills, Beak and Teeth for the positive dolphin examples.] The item set lattice corresponding to the positive examples of the dolphin example in Example 4.4. Each item is a literal Feature = Value; each feature can occur at most once in an item set. The resulting structure is exactly the same as what was called the hypothesis space in Chapter 4.

5. Mining association rules Figure 6.20, p.188 Closed item sets and dolphins. [Figure: the corresponding closed item set lattice.] Closed item set lattice corresponding to the item sets in Figure 6.19.

5. Mining association rules Implementations of Association Rule mining Many implementations, including Weka. More optimised versions exist, e.g., http://www.borgelt.net/apriori.html

5. Mining association rules Apriori in Weka
Relation: weather.symbolic
Instances: 14
Attributes: 5
  outlook
  temperature
  humidity
  windy
  play
Apriori
=======
Minimum support: 0.15 (2 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6

5. Mining association rules Apriori in Weka Best rules found:
1. outlook=overcast 4 ==> play=yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
2. temperature=cool 4 ==> humidity=normal 4 <conf:(1)> lift:(2) lev:(0.14) [2] conv:(2)
3. humidity=normal windy=false 4 ==> play=yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
4. outlook=sunny play=no 3 ==> humidity=high 3 <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
5. outlook=sunny humidity=high 3 ==> play=no 3 <conf:(1)> lift:(2.8) lev:(0.14) [1] conv:(1.93)
6. outlook=rainy play=yes 3 ==> windy=false 3 <conf:(1)> lift:(1.75) lev:(0.09) [1] conv:(1.29)
7. outlook=rainy windy=false 3 ==> play=yes 3 <conf:(1)> lift:(1.56) lev:(0.08) [1] conv:(1.07)
8. temperature=cool play=yes 3 ==> humidity=normal 3 <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
9. outlook=sunny temperature=hot 2 ==> humidity=high 2 <conf:(1)> lift:(2) lev:(0.07) [1] conv:(1)
10. temperature=hot play=no 2 ==> outlook=sunny 2 <conf:(1)> lift:(2.8) lev:(0.09) [1] conv:(1.29)

5. Mining association rules Interpreting association rules Interpretation of association rules is not always obvious; for example:
If windy = false and play = no then outlook = sunny and humidity = high
is not the same as:
If windy = false and play = no then outlook = sunny
If windy = false and play = no then humidity = high
However, it means that the following also holds:
If humidity = high and windy = false and play = no then outlook = sunny

6. Summary Summary a form of unsupervised learning; the main contribution from database mining, concerned with scalability; simple basic algorithm; learning = search through a generalization lattice; many extensions: numerical, temporal, spatial, negation, closed & free itemsets, ...; sparse instance representation; rule interestingness measures (confidence, lift, conviction, etc.)

6. Summary Association rules summary Other algorithms: ECLAT and FP-growth use more complex data structures to reduce the complexity of the search: ECLAT (Zaki et al.) uses set intersections; FP-Growth (Han et al.) uses a prefix tree. For maximal and closed frequent item-set mining: CHARM (Zaki et al.), CLOSET+, FIMI, etc.