Association Rule Mining (ARM)
Komate AMPHAWAN
What can be inferred?
- I purchase diapers
- I purchase a new car
- I purchase OTC cough medicine
- I purchase a prescription medication
- I don't show up for class
ARM
- Proposed by Agrawal et al. in 1993.
- Initially used for Market Basket Analysis to find how items purchased by customers are related.
- Example rule: Bread => Milk [sup = 5%, conf = 100%]
Association Rule Learners
- Discover elements that co-occur frequently within a data set consisting of multiple independent selections of elements (such as purchase transactions), and discover rules, such as implications or correlations, that relate co-occurring elements.
- Answer questions such as "If a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?"
- Reduce a potentially huge amount of information to a small, understandable set of statistically supported statements.
- Also known as market basket analysis.
Market Basket Analysis
- A typical and widely used example of association rule mining.
- Example: data are collected using bar-code scanners in supermarkets; each record consists of all items in a single purchase transaction.
- Managers would like to know whether certain groups of items are consistently purchased together.
- They can use this data for adjusting store layouts (placing items optimally with respect to each other), for cross-selling, for promotions, for catalog design, and to identify customer segments based on buying patterns.
Famous & Interesting Finding: Beer & Diapers
A number of convenience store clerks noticed that men often bought beer at the same time they bought diapers. The store mined its receipts and proved the clerks' observation correct. So the store began stocking diapers next to the beer coolers, and sales skyrocketed.
Market Basket Analysis
- Retail: each customer purchases a different set of products, in different quantities, at different times.
- MBA uses this information to:
  - Identify who customers are (not by name)
  - Understand why they make certain purchases
  - Gain insight about the merchandise (products): fast and slow movers, products purchased together, products that might benefit from promotion
  - Take action: store layouts; which products to put on special, promote, or offer coupons for
- Combined with a customer loyalty card, it becomes even more valuable.
Association Rules
- The DM technique most closely allied with Market Basket Analysis.
- ARs can be generated automatically.
- ARs represent patterns in the data without a specified target variable.
- A good example of undirected data mining.
- Whether the patterns make sense is up to humans (us!).
Associations
- Rules expressing relationships between items.
- Example: cereal, milk => fruit. "People who bought cereal and milk also bought fruit." Stores might want to offer specials on milk and cereal to get people to buy more fruit.
Association Rules
- Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars [Forbes, Sept 8, 1997]
- Customers who purchase maintenance agreements are very likely to purchase large appliances (author experience)
- When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaner (author experience)
- So what?
Association Rules Apply Elsewhere
Besides retail (supermarkets, etc.):
- Purchases made using credit/debit cards
- Optional telco service purchases
- Banking services
- Unusual combinations of insurance claims can be a warning of fraud
- Medical patient histories
Association Rules
Association rule types:
- Actionable rules: contain high-quality, actionable information
- Trivial rules: information already well known to those familiar with the business
- Inexplicable rules: have no explanation and do not suggest action
Trivial and inexplicable rules occur most often.
Market Basket Analysis
Analyze tables of transactions:
Person  Basket
A       Chips, Salsa, Cookies, Crackers, Coke, Beer
B       Lettuce, Spinach, Oranges, Celery, Apples, Grapes
C       Chips, Salsa, Frozen Pizza, Frozen Cake
D       Lettuce, Spinach, Milk, Butter
Can we hypothesize? Chips => Salsa; Lettuce => Spinach
Market Baskets
In general, the data consists of pairs (TID, basket): a transaction ID and a subset of items.

The model: data (input)
- I = {i1, i2, ..., im}: a set of items.
- Transaction database T: a set of transactions T = {t1, t2, ..., tn}.
- A transaction t is a set of items, t ⊆ I.
Transaction data: supermarket data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
...
tn: {biscuit, eggs, milk}
Concepts:
- An item: an item/article in a basket
- I: the set of all items sold in the store
- A transaction: the items purchased in a basket; it may have a TID (transaction ID)
- A transactional dataset: a set of transactions

Transaction data: a set of documents
A text document data set; each document is treated as a bag of keywords:
doc1: Student, Teach, School
doc2: Student, School
doc3: Teach, School, City, Game
doc4: Baseball, Basketball
doc5: Basketball, Player, Spectator
doc6: Baseball, Coach, Game, Team
doc7: Basketball, Team, City, Game
Measuring Interesting Rules
- Support: the ratio of the number of transactions containing both A and B to the total number of transactions:
  s(A => B) = |{T ∈ D : A ∪ B ⊆ T}| / |D|
- Confidence: the ratio of the number of transactions containing both A and B to the number of transactions containing A:
  c(A => B) = |{T ∈ D : A ∪ B ⊆ T}| / |{T ∈ D : A ⊆ T}|
Measuring Interesting Rules
Rules are included/excluded based on two metrics:
- Minimum support level: how frequently all of the items in a rule appear in transactions
- Minimum confidence level: how frequently the left-hand side of a rule implies the right-hand side
ARM: two sub-problems in discovering all association rules
1. Find all sets of items (itemsets) whose transaction support is above minimum support. Itemsets that qualify are called large itemsets; all others are small itemsets.
2. From each large itemset, generate rules that use items from that itemset. Given a large itemset Y and a subset X of Y: take the support of Y and divide it by the support of X; if the ratio c is at least minconf, then X => (Y − X) is satisfied with confidence factor c.
Frequent Itemsets
- Itemset: any set of items
- k-itemset: an itemset containing k items
- Frequent itemset: an itemset that satisfies a minimum support level
- If I contains m items, how many itemsets are there? (2^m)
Strong Association Rules
Given an itemset, it's easy to generate association rules. From {Chips, Salsa}:
- ∅ => Chips, Salsa
- Chips => Salsa
- Salsa => Chips
- Chips, Salsa => ∅
Strong rules are interesting; they are generally defined as the rules satisfying minimum support and minimum confidence.
Summary of Association Rule Definitions
- Association rule (AR): an implication X => Y where X, Y ⊆ I and X ∩ Y = ∅
- Support of an AR X => Y (s): the percentage of transactions that contain X ∪ Y
- Confidence of an AR X => Y (α): the ratio of the number of transactions that contain X ∪ Y to the number that contain X
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having:
- support ≥ minsup threshold
- confidence ≥ minconf threshold
Brute-force approach:
- List all possible association rules
- Compute the support and confidence for each rule
- Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
Generating frequent itemsets
Frequent Itemset Generation
(Figure: the itemset lattice over {A, B, C, D, E}, from the null set up to ABCDE.)
Given d items, there are 2^d possible candidate itemsets.
Generating Frequent Itemsets
Analysis of the naïve algorithm:
- 2^m subsets of I
- Scan all n transactions for each subset
- O(2^m · n) tests of whether s is a subset of T
Growth is exponential in the number of items! Can we do this efficiently?
Frequent Itemset Generation
Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database, matching each transaction against every candidate
Transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Complexity ~ O(N·M·w), where N is the number of transactions, M the number of candidates, and w the maximum transaction width. Expensive, since M = 2^d!
Frequent Itemset Generation Strategies
- Reduce the number of candidates (M): complete search has M = 2^d; use pruning techniques to reduce M
- Reduce the number of transactions (N): reduce N as the itemset size increases; used by DHP and vertical-based mining algorithms
- Reduce the number of comparisons (N·M): use efficient data structures to store the candidates or transactions, so there is no need to match every candidate against every transaction
Classical algorithm to discover frequent itemsets: Apriori
The Apriori Algorithm
Progressively identifies large itemsets of different sizes: first the 1-itemsets (A, B, C, D), then the 2-itemsets (AB, AC, AD, BC, BD, CD), then the 3-itemsets, and so on up to ABCD.
Generating Frequent Itemsets
Frequent itemsets satisfy the apriori property: if A is not a frequent itemset, then no superset of A is a frequent itemset. Equivalently, the support of an itemset never exceeds the support of its subsets.
Proof sketch: let n be the number of transactions and suppose A is a subset of l transactions. If A ⊆ A′, then A′ is a subset of l′ ≤ l transactions. Thus, if l/n is below minimum support, so is l′/n.
Illustrating the Apriori Principle
(Figure: once an itemset in the lattice is found to be infrequent, all of its supersets are pruned.)
Generating Frequent Itemsets
Central idea: build candidate k-itemsets from frequent (k−1)-itemsets.
Approach:
1. Find all frequent 1-itemsets
2. Extend the frequent (k−1)-itemsets to candidate k-itemsets
3. Prune candidate itemsets that do not meet the minimum support
Apriori: Candidate Generation-and-Test
Any subset of a frequent itemset must also be frequent (an anti-monotone property): a transaction containing {beer, diaper, nuts} also contains {beer, diaper}, so if {beer, diaper, nuts} is frequent, {beer, diaper} must also be frequent.
Consequently, no superset of any infrequent itemset needs to be generated or tested, so many item combinations can be pruned.
APRIORI
Using this downward closure, we can prune unnecessary branches from further consideration.
1. k = 1
2. Find the frequent set Lk from Ck, the set of all candidate itemsets
3. Form Ck+1 from Lk; k = k + 1
4. Repeat steps 2-3 until Ck is empty
Details: in step 2, scan D and count each itemset in Ck; if its count is at least minsup, it is frequent. Step 3 is detailed on the next slide.
Apriori's Candidate Generation
For k = 1, C1 = all 1-itemsets. For k > 1, generate Ck from Lk−1 as follows:
- The join step: Ck = (k−2)-way join of Lk−1 with itself. If both {a1, ..., ak−2, ak−1} and {a1, ..., ak−2, ak} are in Lk−1, then add {a1, ..., ak−2, ak−1, ak} to Ck (items are kept sorted).
- The prune step: remove {a1, ..., ak−2, ak−1, ak} if it contains a non-frequent (k−1)-subset.
How to Generate Candidates?
Suppose the items in Lk−1 are listed in an order.
Step 1: self-join Lk−1
  insert into Ck
  select p.item1, p.item2, ..., p.itemk−1, q.itemk−1
  from Lk−1 p, Lk−1 q
  where p.item1 = q.item1, ..., p.itemk−2 = q.itemk−2, p.itemk−1 < q.itemk−1
Step 2: pruning
  forall itemsets c in Ck do
    forall (k−1)-subsets s of c do
      if (s is not in Lk−1) then delete c from Ck
Example of Generating Candidates
L3 = {abc, abd, acd, ace, bcd}
Self-joining L3 * L3:
- abcd, from abc and abd
- acde, from acd and ace
Pruning: acde is removed because ade is not in L3.
C4 = {abcd}
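The join-and-prune step above can be sketched as follows; `apriori_gen` is a hypothetical name for the candidate-generation routine, reproducing the L3 example:

```python
from itertools import combinations

def apriori_gen(Lk_minus_1):
    """Join frequent (k-1)-itemsets that share their first k-2 items, then
    prune candidates containing an infrequent (k-1)-subset (downward closure)."""
    prev = sorted(tuple(sorted(s)) for s in Lk_minus_1)
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:                       # join step
                c = a + (b[-1],)
                # prune step: every (k-1)-subset must be frequent
                if all(tuple(sorted(s)) in prev_set
                       for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
print(apriori_gen(L3))   # [('a','b','c','d')] — acde is pruned since ade is not in L3
```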
The Apriori Algorithm — Example (minimum support count = 2)
Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5
Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3 → L1: {1}:2, {2}:3, {3}:3, {5}:3
C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}; scan D → counts 1, 2, 1, 2, 3, 2 → L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2
C3: {2 3 5}; scan D → L3: {2 3 5}:2
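The whole level-wise loop on this example fits in a short sketch (a simplified implementation, not a performance-tuned one):

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise search: frequent k-itemsets seed the candidate (k+1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    def count(c):
        return sum(1 for t in transactions if c <= t)
    L = [frozenset([i]) for i in items if count(frozenset([i])) >= minsup_count]
    frequent = {s: count(s) for s in L}
    k = 2
    while L:
        # join: unite pairs of frequent (k-1)-itemsets whose union has size k
        cands = {a | b for a in L for b in L if len(a | b) == k}
        # prune by downward closure, then count against the database
        cands = {c for c in cands
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        L = [c for c in cands if count(c) >= minsup_count]
        frequent.update({c: count(c) for c in L})
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori([frozenset(t) for t in db], 2)
print(freq[frozenset({2, 3, 5})])   # 2, matching L3 in the slide
```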
Is Apriori Fast Enough? Performance Bottlenecks
The core of the Apriori algorithm:
- Use frequent (k−1)-itemsets to generate candidate frequent k-itemsets
- Use database scans and pattern matching to collect counts for the candidate itemsets
The bottleneck of Apriori: candidate generation.
- Huge candidate sets: 10^4 frequent 1-itemsets will generate about 10^7 candidate 2-itemsets; to discover a frequent pattern of size 100, e.g. {a1, a2, ..., a100}, one needs to generate 2^100 ≈ 10^30 candidates.
- Multiple scans of the database: it needs (n + 1) scans, where n is the length of the longest pattern.
Apriori Advantages/Disadvantages
Advantages:
- Uses the large-itemset property
- Easily parallelized
- Easy to implement
Disadvantages:
- Assumes the transaction database is memory-resident
- Requires up to m database scans
Improving Apriori's Efficiency
- Hash-based itemset counting: a k-itemset whose corresponding hash-bucket count is below the threshold cannot be frequent
- Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans
- Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
- Sampling: mine a subset of the given data; this needs a lower support threshold plus a method to determine completeness
- Dynamic itemset counting: add new candidate itemsets immediately (unlike Apriori) once all of their subsets are estimated to be frequent
Deriving rules from frequent itemsets
Derive rules from frequent itemsets
Frequent itemsets != association rules: one more step is required to find association rules.
For each frequent itemset X:
  For each proper nonempty subset A of X:
    Let B = X − A
    A => B is an association rule if confidence(A => B) ≥ minconf,
    where support(A => B) = support(A ∪ B) and confidence(A => B) = support(A ∪ B) / support(A)
Example: deriving rules from frequent itemsets
Suppose {2, 3, 4} is frequent, with supp = 50%.
Its proper nonempty subsets 23, 24, 34, 2, 3, 4 have supp = 50%, 50%, 75%, 75%, 75%, 75% respectively.
These generate the association rules:
- 23 => 4, confidence = 100%
- 24 => 3, confidence = 100%
- 34 => 2, confidence = 67%
- 2 => 34, confidence = 67%
- 3 => 24, confidence = 67%
- 4 => 23, confidence = 67%
All rules have support = 50%.
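The enumeration above can be sketched with the slide's supports plugged in as data (`rules_from_itemset` is a hypothetical helper name):

```python
from itertools import combinations

def rules_from_itemset(X, supp, minconf):
    """All rules A => (X - A) from frequent itemset X whose confidence
    supp[X] / supp[A] meets minconf. `supp` maps frozensets to supports."""
    X = frozenset(X)
    rules = []
    for r in range(1, len(X)):
        for lhs in combinations(sorted(X), r):
            A = frozenset(lhs)
            conf = supp[X] / supp[A]
            if conf >= minconf:
                rules.append((set(A), set(X - A), round(conf, 2)))
    return rules

# Supports taken from the slide example for itemset {2, 3, 4}
supp = {frozenset({2, 3, 4}): 0.50,
        frozenset({2, 3}): 0.50, frozenset({2, 4}): 0.50, frozenset({3, 4}): 0.75,
        frozenset({2}): 0.75, frozenset({3}): 0.75, frozenset({4}): 0.75}
rules = rules_from_itemset({2, 3, 4}, supp, minconf=0.6)
print(len(rules))   # 6 rules, as listed on the slide
```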
Deriving rules
To recap: to obtain A => B, we need support(A ∪ B) and support(A).
- This step is not as time-consuming as frequent-itemset generation. Why?
- It is also easy to speed up using techniques such as parallel processing. How?
- Do we really need candidate generation at all? Frequent-Pattern Growth (FP-tree) avoids it.
Algorithm to Generate ARs
Computational Complexity
Given d unique items:
- Total number of itemsets = 2^d
- Total number of possible association rules:
  R = Σ_{k=1}^{d−1} [ C(d, k) × Σ_{j=1}^{d−k} C(d−k, j) ] = 3^d − 2^(d+1) + 1
If d = 6, R = 602 rules.
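Both forms of the rule count can be checked against each other numerically:

```python
from math import comb

def num_rules(d):
    """Count all rules X => Y with X, Y nonempty and disjoint, X ∪ Y ⊆ items."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

print(num_rules(6))       # 602
print(3**6 - 2**7 + 1)    # 602, the closed form 3^d - 2^(d+1) + 1
```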
Efficiency improvement
Efficiency Improvement
Can we improve efficiency?
- Pruning without checking all k−1 subsets?
- Joining and pruning without looping over the entire Lk−1?
Yes. One way is to use hash trees: one hash tree is created for each pass k, or one hash tree per k-itemset size, k = 1, 2, ...
Hash Tree
Stores all candidate k-itemsets and their counts.
- An internal node v at level m contains bucket pointers. Which branch next? Use the hash of the m-th item to decide.
- Leaf nodes contain lists of itemsets and counts.
E.g., C2: 12, 13, 15, 23, 25, 35; using the identity hash function:
        {}                           <- root
   /1        |2       \3             <- edge + label
 /2 |3 \5   /3 \5      /5
[12:] [13:] [15:] [23:] [25:] [35:]  <- leaves
How to join using the hash tree? Only try to join frequent (k−1)-itemsets with common parents in the hash tree.
How to prune using the hash tree? Determining whether a (k−1)-itemset is frequent with the hash tree avoids going through all itemsets of Lk−1 (the same idea as above).
Added benefit: no need to enumerate all k-subsets of transactions; tree traversal limits which subsets need to be considered (enumeration is replaced by traversal).
Further Improvement
- Speed up searching and matching
- Reduce the number of transactions (a kind of instance selection)
- Reduce the number of passes over the data on disk
- Reduce the number of subsets per transaction that must be considered
- Reduce the number of candidates
Speed up searching and matching
Use hash counts to filter candidates (see the example).
Method: while counting candidate (k−1)-itemsets, also collect counts of hash-groups of k-itemsets:
- Use a hash function h on k-itemsets
- For each transaction t and each k-subset s of t, add 1 to the count of h(s)
- Remove any candidate q generated by Apriori if the count of bucket h(q) is below minsup
The idea is quite useful for k = 2, but often not so useful elsewhere. (For sparse data, k = 2 can be the most expensive level for Apriori. Why?)
Hash-based Example
Transactions: {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}
Suppose h2 is: h2(x, y) = ((order of x) * 10 + (order of y)) mod 7, e.g. h2(1,4) = 0, h2(1,5) = 1, ...
bucket   0      1    2    3    4    5    6
pairs    14,35  15   23        25   12   13,34
counts   3      1    2    0    3    1    3
Then 2-itemsets hashed to buckets 1 and 5 cannot be frequent (e.g. 15, 12), so remove them from C2.
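The bucket counts of this example can be reproduced directly (here each item's value serves as its own "order"):

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
NBUCKETS = 7
h2 = lambda x, y: (10 * x + y) % NBUCKETS   # hash for a sorted pair x < y

counts = [0] * NBUCKETS
for t in transactions:
    for x, y in combinations(sorted(t), 2):
        counts[h2(x, y)] += 1
print(counts)   # [3, 1, 2, 0, 3, 1, 3], matching the slide

# With minimum support count 2, a pair hashing to a bucket whose count
# is below 2 cannot be frequent, so it is removed from C2.
prunable = [(x, y) for x in range(1, 6) for y in range(x + 1, 6)
            if counts[h2(x, y)] < 2]
print(prunable)   # includes (1, 2) and (1, 5)
```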
Working on transactions
- Remove transactions that do not contain any frequent k-itemsets in each scan.
- Remove from transactions those items that are not members of any candidate k-itemset. E.g., if 12, 24, 14 are the only candidate itemsets contained in 1234, then item 3 can be removed; if 12, 24 are the only candidates contained in transaction 1234, the transaction can be removed from the next round of scanning.
- Reducing the data size means less reading and processing time, but extra writing time.
Reducing Scans via Partitioning
- Divide the dataset D into m portions D1, D2, ..., Dm, so that each portion fits into memory.
- Find the frequent itemsets Fi in each Di, with support ≥ minsup, for each i. If an itemset is frequent in D, it must be frequent in some Di.
- The union of all Fi forms a candidate set of the frequent itemsets in D; then get their counts.
- Often this requires only two scans of D.
FP-growth
FP-Growth (Han, Pei & Yin, 2000)
- One problematic aspect of Apriori is candidate generation: it is the source of exponential growth.
- Another approach is a divide-and-conquer strategy.
- Idea: compress the database into a frequent-pattern tree representing the frequent items.
Mining Frequent Patterns Without Candidate Generation
- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure: highly condensed but complete for frequent pattern mining, avoiding costly database scans.
- Develop an efficient FP-tree-based frequent pattern mining method: a divide-and-conquer methodology that decomposes mining tasks into smaller ones and avoids candidate generation — sub-database tests only!
Construct the FP-tree from a Transaction DB (min_support = 0.5, i.e. count 3)
TID  Items bought                (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200  {a, b, c, f, l, m, o}       {f, c, a, b, m}
300  {b, f, h, j, o}             {f, b}
400  {b, c, k, s, p}             {c, b, p}
500  {a, f, c, e, l, p, m, n}    {f, c, a, m, p}
Steps:
1. Scan the DB once to find the frequent 1-itemsets (single-item patterns)
2. Order the frequent items within each transaction in descending frequency order
3. Scan the DB again and construct the FP-tree
Header table: f:4, c:4, a:3, b:3, m:3, p:3, each with a head-of-node-links pointer.
(Figure: the resulting FP-tree, with root {}, main path f:4 – c:3 – a:3 – m:2 – p:2, branches b:1 under f:4 and b:1 – m:1 under a:3, and a separate path c:1 – b:1 – p:1.)
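The two-scan construction can be sketched as below. Note one assumption: ties between equally frequent items are broken alphabetically here, so the exact tree shape may differ from the slide's figure, while the per-item node counts are the same:

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fptree(transactions, minsup_count):
    """Scan 1: count items. Scan 2: insert each transaction's frequent items,
    sorted by descending frequency, along a shared prefix path."""
    freq = Counter(i for t in transactions for i in t)
    rank = {i: r for r, i in enumerate(
        sorted((i for i in freq if freq[i] >= minsup_count),
               key=lambda i: (-freq[i], i)))}
    root, header = Node(None, None), defaultdict(list)  # header: node-links
    for t in transactions:
        node = root
        for i in sorted((i for i in t if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])   # link the new node
            node = node.children[i]
            node.count += 1
    return root, header

db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
root, header = build_fptree(db, minsup_count=3)
print({i: sum(n.count for n in ns) for i, ns in header.items()})
```

The printed totals reproduce the header table: f:4, c:4, a:3, b:3, m:3, p:3; infrequent items such as d and g never enter the tree.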
Benefits of the FP-tree Structure
- Completeness: never breaks a long pattern of any transaction; preserves complete information for frequent pattern mining.
- Compactness: irrelevant information is reduced — infrequent items are gone; frequency-descending ordering means more frequent items are more likely to be shared; the tree is never larger than the original database (not counting node-links and counts).
Mining Frequent Patterns Using the FP-tree
General idea (divide and conquer): recursively grow frequent pattern paths using the FP-tree.
Method:
- For each item, construct its conditional pattern base, and then its conditional FP-tree
- Repeat the process on each newly created conditional FP-tree
- Until the resulting FP-tree is empty, or contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)
Major Steps to Mine the FP-tree
1) Construct the conditional pattern base for each node in the FP-tree
2) Construct the conditional FP-tree from each conditional pattern base
3) Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far; if a conditional FP-tree contains a single path, simply enumerate all the patterns
Step 1: From FP-tree to Conditional Pattern Base
- Start at the frequent-item header table of the FP-tree
- Traverse the FP-tree by following the node-links of each frequent item
- Accumulate all transformed prefix paths of that item to form its conditional pattern base
Conditional pattern bases:
item  cond. pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
Properties of the FP-tree for Conditional Pattern Base Construction
- Node-link property: for any frequent item ai, all possible frequent patterns containing ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header.
- Prefix-path property: to calculate the frequent patterns for a node ai on a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count carries the same count as node ai.
Step 2: Construct the Conditional FP-tree
For each pattern base:
- Accumulate the count for each item in the base
- Construct the FP-tree for the frequent items of the pattern base
Example: the m-conditional pattern base is fca:2, fcab:1, which yields the m-conditional FP-tree {} – f:3 – c:3 – a:3.
All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam.
Mining Frequent Patterns by Creating Conditional Pattern Bases
item  conditional pattern base     conditional FP-tree
p     {(fcam:2), (cb:1)}           {(c:3)} | p
m     {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)} | m
b     {(fca:1), (f:1), (c:1)}      Empty
a     {(fc:3)}                     {(f:3, c:3)} | a
c     {(f:3)}                      {(f:3)} | c
f     Empty                        Empty
Step 3: Recursively Mine the Conditional FP-tree
From the m-conditional FP-tree ({} – f:3 – c:3 – a:3):
- The conditional pattern base of "am" is (fc:3), giving the am-conditional FP-tree {} – f:3 – c:3
- The conditional pattern base of "cm" is (f:3), giving the cm-conditional FP-tree {} – f:3
- The conditional pattern base of "cam" is (f:3), giving the cam-conditional FP-tree {} – f:3
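The recursion of steps 1-3 can be sketched without the explicit tree: each conditional pattern base is kept as a weighted sub-database and mined recursively. This is a simplified, tree-free variant of the same divide-and-conquer idea (real FP-growth compresses each conditional base into a conditional FP-tree first, and avoids re-deriving a pattern from several directions):

```python
from collections import Counter

def pattern_growth(transactions, minsup_count):
    """Simplified pattern growth: recursively mine conditional databases,
    where each entry carries a (items, weight) pair."""
    out = {}
    def mine(db, suffix):
        freq = Counter()
        for items, w in db:
            for i in items:
                freq[i] += w
        for i, c in freq.items():
            if c < minsup_count:
                continue
            out[suffix | {i}] = c
            # conditional database for i: co-occurring frequent items, same weights
            cond = [(frozenset(j for j in items
                               if j != i and freq[j] >= minsup_count), w)
                    for items, w in db if i in items]
            mine(cond, suffix | {i})
    mine([(frozenset(t), 1) for t in transactions], frozenset())
    return out

db = [
    {"f","a","c","d","g","i","m","p"}, {"a","b","c","f","l","m","o"},
    {"b","f","h","j","o"}, {"b","c","k","s","p"},
    {"a","f","c","e","l","p","m","n"},
]
patterns = pattern_growth(db, 3)
m_patterns = {p for p in patterns if "m" in p}
print(len(m_patterns))   # 8: m, fm, cm, am, fcm, fam, cam, fcam
```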
Single FP-tree Path Generation
Suppose an FP-tree T has a single path P. The complete set of frequent patterns of T can be generated by enumerating all combinations of the sub-paths of P. E.g., the single-path m-conditional FP-tree {} – f:3 – c:3 – a:3 yields all frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam.
Principles of Frequent Pattern Growth
Pattern growth property: let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.
Example: "abcdef" is a frequent pattern if and only if "abcde" is a frequent pattern and f is frequent in the set of transactions containing "abcde".
Why Is Frequent Pattern Growth Fast?
Our performance study shows that FP-growth is an order of magnitude faster than Apriori, and also faster than tree-projection.
Reasons:
- No candidate generation, no candidate tests
- Uses a compact data structure
- Eliminates repeated database scans
- The basic operations are counting and FP-tree building
FP-growth vs. Apriori: Scalability with the Support Threshold
(Figure: run time in seconds (0-100) of FP-growth vs. Apriori on data set T25I20D10K, as the support threshold varies from 0 to 3%.)
Association Rules: Advanced Topics
Alternative Methods for Frequent Itemset Generation
Traversal of the itemset lattice:
- General-to-specific vs. specific-to-general
- Equivalence classes
- Breadth-first vs. depth-first
Alternative Methods for Frequent Itemset Generation
Representation of the database: horizontal vs. vertical data layout.
Horizontal layout (TID → items):
TID  Items
1    A, B, E
2    B, C, D
3    C, E
4    A, C, D
5    A, B, C, D
6    A, E
7    A, B
8    A, B, C
9    A, C, D
10   B
Vertical layout (item → TIDs):
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6
Tree Projection
(Figure: set enumeration tree over {A, B, C, D, E}, from null up to ABCDE.)
- Possible extensions: E(A) = {B, C, D, E}
- Possible extensions: E(ABC) = {D, E}
Tree Projection
Items are listed in lexicographic order. Each node P stores the following information:
- The itemset for node P
- The list of possible lexicographic extensions of P: E(P)
- A pointer to the projected database of its ancestor node
- A bit vector recording which transactions in the projected database contain the itemset
Projected Database
Original database:
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}
Projected database for node A:
TID  Items
1    {B}
2    {}
3    {C,D,E}
4    {D,E}
5    {B,C}
6    {B,C,D}
7    {}
8    {B,C}
9    {B,D}
10   {}
For each transaction T, the projected transaction at node A is T ∩ E(A).
ECLAT
For each item, store a list of transaction ids (tids) — the vertical data layout.
Horizontal layout:
TID  Items
1    A, B, E
2    B, C, D
3    C, E
4    A, C, D
5    A, B, C, D
6    A, E
7    A, B
8    A, B, C
9    A, C, D
10   B
Vertical layout (tid-lists):
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6
ECLAT
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k−1)-subsets, e.g.:
A:  1, 4, 5, 6, 7, 8, 9
B:  1, 2, 5, 7, 8, 10
AB: 1, 5, 7, 8
Three traversal approaches: top-down, bottom-up, and hybrid.
- Advantage: very fast support counting
- Disadvantage: intermediate tid-lists may become too large for memory
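The tid-list intersection is literally a set intersection; using the vertical layout from the slide:

```python
from functools import reduce

# Vertical layout from the slide: item -> set of transaction ids
tidlists = {
    "A": {1, 4, 5, 6, 7, 8, 9}, "B": {1, 2, 5, 7, 8, 10},
    "C": {2, 3, 4, 5, 8, 9},    "D": {2, 4, 5, 9}, "E": {1, 3, 6},
}

def support_count(itemset, tidlists):
    """Intersect the items' tid-lists; the result's size is the support count."""
    return len(reduce(set.intersection, (tidlists[i] for i in itemset)))

print(sorted(tidlists["A"] & tidlists["B"]))   # [1, 5, 7, 8]
print(support_count(("A", "B"), tidlists))     # 4
```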
Eclat Algorithm
- Dynamically process each transaction online, maintaining 2-itemset counts.
- Transform: partition L2 using 1-item-prefix equivalence classes — {AB, AC, AD}, {BC, BD}, {CD} — and transform the database to vertical form.
- Asynchronous phase: for each equivalence class E, compute frequent(E).
Discussion
Measuring Interestingness — Discussion
What are interesting association rules? Novel and actionable ones.
- Association mining aims to find valid, novel, useful (= actionable) patterns.
- Support and confidence are not sufficient for measuring interestingness:
  - Large support and confidence thresholds → only a small number of association rules, which are likely folklore or known facts.
  - Small support and confidence thresholds → too many association rules.
Two-step approach:
1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
A possible additional step: overcoming the practical limits imposed by thousands or tens of thousands of unique items.
Overcoming Practical Limits for Association Rules
1. Generate the co-occurrence matrix for single items: "if OJ then soda"
2. Generate the co-occurrence matrix for two items: "if OJ and Milk then soda"
3. Generate the co-occurrence matrix for three items: "if OJ and Milk and Window Cleaner then soda"
4. Etc.
Final Thought on Association Rules: The Problem of Lots of Data
- A fast-food restaurant could have 100 items on its menu. How many combinations of 3 different menu items are there? 161,700!
- A supermarket has 10,000 or more unique items: 50 million 2-item combinations and 100 billion 3-item combinations.
- Using product hierarchies (groupings) helps address this common issue.
- Finally, note that the number of transactions in a given time period can also be huge (and hence expensive to analyze).
How Good is an Association Rule?
Are support and confidence enough? Lift (improvement) tells us how much better a rule is at predicting the result than just assuming the result in the first place:
Lift = P(LHS ∧ RHS) / (P(LHS) · P(RHS))
- When lift > 1, the rule is better at predicting the result than guessing.
- When lift < 1, the rule does worse than informed guessing, and using the negative rule produces a better rule than guessing.
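Lift is a one-liner once the probabilities are counted; on the five-basket example from earlier, diaper => beer has lift above 1:

```python
def lift(lhs, rhs, transactions):
    """lift = P(LHS and RHS) / (P(LHS) * P(RHS)); > 1 means better than chance."""
    n = len(transactions)
    p = lambda s: sum(1 for t in transactions if set(s) <= set(t)) / n
    return p(set(lhs) | set(rhs)) / (p(lhs) * p(rhs))

baskets = [
    {"bread", "milk"}, {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"}, {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]
print(lift({"diaper"}, {"beer"}, baskets))   # 0.6 / (0.8 * 0.6) ≈ 1.25
```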
Usability of Association Rules
Criterion                   Rating           Comment
Explainability              High             Intuitive explanations
Accuracy                    Moderate         Depends on rule quality
Scalability                 Moderate/Low     Performance of rule systems depends on both the number and the complexity of rules
Embeddability               Moderate/High    Can be compiled in many cases
Tolerance for sparse data   Low              Support and confidence are both affected
Tolerance for noisy data    Moderate         How do you use outliers?
Development speed           Low/Moderate     Needs a lot of filtering
Dependence on experts       Moderate/High    Domain experts are needed to filter rules
Unique Features of Association Rules
- vs. classification: the right-hand side can have any number of items. It can find a classification-like rule X => c in a different way: such a rule is not about differentiating classes, but about what (X) describes class c.
- vs. clustering: it does not need class labels. For X => Y, if Y is considered a cluster, it can form different clusters sharing the same description (X).
Other Association Rules
- Multilevel association rules: structure often exists in the data (e.g., the Yahoo hierarchy, a food hierarchy); adjust minsup for each level.
- Constraint-based association rules: knowledge constraints, data constraints, dimension/level constraints, interestingness constraints, rule constraints.
Q & A