1 Frequent Pattern Mining
Outline:
- Frequent Patterns
- The Apriori Algorithm
- The FP-growth Algorithm
- Sequential Pattern Mining
- Summary
44 / 193

2 Frequent Patterns: Netflix Prize
- Users evaluate movies from time to time
- Can we predict how much a user likes a movie?
[Table: Good/Bad ratings given by users (including U4) to movies (including A, C, and E); the individual cell values are garbled in the source.]
- If we find that the pattern (A=Good) AND (C=Bad) → (E=Good) holds for many users, we can recommend movie E to user U4!

3 Frequent Patterns: Transaction Data Analysis
- Transactions: customers' purchases of commodities, e.g., {bread, milk, cheese} if they are bought together
- Frequent patterns are product combinations that are frequently purchased together by customers
- Generally, frequent patterns are patterns (sets of items, sequences, etc.) that occur frequently in a database [AIS93]

4 Frequent Patterns: Frequent Itemsets
Transaction database TDB:
  TID   Items bought
  100   f, a, c, d, g, i, m, p
  200   a, b, c, f, l, m, o
  300   b, f, h, j, o
  400   b, c, k, s, p
  500   a, f, c, e, l, p, m, n
- Itemset: a set of items, e.g., acm = {a, c, m}
- Support of an itemset: the number of transactions that contain it, e.g., Sup(acm) = 3
- Given min_sup = 3, acm is a frequent pattern
- Frequent pattern mining: finding all frequent patterns in a given database with respect to a given support threshold
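The support definition can be checked directly on the slide's database. The following is a minimal sketch; the function name `support` and the dictionary layout are illustrative choices, not part of the slides.

```python
# Transaction database TDB from the slide (TID -> set of items bought).
TDB = {
    100: set("facdgimp"),
    200: set("abcflmo"),
    300: set("bfhjo"),
    400: set("bcksp"),
    500: set("afcelpmn"),
}

def support(itemset, db):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)

min_sup = 3
print(support("acm", TDB))             # 3
print(support("acm", TDB) >= min_sup)  # True
```

Transactions 100, 200, and 500 contain a, c, and m, which gives the support of 3 stated on the slide.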

5 The Apriori Algorithm: A Naïve Attempt
- Generate all possible itemsets and test their supports against the database
- How to hold a large number of itemsets in main memory?
  - If there are 100 items, there are $2^{100} - 1$ possible itemsets
- How to test the supports of a huge number of itemsets against a large database, say one containing 100 million transactions?
  - For a transaction of length 20, we need to update the support of $2^{20} - 1 = 1{,}048{,}575$ itemsets
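The counts on this slide follow from the number of non-empty subsets of an n-item set. A short sketch that verifies the arithmetic (the helper name is hypothetical):

```python
from math import comb

def num_nonempty_itemsets(n):
    """Number of non-empty subsets of a set of n items."""
    return 2 ** n - 1

# A transaction of length 20 contains this many itemsets.
print(num_nonempty_itemsets(20))              # 1048575

# Counting itemsets size by size gives the same total for 100 items.
by_size = sum(comb(100, k) for k in range(1, 101))
print(by_size == num_nonempty_itemsets(100))  # True
print(f"{num_nonempty_itemsets(100):.3e}")    # 1.268e+30
```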

6 The Apriori Algorithm: Transactions in Real Applications
- A large department store often carries more than 100 thousand different kinds of items
  - [...] carries more than 17,000 books relevant to data mining
- Walmart has more than 20 million transactions per day
  - AT&T produces more than 275 million calls per day
- Mining large transaction databases of many items is a real demand

7 The Apriori Algorithm: How to Obtain an Efficient Method?
- Reduce the number of itemsets that need to be checked
- Check the supports of the selected itemsets efficiently

8 The Apriori Algorithm: An Anti-Monotonicity
- Any subset of a frequent itemset must also be frequent: an anti-monotonic property
  - A transaction containing {beer, diaper, nuts} also contains {beer, diaper}
  - If {beer, diaper, nuts} is frequent, {beer, diaper} must also be frequent
- In other words, any superset of an infrequent itemset must also be infrequent
  - No superset of any infrequent itemset should be generated or tested
  - Many item combinations can be pruned!

9 The Apriori Algorithm: Candidate Generation & Test (the Apriori Principle)
- Find frequent items
- Generate length-(k + 1) candidate itemsets from length-k frequent itemsets
- Test the candidates against the database

10 The Apriori Algorithm: Example
Database D (min_sup = 2):
  TID   Items
  10    a, c, d
  20    b, c, e
  30    a, b, c, e
  40    b, e
- Scan D, 1-candidates (itemset: sup): a: 2, b: 3, c: 3, d: 1, e: 3
- Frequent 1-itemsets: a: 2, b: 3, c: 3, e: 3
- 2-candidates: ab, ac, ae, bc, be, ce
- Scan D, counting: ab: 1, ac: 2, ae: 1, bc: 2, be: 3, ce: 2
- Frequent 2-itemsets: ac: 2, bc: 2, be: 3, ce: 2
- 3-candidates: bce
- Scan D, frequent 3-itemsets: bce: 2

11 The Apriori Algorithm [AgSr94]
Require: transaction database TDB, minimum support threshold min_sup
{C_k: the set of length-k candidate itemsets}
{L_k: the set of length-k frequent itemsets}
L_1 ← {frequent items}
k ← 1
while L_k ≠ ∅ do
    {candidate generation}
    C_{k+1} ← candidates generated from L_k; sup(X) ← 0 for X ∈ C_{k+1}
    for all transaction t ∈ TDB, itemset X ∈ C_{k+1} do
        if X ⊆ t then
            sup(X)++
        end if
    end for
    L_{k+1} = {X | X ∈ C_{k+1}, sup(X) ≥ min_sup}; k++
end while
return L_1 ∪ L_2 ∪ ... ∪ L_k
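The level-wise loop can be sketched in Python as follows. This is a simplified illustration, not the implementation from [AgSr94]: candidates are formed as unions of two frequent k-itemsets (rather than by the ordered self-join), supports are counted by a plain nested loop instead of a hash-tree, and the name `apriori` is an assumption.

```python
from itertools import combinations

def apriori(db, min_sup):
    """Return {frozenset: support} for all frequent itemsets in `db`."""
    # L_1: count single items in one pass over the database.
    counts = {}
    for t in db:
        for x in t:
            key = frozenset([x])
            counts[key] = counts.get(key, 0) + 1
    level = {X: s for X, s in counts.items() if s >= min_sup}
    frequent = dict(level)
    k = 1
    while level:
        # Candidate generation: size-(k+1) unions of two frequent k-itemsets
        # whose k-subsets are all frequent (anti-monotone pruning).
        prev = set(level)
        candidates = set()
        for p, q in combinations(prev, 2):
            u = p | q
            if len(u) == k + 1 and all(
                frozenset(s) in prev for s in combinations(u, k)
            ):
                candidates.add(u)
        # Test the candidates against the database.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {X: s for X, s in counts.items() if s >= min_sup}
        frequent.update(level)
        k += 1
    return frequent

# Database D from the example slide, min_sup = 2.
D = [set("acd"), set("bce"), set("abce"), set("be")]
result = apriori(D, min_sup=2)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), sup)
```

On database D this yields the nine frequent itemsets of the example: a, b, c, e, ac, bc, be, ce, and bce.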

12 The Apriori Algorithm: How to Find Frequent Items?
Finding frequent items using a one-dimensional array:
for all item x do
    c[x] ← 0
end for
for all transaction t do
    for all item x do
        if x ∈ t then
            c[x]++
        end if
    end for
end for
return {x | c[x] ≥ min_sup}

13 The Apriori Algorithm: How to Find Length-2 Frequent Itemsets?
Using a 2-dimensional triangle matrix: for items i, j (i < j), c[i, j] is the count for itemset ij
for all items i and j such that i < j do
    c[i, j] ← 0
end for
for all transaction t do
    sort items in t in lexicographic order
    for i = 0 to len(t) - 2 do
        if t[i] is a frequent item then
            for j = i + 1 to len(t) - 1 do
                if t[j] is a frequent item then
                    c[t[i], t[j]]++
                end if
            end for
        end if
    end for
end for
return {ij | i < j, c[i, j] ≥ min_sup}

14 The Apriori Algorithm: Implementation
- A 2-dimensional triangle matrix can be implemented using a 1-dimensional array
- There are n items
- For items i, j (i < j), c[i, j] = c[(i-1)(2n-i)/2 + j - i]
- Example (n = 5): c[3, 5] = c[(3-1)*(2*5-3)/2 + 5 - 3] = c[9]
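The index formula can be exercised with a small sketch. It assumes items are numbered 1..n and positions are 1-based, as in the slide's example; the function names and the sample transactions are illustrative, and the frequent-item filter of the previous slide is omitted for brevity.

```python
def tri_index(i, j, n):
    """1-based position of the pair (i, j), 1 <= i < j <= n, in a flat array."""
    assert 1 <= i < j <= n
    # (i - 1) * (2n - i) is always even, so integer division is exact.
    return (i - 1) * (2 * n - i) // 2 + j - i

def count_pairs(transactions, n):
    """Count every item pair in one pass; items are integers 1..n."""
    c = [0] * (n * (n - 1) // 2 + 1)  # slot 0 is unused (1-based positions)
    for t in transactions:
        items = sorted(t)
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                c[tri_index(items[a], items[b], n)] += 1
    return c

print(tri_index(3, 5, 5))  # 9
counts = count_pairs([{1, 3, 5}, {3, 5}, {2, 3, 5}], n=5)
print(counts[tri_index(3, 5, 5)])  # 3
```

For n = 5 the ten pairs map to positions 1 through 10 with no gaps, so the flat array needs n(n-1)/2 counters.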

15 The Apriori Algorithm: Candidate Generation Example
- Suppose $L_3$ = {abc, abd, acd, ace, bcd}. How can we generate $C_4$?
- Self-joining $L_3$ with $L_3$:
  - abcd from abc and abd
  - acde from acd and ace
- Pruning: acde is removed because ade is not in $L_3$
- $C_4$ = {abcd}

16 The Apriori Algorithm: Candidate Generation Algorithm
Require: the items in every itemset in L_k are listed in an order R
{self-join L_k}
INSERT INTO C_{k+1}
SELECT p.item_1, p.item_2, ..., p.item_k, q.item_k
FROM L_k p, L_k q
WHERE p.item_1 = q.item_1, ..., p.item_{k-1} = q.item_{k-1}, p.item_k <_R q.item_k
{pruning}
for itemset X ∈ C_{k+1} do
    for each k-subset X' of X do
        if X' ∉ L_k then
            C_{k+1} = C_{k+1} - {X}
        end if
    end for
end for
return C_{k+1}
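The self-join and pruning steps can be sketched in Python. Itemsets are represented as tuples sorted in the item order R (here plain alphabetical order); the function name `generate_candidates` is an assumption made for illustration.

```python
from itertools import combinations

def generate_candidates(L_k):
    """Self-join L_k with itself, then prune by the Apriori property."""
    L_k = sorted(L_k)
    known = set(L_k)
    k = len(L_k[0]) if L_k else 0
    candidates = []
    for a in range(len(L_k)):
        for b in range(a + 1, len(L_k)):
            p, q = L_k[a], L_k[b]
            # Join: the first k-1 items agree and p's last item precedes q's.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                X = p + (q[-1],)
                # Prune: every k-subset of X must itself be in L_k.
                if all(s in known for s in combinations(X, k)):
                    candidates.append(X)
    return candidates

L3 = [tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]]
print(generate_candidates(L3))  # [('a', 'b', 'c', 'd')]
```

The join also produces acde (from acd and ace), which the pruning step discards because ade is not in L3.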

17 The Apriori Algorithm: How to Count Supports?
- Why is counting the supports of candidates a problem?
  - The total number of candidates can be huge
  - One transaction may contain many candidates
- Method
  - Candidate itemsets are stored in a hash-tree
  - A leaf node of the hash-tree contains a list of itemsets and counts
  - An interior node contains a hash table
  - Subset function: finds all the candidates contained in a transaction

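18 The Apriori Algorithm: Example
[Figure: the subset function applied to a transaction on a hash-tree of candidate itemsets. The hash function shown sends items to three branches labelled 1,4,7 / 2,5, / 3,6,9; the transaction and the tree contents are garbled in the source.]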

19 The FP-growth Algorithm: Bottleneck of Freq Pattern Mining
- Multiple database scans are costly
- Mining long patterns needs many scans and generates many candidates
  - To find the frequent itemset $i_1 i_2 \ldots i_{100}$, 100 scans are needed and the total number of candidates is $\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$
- Bottleneck: candidate-generation-and-test

20 The FP-growth Algorithm: Search Space of Frequent Pattern Mining
Itemset lattice (top to bottom):
  ABCD
  ABC  ABD  ACD  BCD
  AB  AC  BC  AD  BD  CD
  A  B  C  D
  {}

21 The FP-growth Algorithm: Set Enumeration Tree
- Use an order on items and enumerate itemsets in lexicographic order
  - a, ab, abc, abcd, abd, ac, acd, ad, b, bc, bcd, bd, c, cd, d
- Reduce a lattice to a tree!
Set enumeration tree (by level):
  a  b  c  d
  ab  ac  ad  bc  bd  cd
  abc  abd  acd  bcd
  abcd
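A depth-first walk of the set enumeration tree produces the itemsets in lexicographic order. The generator below is an illustrative sketch; its name is not taken from the slides.

```python
def enumerate_itemsets(items, prefix=""):
    """Depth-first walk of the set enumeration tree over ordered `items`."""
    for idx, x in enumerate(items):
        node = prefix + x
        yield node
        # Children extend `node` only with items that come after x,
        # so every itemset is reached along exactly one path.
        yield from enumerate_itemsets(items[idx + 1:], node)

print(", ".join(enumerate_itemsets("abcd")))
# a, ab, abc, abcd, abd, ac, acd, ad, b, bc, bcd, bd, c, cd, d
```

Each of the 15 non-empty itemsets over {a, b, c, d} appears once, which is what turns the lattice into a tree.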

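22 The FP-growth Algorithm: Borders of Frequent Itemsets
- Frequent itemsets are connected
  - ∅ is trivially frequent
  - X on the border ⇒ every subset of X is frequent!
[Figure: the set enumeration tree over items a, b, c, d (a b c d / ab ac ad bc bd cd / abc abd acd bcd / abcd) with the border of the frequent itemsets marked.]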

23 The FP-growth Algorithm: Projected Databases
- X-projected database: the set of transactions containing X
  - $TDB|_X = \{t \in TDB \mid X \subseteq t\}$
- To test whether itemset Xy is frequent, we can use the X-projected database and check whether item y is frequent in the X-projected database!
[Figure: the set enumeration tree over items a, b, c, d (a b c d / ab ac ad bc bd cd / abc abd acd bcd / abcd).]
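The projection test can be sketched on the transaction database of slide 4. The function names `project` and `is_frequent_extension` are hypothetical.

```python
# Transaction database from slide 4 (one set of items per transaction).
TDB = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]

def project(db, X):
    """X-projected database: the transactions of `db` that contain X."""
    X = set(X)
    return [t for t in db if X <= t]

def is_frequent_extension(db, X, y, min_sup):
    """Xy is frequent iff item y is frequent in the X-projected database."""
    return sum(1 for t in project(db, X) if y in t) >= min_sup

print(len(project(TDB, "ac")))                     # 3
print(is_frequent_extension(TDB, "ac", "m", 3))    # True
print(is_frequent_extension(TDB, "ac", "b", 3))    # False
```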

24 The FP-growth Algorithm: Compressing a Transaction Database by FP-tree
- The 1st scan: find frequent items
  - Only record frequent items in the FP-tree
  - F-list: f-c-a-b-m-p
- The 2nd scan: construct the tree
  - Order the frequent items in each transaction w.r.t. the f-list
  - Explore sharing among transactions
  TID   Items bought              (Ordered) freq items
  100   f, a, c, d, g, i, m, p    f, c, a, m, p
  200   a, b, c, f, l, m, o       f, c, a, b, m
  300   b, f, h, j, o             f, b
  400   b, c, k, s, p             c, b, p
  500   a, f, c, e, l, p, m, n    f, c, a, m, p
Header table items: f, c, a, b, m, p
FP-tree (children indented under their parent):
  root
    f:4
      c:3
        a:3
          m:2
            p:2
          b:1
            m:1
      b:1
    c:1
      b:1
        p:1
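The two-scan construction can be sketched as follows. The `Node` class, the `build_fp_tree` function, and the representation of the header table as a dictionary of node lists (instead of linked side-links) are assumptions made for illustration. The slide's f-list is passed in explicitly because its order among equally frequent items is a tie-break choice.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, a count, a parent link, and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(db, min_sup, f_list=None):
    """Two scans: count the items, then insert ordered frequent items."""
    counts = Counter(x for t in db for x in t)           # 1st scan
    if f_list is None:
        # Frequency-descending order; ties broken alphabetically.
        f_list = sorted((x for x in counts if counts[x] >= min_sup),
                        key=lambda x: (-counts[x], x))
    rank = {x: r for r, x in enumerate(f_list)}
    root = Node(None, None)
    header = {x: [] for x in f_list}                     # item -> its nodes
    for t in db:                                         # 2nd scan
        node = root
        for x in sorted((x for x in t if x in rank), key=rank.get):
            if x not in node.children:
                child = Node(x, node)
                node.children[x] = child
                header[x].append(child)
            node = node.children[x]
            node.count += 1                              # shared prefixes add up
    return root, header

TDB = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
root, header = build_fp_tree(TDB, min_sup=3, f_list="fcabmp")
print(root.children["f"].count)                 # 4
print(sum(n.count for n in header["p"]))        # 3
```

Summing the counts of all nodes of an item recovers that item's support, which is why the tree keeps the information needed for mining.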

25 The FP-growth Algorithm: Why FP-tree?
- Completeness
  - Never break a long pattern in any transaction
  - Preserve complete information for frequent pattern mining: no need to scan the database anymore
- Compactness
  - Reduce irrelevant information: infrequent items are removed
  - Items in frequency-descending order (f-list): the more frequently occurring, the more likely to be shared
  - Never larger than the original database (not counting node-links and the count fields)

26 The FP-growth Algorithm: Partitioning Frequent Patterns
- Frequent patterns can be partitioned into subsets according to the f-list f-c-a-b-m-p:
  - Patterns containing p
  - Patterns having m but no p
  - ...
  - Patterns having c but none of a, b, m, or p
  - Pattern f
- Depth-first search of a set enumeration tree
  - The partitioning is complete and does not have any overlap

27 The FP-growth Algorithm: Find Patterns Having Item p
- Only transactions containing p are needed
- Form the p-projected database
  - Start at entry p of the header table
  - Follow the side-links of frequent item p
  - Accumulate all transformed prefix paths of p
- p-projected database $TDB|_p$: fcam: 2, cb: 1
- Local frequent item: c: 3
- Frequent patterns containing p: p: 3, pc: 3
[Figure: the header table and FP-tree of slide 24.]
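The projected database of an item can be reproduced without walking the tree: its prefix paths are the prefixes of the ordered transactions that contain the item. The sketch below takes that shortcut, so it is not the side-link traversal the slide describes, only a way to check its result; the function names are hypothetical.

```python
from collections import Counter

# Ordered frequent items of each transaction (TIDs 100 to 500), per the f-list.
ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]

def projected_db(ordered_db, item):
    """Prefix before `item` in every ordered transaction containing it."""
    return Counter(t[:t.index(item)] for t in ordered_db if item in t)

def local_frequent(proj, min_sup):
    """Items whose total count inside the projected database reaches min_sup."""
    counts = Counter()
    for prefix, n in proj.items():
        for x in prefix:
            counts[x] += n
    return {x: n for x, n in counts.items() if n >= min_sup}

tdb_p = projected_db(ordered, "p")
print(dict(tdb_p))                  # {'fcam': 2, 'cb': 1}
print(local_frequent(tdb_p, 3))     # {'c': 3}
```

Only c is locally frequent, so the frequent patterns containing p are p and pc, both with support 3.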

28 The FP-growth Algorithm: Find Patterns Having Item m But No p
- Form the m-projected database $TDB|_m$
  - Item p is excluded (why?)
  - $TDB|_m$ = {fca: 2, fcab: 1}
  - Local frequent items: f, c, a
- Build the FP-tree for $TDB|_m$
  - m-projected FP-tree: header table items f, c, a; single path root, f:3, c:3, a:3
[Figure: the header table and FP-tree of slide 24.]

29 The FP-growth Algorithm: Recursive Mining
- Patterns having m but no p can be mined recursively
- Optimization: enumerate patterns from a single-branch FP-tree
  - Enumerate all combinations
  - Support = that of the last item
  - Example: m, fm, cm, am, fcm, fam, cam, fcam
- m-projected FP-tree: header table items f, c, a; single path root, f:3, c:3, a:3
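The single-branch optimization amounts to enumerating all combinations of the nodes on the path. A minimal sketch, assuming the path is given as a root-to-leaf list of (item, count) pairs; the function name is illustrative.

```python
from itertools import combinations

def patterns_from_single_path(path, suffix, suffix_sup):
    """Enumerate all patterns of a single-branch FP-tree.

    path: (item, count) pairs from the root to the leaf.
    suffix: the item(s) the tree was projected on, with support suffix_sup.
    """
    patterns = {suffix: suffix_sup}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = "".join(x for x, _ in combo)
            # Support is that of the last (deepest) item in the combination.
            patterns[items + suffix] = combo[-1][1]
    return patterns

m_tree_path = [("f", 3), ("c", 3), ("a", 3)]
print(patterns_from_single_path(m_tree_path, "m", 3))
```

For the m-projected FP-tree this yields the eight patterns of the example, each with support 3.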

30 The FP-growth Algorithm: Patterns from a Single Prefix
- When a (projected) FP-tree has a single prefix, we can reduce the single prefix into one virtual node and join the mining results of the two parts
[Figure: an FP-tree whose single prefix path a1:n1, a2:n2, a3:n3 leads into a multi-branch part with nodes b1:m1, c1:k1, c2:k2, c3:k3. It is split into the prefix path (under root r1) plus the multi-branch part hanging under a virtual root r.]

31 The FP-growth Algorithm
- Pattern-growth: recursively grow frequent patterns by pattern and database partitioning
for each frequent item x do
    construct the x-projected database, and then the x-projected FP-tree
    recursively mine the x-projected FP-tree, until the resulting FP-tree either is empty or contains only one path
    {a single path generates all the combinations, each of which is a frequent pattern}
end for
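The pattern-growth recursion can be sketched without the tree compression. The version below recurses on projected databases kept as plain lists of (ordered items, count) pairs, so it illustrates the partitioning idea only and is not FP-growth proper; the function name `pattern_growth` is an assumption.

```python
from collections import Counter

def pattern_growth(db, min_sup, suffix=()):
    """Yield (pattern, support) pairs by recursive database projection.

    db: list of (items, count); items is a tuple in a fixed global order.
    """
    counts = Counter()
    for items, n in db:
        for x in items:
            counts[x] += n
    for x, sup in counts.items():
        if sup < min_sup:
            continue
        pattern = (x,) + suffix
        yield pattern, sup
        # x-projected database: the prefix before x in each transaction with x.
        proj = [(items[:items.index(x)], n) for items, n in db if x in items]
        yield from pattern_growth(proj, min_sup, pattern)

f_list = "fcabmp"
TDB = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
# Keep only f-list items and order each transaction by the f-list.
db = [(tuple(x for x in f_list if x in t), 1) for t in TDB]
result = dict(pattern_growth(db, 3))
print(result[("c", "p")])            # 3
print(result[("f", "c", "a", "m")])  # 3
print(len(result))                   # 18
```

Because each projection keeps only the items that precede x in the fixed order, every frequent itemset is generated exactly once, starting from its last item.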

32 Sequential Pattern Mining: From Itemsets to Sequences
- Itemsets: combinations of items, no temporal order
- Temporal order is important in many situations, such as time-series databases and sequence databases
  - Frequent patterns → (frequent) sequential patterns
- Application example of sequential pattern mining: mobile user trajectories
  - Using the pattern "park a car → buy a parking ticket → visit a coffee shop, all in 15 minutes", we can recommend a coffee shop on a cell phone
- More applications: medical treatment, natural disasters, science and engineering processes, stocks and markets, telephone calling patterns, Web log clickthrough streams, DNA sequences and gene structures

33 Sequential Pattern Mining: What Is Sequential Pattern Mining?
- Given a set of sequences, find the complete set of frequent subsequences
  SID   Sequence
  10    <a(abc)(ac)d(cf)>
  20    <(ad)c(bc)(ae)>
  30    <(ef)(ab)(df)cb>
  40    <eg(af)cbc>
- A sequence: <(ef)(ab)(df)cb>
- Given a minimum support threshold min_sup = 2, <(ab)c> is a sequential pattern
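Subsequence containment and support can be checked with a short sketch. Sequences are written as on the slide without the angle brackets; the helper names `parse`, `contains`, and `support` are illustrative.

```python
def parse(seq):
    """Parse 'a(abc)(ac)d(cf)' into a list of item sets (elements)."""
    elements, i = [], 0
    while i < len(seq):
        if seq[i] == "(":
            j = seq.index(")", i)
            elements.append(set(seq[i + 1:j]))
            i = j + 1
        else:
            elements.append({seq[i]})
            i += 1
    return elements

def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence` (greedy matching)."""
    pos = 0
    for element in pattern:
        # Advance to the earliest remaining element that includes `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

SDB = {10: "a(abc)(ac)d(cf)", 20: "(ad)c(bc)(ae)", 30: "(ef)(ab)(df)cb", 40: "eg(af)cbc"}

def support(pattern):
    """Number of sequences in SDB that contain `pattern`."""
    p = parse(pattern)
    return sum(contains(parse(s), p) for s in SDB.values())

print(support("(ab)c"))  # 2
```

Sequences 10 and 30 contain an element with both a and b followed later by c, which gives the support of 2.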

34 Sequential Pattern Mining: An (Anti-)Monotonic Property of Sequential Patterns
- If a sequence s is infrequent, then none of the super-sequences of s is frequent
- Example: let min_sup = 2. <hb> is infrequent, so <hab> and <(ah)b> are infrequent
  Seq-id   Sequence
  10       <(bd)cb(ac)>
  20       <(bf)(ce)b(fg)>
  30       <(ah)(bf)abf>
  40       <(be)(ce)d>
  50       <a(bd)bcb(ade)>

35 Sequential Pattern Mining Algorithm GSP
  Seq-id   Sequence
  10       <(bd)cb(ac)>
  20       <(bf)(ce)b(fg)>
  30       <(ah)(bf)abf>
  40       <(be)(ce)d>
  50       <a(bd)bcb(ade)>
Scans:
- 1st scan: 8 cand., 6 length-1 seq. pat.
- 2nd scan: 51 cand., 19 length-2 seq. pat., 10 cand. not in DB at all
- 3rd scan: 46 cand., 19 length-3 seq. pat., 20 cand. not in DB at all
- 4th scan: 8 cand., 6 length-4 seq. pat.
- 5th scan: 1 cand., 1 length-5 seq. pat.
[Figure: candidates per level, each marked as a sequential pattern, a candidate that cannot pass the support threshold, or a candidate not in the DB at all. Candidates shown: length 1: <a> <b> <c> <d> <e> <f> <g> <h>; length 2: <aa> <ab> <af> <ba> <bb> <ff> <(ab)> <(ef)>; length 3: <abb> <aab> <aba> <baa> <bab>; length 4: <abba> <(bd)bc>; length 5: <(bd)cba>.]

36 Sequential Pattern Mining Algorithm PrefixSpan
Sequence database SDB:
  SID   Sequence
  10    <a(abc)(ac)d(cf)>
  20    <(ad)c(bc)(ae)>
  30    <(ef)(ab)(df)cb>
  40    <eg(af)cbc>
- Length-1 sequential patterns: <a>, <b>, <c>, <d>, <e>, <f>
- Partition the search by prefix: having prefix <a>; having prefix <b>; having prefix <c>, ..., <f>
- <a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>
- Length-2 sequential patterns with prefix <a>: <aa>, <ab>, <(ab)>, <ac>, <ad>, <af>
- Recurse on the <aa>-projected database, ..., the <af>-projected database; the <b>-projected database is mined in the same way
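The projection on a single-item prefix can be sketched as follows. Each sequence is a list of elements, and each element is a string of alphabetically sorted items; this representation and the names `project` and `show` are assumptions made for illustration. The sketch covers only length-1 prefixes, not the full PrefixSpan recursion.

```python
def project(sequence, item):
    """Suffix of `sequence` after the first occurrence of `item`.

    Returns (partial, rest): `partial` holds the items that follow `item`
    inside the matched element (written (_x) on the slide), `rest` the
    remaining elements. Returns None if `item` does not occur.
    """
    for i, element in enumerate(sequence):
        if item in element:
            partial = element[element.index(item) + 1:]
            return partial, sequence[i + 1:]
    return None

def show(partial, rest):
    """Format a projected sequence in the slide's notation."""
    parts = ["(_%s)" % partial] if partial else []
    parts += [e if len(e) == 1 else "(%s)" % e for e in rest]
    return "<" + "".join(parts) + ">"

SDB = {
    10: ["a", "abc", "ac", "d", "cf"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ef", "ab", "df", "c", "b"],
    40: ["e", "g", "af", "c", "b", "c"],
}

a_projected = [show(*project(s, "a")) for s in SDB.values()]
print(a_projected)
# ['<(abc)(ac)d(cf)>', '<(_d)c(bc)(ae)>', '<(_b)(df)cb>', '<(_f)cbc>']
```

The output matches the <a>-projected database listed on the slide.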

37 Summary
- Frequent patterns: frequent combinations in large transaction databases
- Mining frequent patterns
  - An anti-monotonic property
  - The Apriori algorithm
  - The FP-growth algorithm
- Sequential patterns and mining
  - Sequential patterns
  - GSP
  - PrefixSpan

38 Summary: To-Do List
- Read the following paper to understand how PrefixSpan mines sequential patterns:
  J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.C. Hsu. Mining Sequential Patterns by Pattern-growth: The PrefixSpan Approach. IEEE Transactions on Knowledge and Data Engineering, Volume 16, Number 11, November 2004, IEEE Computer Society.
- There is often redundancy among frequent patterns. Read the following paper to understand how FP-growth can be extended to mine frequent closed itemsets, a type of non-redundant frequent patterns:
  J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, May.



More information

Roadmap. PCY Algorithm 1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA ( Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

FP-Growth algorithm in Data Compression frequent patterns FP-Growth algorithm in Data Compression frequent patterns Mr. Nagesh V Lecturer, Dept. of CSE Atria Institute of Technology,AIKBS Hebbal, Bangalore,Karnataka Email : Abstract-The transmission

More information

PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures PFPM: Discovering Periodic Frequent Patterns with Novel Periodicity Measures 1 Introduction Frequent itemset mining is a popular data mining task. It consists of discovering sets of items (itemsets) frequently

More information

