
CSE 634/590 Data mining Extra Credit: Classification by Association rules: Example Problem Muhammad Asiful Islam, SBID: 106506983

Original Data

Outlook    Humidity  Wind    PlayTennis
Sunny      High      Weak    No
Sunny      High      Strong  No
Overcast   Normal    Weak    Yes
Rain       High      Weak    Yes
Rain       Normal    Weak    Yes
Rain       High      Strong  No
Overcast   Normal    Strong  Yes
Sunny      High      Weak    No
Rain       Normal    Weak    Yes
Sunny      Normal    Strong  Yes
Overcast   High      Strong  Yes
Overcast   Normal    Weak    Yes
Rain       High      Strong  No

Converted Data

Each record is converted into a transaction over nine binary items:
Outlook=Sunny (I1), Outlook=Overcast (I2), Outlook=Rain (I3),
Humidity=High (I4), Humidity=Normal (I5),
Wind=Weak (I6), Wind=Strong (I7),
PlayTennis=Yes (I8), PlayTennis=No (I9).

     I1  I2  I3  I4  I5  I6  I7  I8  I9
 1:  +   -   -   +   -   +   -   -   +
 2:  +   -   -   +   -   -   +   -   +
 3:  -   +   -   -   +   +   -   +   -
 4:  -   -   +   +   -   +   -   +   -
 5:  -   -   +   -   +   +   -   +   -
 6:  -   -   +   +   -   -   +   -   +
 7:  -   +   -   -   +   -   +   +   -
 8:  +   -   -   +   -   +   -   -   +
 9:  -   -   +   -   +   +   -   +   -
10:  +   -   -   -   +   -   +   +   -
11:  -   +   -   +   -   -   +   +   -
12:  -   +   -   -   +   +   -   +   -
13:  -   -   +   +   -   -   +   -   +
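To make the conversion concrete, here is a minimal Python sketch of the same encoding; it is not part of the original slides, and the names RECORDS, ITEMS, ATTRS, to_transaction and D are introduced here purely for illustration.

```python
# Minimal sketch of the attribute=value -> item encoding used on this slide.
RECORDS = [  # the 13 training examples: (Outlook, Humidity, Wind, PlayTennis)
    ("Sunny", "High", "Weak", "No"),       ("Sunny", "High", "Strong", "No"),
    ("Overcast", "Normal", "Weak", "Yes"), ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),     ("Rain", "High", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Rain", "Normal", "Weak", "Yes"),     ("Sunny", "Normal", "Strong", "Yes"),
    ("Overcast", "High", "Strong", "Yes"), ("Overcast", "Normal", "Weak", "Yes"),
    ("Rain", "High", "Strong", "No"),
]

ATTRS = ("Outlook", "Humidity", "Wind", "PlayTennis")
ITEMS = {  # attribute=value -> item ID, exactly as defined above
    ("Outlook", "Sunny"): "I1", ("Outlook", "Overcast"): "I2", ("Outlook", "Rain"): "I3",
    ("Humidity", "High"): "I4", ("Humidity", "Normal"): "I5",
    ("Wind", "Weak"): "I6", ("Wind", "Strong"): "I7",
    ("PlayTennis", "Yes"): "I8", ("PlayTennis", "No"): "I9",
}

def to_transaction(record):
    """Map one categorical record to the set of item IDs it contains."""
    return frozenset(ITEMS[(attr, value)] for attr, value in zip(ATTRS, record))

D = [to_transaction(r) for r in RECORDS]   # D[0] == frozenset({'I1', 'I4', 'I6', 'I9'})
```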

Generating 1-itemset Frequent Pattern

Let the minimum support count be 4, so min_sup = 4/13 = 30.76%. Let the minimum confidence required be 70%.

Scan D for the count of each candidate (C1), then compare each candidate's support count with the minimum support count to obtain L1:

Item Set   Support Count
{I1}       4
{I2}       4
{I3}       5
{I4}       7
{I5}       6
{I6}       7
{I7}       6
{I8}       8
{I9}       5

Every 1-itemset meets the minimum support count, so L1 = C1.
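The C1 -> L1 step can be sketched in a few lines of Python, reusing D from the previous sketch; MIN_SUP_COUNT and frequent_1_itemsets are illustrative names, not part of the slides.

```python
from collections import Counter

MIN_SUP_COUNT = 4   # minimum support count chosen above

def frequent_1_itemsets(D, min_sup_count=MIN_SUP_COUNT):
    """Scan D once, count every single item (C1), keep those meeting min support (L1)."""
    counts = Counter(item for transaction in D for item in transaction)
    return {frozenset([item]): c for item, c in counts.items() if c >= min_sup_count}

L1 = frequent_1_itemsets(D)   # here every item survives: counts 4, 4, 5, 7, 6, 7, 6, 8, 5
```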

Generating 2-itemset Frequent Pattern

Generate the C2 candidates from L1, then scan D for the count of each candidate:

C2:
{I1,I2} 0    {I2,I4} 1    {I3,I7} 2    {I5,I7} 2
{I1,I3} 0    {I2,I5} 3    {I3,I8} 3    {I5,I8} 6
{I1,I4} 3    {I2,I6} 2    {I3,I9} 2    {I5,I9} 0
{I1,I5} 1    {I2,I7} 2    {I4,I5} 0    {I6,I7} 0
{I1,I6} 2    {I2,I8} 4    {I4,I6} 3    {I6,I8} 5
{I1,I7} 2    {I2,I9} 0    {I4,I7} 4    {I6,I9} 2
{I1,I8} 1    {I3,I4} 3    {I4,I8} 2    {I7,I8} 3
{I1,I9} 3    {I3,I5} 2    {I4,I9} 5    {I7,I9} 3
{I2,I3} 0    {I3,I6} 3    {I5,I6} 4    {I8,I9} 0

Compare each candidate support count with the minimum support count (4) to obtain L2:

Item Set   Support Count
{I2,I8}    4
{I4,I7}    4
{I4,I9}    5
{I5,I6}    4
{I5,I8}    6
{I6,I8}    5
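The same scan-and-compare step can be written generically. The sketch below (illustrative names, building on D, MIN_SUP_COUNT and L1 from the earlier sketches) counts the support of arbitrary candidate itemsets and filters them against the minimum support count.

```python
from itertools import combinations

def count_support(D, candidates):
    """Scan D once and count how many transactions contain each candidate itemset."""
    counts = {c: 0 for c in candidates}
    for transaction in D:
        for c in candidates:
            if c <= transaction:          # candidate is a subset of the transaction
                counts[c] += 1
    return counts

# C2 is simply every 2-item combination of the items appearing in L1.
items_in_L1 = sorted({item for itemset in L1 for item in itemset})
C2 = [frozenset(pair) for pair in combinations(items_in_L1, 2)]

L2 = {c: n for c, n in count_support(D, C2).items() if n >= MIN_SUP_COUNT}
# -> {I2,I8}:4, {I4,I7}:4, {I4,I9}:5, {I5,I6}:4, {I5,I8}:6, {I6,I8}:5
```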

Join Operation

L2 = {{I2,I8}, {I4,I7}, {I4,I9}, {I5,I6}, {I5,I8}, {I6,I8}}.

To find C3, we compute L2 join L2, similar to a natural join in a database.

Join operation illustrated: consider the itemset {I2,I8} from L2 and search L2 for other itemsets containing I2 or I8; this gives the itemsets {I5,I8} and {I6,I8}. Joining {I2,I8} with {I5,I8} on the common item I8 yields {I2,I5,I8}, much like joining two database tuples on a common column value. Likewise, joining {I2,I8} with {I6,I8} yields {I2,I6,I8}.

Applying the same technique to every itemset in L2 gives:
C3 = L2 join L2 = {{I2,I5,I8}, {I2,I6,I8}, {I4,I7,I9}, {I5,I6,I8}}.
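The join described above can be sketched as: merge every pair of frequent k-itemsets whose union has exactly k+1 items, i.e. the two itemsets share k-1 items. apriori_join is an illustrative name, not part of the slides.

```python
def apriori_join(Lk):
    """L_k join L_k: merge pairs of k-itemsets sharing k-1 items into (k+1)-item candidates."""
    itemsets = list(Lk)                  # works for a dict of itemset -> count or a plain set
    k = len(itemsets[0])
    joined = set()
    for i in range(len(itemsets)):
        for j in range(i + 1, len(itemsets)):
            union = itemsets[i] | itemsets[j]
            if len(union) == k + 1:      # the two itemsets overlap in exactly k-1 items
                joined.add(union)
    return joined

# apriori_join(L2) -> {{I2,I5,I8}, {I2,I6,I8}, {I4,I7,I9}, {I5,I6,I8}}, as derived above
```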

Generating 3-itemset Frequent Pattern

Prune C3 using the Apriori property: all subsets of a frequent itemset must also be frequent.

C3 = L2 join L2 = {{I2,I5,I8}, {I2,I6,I8}, {I4,I7,I9}, {I5,I6,I8}}.

For example, take {I5,I6,I8}. Its 2-item subsets are {I5,I6}, {I5,I8} and {I6,I8}. Since all 2-item subsets of {I5,I6,I8} are members of L2, we keep {I5,I6,I8} in C3.

Now take {I2,I5,I8}, which shows how pruning is performed. Its 2-item subsets are {I2,I5}, {I2,I8} and {I5,I8}. But {I2,I5} is not a member of L2 and hence is not frequent, violating the Apriori property, so we remove {I2,I5,I8} from C3. ({I2,I6,I8} and {I4,I7,I9} are pruned in the same way.)

Therefore, after checking every member of the join result, C3 = {{I5,I6,I8}}.

Scan D for the count of the remaining candidate and compare it with the minimum support count:

C3 = L3:
Item Set     Support Count
{I5,I6,I8}   4
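The pruning step can be sketched likewise: drop any candidate that has a k-item subset outside L_k. apriori_prune is an illustrative name, not part of the slides.

```python
from itertools import combinations

def apriori_prune(candidates, Lk):
    """Keep only candidates whose every k-item subset is frequent (the Apriori property)."""
    frequent_k = set(Lk)
    kept = set()
    for c in candidates:
        subset_size = len(c) - 1
        if all(frozenset(s) in frequent_k for s in combinations(c, subset_size)):
            kept.add(c)
    return kept

# apriori_prune(apriori_join(L2), L2) -> {frozenset({'I5', 'I6', 'I8'})}, i.e. the pruned C3
```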

Generating 4-itemset Frequent Pattern

The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Since L3 contains only a single itemset, the join is empty: C4 = φ, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm.

Next? These frequent itemsets will be used to generate strong association rules (rules that satisfy both minimum support and minimum confidence).

We have L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I6}, {I7}, {I8}, {I9}, {I2,I8}, {I4,I7}, {I4,I9}, {I5,I6}, {I5,I8}, {I6,I8}, {I5,I6,I8}}.

Now, to generate classification rules, we need to consider only the frequent itemsets that contain I8 or I9 (the class items).
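Putting the scan, join and prune steps together gives the level-wise loop below; this is a sketch that reuses the helper functions from the earlier sketches (frequent_1_itemsets, apriori_join, apriori_prune, count_support), and for the slide's data it returns exactly the set L listed above.

```python
def apriori(D, min_sup_count=MIN_SUP_COUNT):
    """Level-wise Apriori: return every frequent itemset together with its support count."""
    current = frequent_1_itemsets(D, min_sup_count)               # level 1
    all_frequent = dict(current)
    while current:
        candidates = apriori_prune(apriori_join(current), current)  # join, then prune
        if not candidates:
            break
        counts = count_support(D, candidates)                     # one scan of D per level
        current = {c: n for c, n in counts.items() if n >= min_sup_count}
        all_frequent.update(current)
    return all_frequent

L = apriori(D)   # the 9 single items, the six frequent pairs, and {I5,I6,I8}
```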

Generating Association Rules from Frequent Itemsets

Consider only the frequent itemsets that contain I8 or I9: {I2,I8}, {I4,I9}, {I5,I8}, {I6,I8}, {I5,I6,I8}.

Consider {I2,I8} - R1: I2 → I8. Confidence = sc{I2,I8}/sc{I2} = 4/4 = 100%, selected.
Consider {I4,I9} - R2: I4 → I9. Confidence = sc{I4,I9}/sc{I4} = 5/7 = 71.42%, selected.
Consider {I5,I8} - R3: I5 → I8. Confidence = sc{I5,I8}/sc{I5} = 6/6 = 100%, selected.
Consider {I6,I8} - R4: I6 → I8. Confidence = sc{I6,I8}/sc{I6} = 5/7 = 71.42%, selected.
Consider {I5,I6,I8} - R5: I5 ^ I6 → I8. Confidence = sc{I5,I6,I8}/sc{I5,I6} = 4/4 = 100%, selected.
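The rule-generation step can be sketched as follows (illustrative names, reusing L and D from the earlier sketches): for each frequent itemset containing a class item, the remaining items form the antecedent, and the rule is kept only if its confidence meets the threshold.

```python
def classification_rules(frequent, n_transactions, class_items=("I8", "I9"), min_conf=0.70):
    """Build antecedent -> class rules and keep those meeting the confidence threshold."""
    rules = []
    for itemset, sc in frequent.items():
        for cls in class_items:
            if cls in itemset and len(itemset) > 1:
                antecedent = itemset - {cls}
                confidence = sc / frequent[antecedent]   # sc(A ∪ {class}) / sc(A)
                if confidence >= min_conf:
                    rules.append((antecedent, cls, sc / n_transactions, confidence))
    return rules

RULES = classification_rules(L, len(D))
# Reproduces R1-R5, e.g. antecedent {'I2'} -> 'I8' with support 4/13 and confidence 1.0.
```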

Classification Rules

Rule (A → B) [support, confidence]
R1: I2 → I8 [30.76%, 100%]
R2: I4 → I9 [38.46%, 71.42%]
R3: I5 → I8 [46.15%, 100%]
R4: I6 → I8 [38.46%, 71.42%]
R5: I5 ^ I6 → I8 [30.76%, 100%]

Alternatively:
1. Outlook = overcast → PlayTennis = yes [30.76%, 100%]
2. Humidity = high → PlayTennis = no [38.46%, 71.42%]
3. Humidity = normal → PlayTennis = yes [46.15%, 100%]
4. Wind = weak → PlayTennis = yes [38.46%, 71.42%]
5. Humidity = normal AND Wind = weak → PlayTennis = yes [30.76%, 100%]

Test Data Set

Outlook    Humidity  Wind    PlayTennis
Overcast   Normal    Strong  Yes
Rain       Normal    Weak    Yes
Rain       Normal    Strong  No
Sunny      Normal    Weak    Yes
Sunny      High      Strong  No

Given this test data, let's classify each example and determine the accuracy:

1: Outlook = overcast, and using rule 1 we get PlayTennis = Yes, so this example is correctly classified. Note that rule 3 also applies.
2: Humidity = normal ^ Wind = weak, and using rule 5 we get PlayTennis = Yes, so this example is correctly classified.
3: Humidity = normal, and using rule 3 we get PlayTennis = Yes, but the actual class is No, so this example is incorrectly classified.
4: Humidity = normal ^ Wind = weak, and using rule 5 we get PlayTennis = Yes, so this example is correctly classified.
5: Humidity = high, and using rule 2 we get PlayTennis = No, so this example is correctly classified.

So, the accuracy of our association rule based classifier is 4/5 = 80%.
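Finally, classifying the test examples can be sketched like this (TEST, classify and the first-match rule order are illustrative; ITEMS and RULES come from the earlier sketches). For this particular test set every applicable rule predicts the same class, so the sketch reproduces the 4/5 accuracy regardless of which matching rule fires first.

```python
TEST = [  # (Outlook, Humidity, Wind) and the actual PlayTennis class
    (("Overcast", "Normal", "Strong"), "Yes"),
    (("Rain", "Normal", "Weak"), "Yes"),
    (("Rain", "Normal", "Strong"), "No"),
    (("Sunny", "Normal", "Weak"), "Yes"),
    (("Sunny", "High", "Strong"), "No"),
]

def classify(record, rules):
    """Predict the class of one test record using the first rule whose antecedent matches."""
    items = {ITEMS[(attr, value)]
             for attr, value in zip(("Outlook", "Humidity", "Wind"), record)}
    for antecedent, cls, _support, _confidence in rules:
        if antecedent <= items:
            return "Yes" if cls == "I8" else "No"
    return None   # no rule fires (does not happen for this test set)

accuracy = sum(classify(x, RULES) == y for x, y in TEST) / len(TEST)
print(accuracy)   # 0.8, matching the 80% computed above
```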

Thank You