Association Rules. Charles Sutton Data Mining and Exploration Spring Based on slides by Chris Williams and Amos Storkey. Thursday, 8 March 12

Similar documents
Association Rule Learning

Classification with Decision Tree Induction

Representing structural patterns: Reading Material: Chapter 3 of the textbook by Witten

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CSE 634/590 Data mining Extra Credit: Classification by Association rules: Example Problem. Muhammad Asiful Islam, SBID:

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Rule induction. Dr Beatriz de la Iglesia

Association Pattern Mining. Lijun Zhang

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

2. Discovery of Association Rules

Data Mining. Part 1. Introduction. 1.4 Input. Spring Instructor: Dr. Masoud Yaghini. Input

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

CS513-Data Mining. Lecture 2: Understanding the Data. Waheed Noor

Basic Concepts Weka Workbench and its terminology

Data Mining Algorithms: Basic Methods

Data Mining Part 3. Associations Rules

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Instance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.

Association Rule Mining and Clustering

Chapter 4 Data Mining A Short Introduction

BCB 713 Module Spring 2011

A STUDY ON ASSOCIATION RULES MINING ALGORITHMS S.Saveetha, M.Phil Scholar, Sengunthar Arts & Science College, Tiruchengode, Tamilnadu, India.

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Decision Trees: Discussion

DATA MINING II - 1DL460

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

BITS F464: MACHINE LEARNING

Mining Association Rules in Large Databases

CS570 Introduction to Data Mining

Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.

Roadmap. PCY Algorithm

Performance and Scalability: Apriori Implementa6on

Data Mining Concepts

CHAPTER 5 WEIGHTED SUPPORT ASSOCIATION RULE MINING USING CLOSED ITEMSET LATTICES IN PARALLEL

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Machine Learning Chapter 2. Input

Chapter 6: Association Rules

Data Mining Techniques

Jeffrey D. Ullman Stanford University

Association Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.

Input: Concepts, Instances, Attributes

Supervised and Unsupervised Learning (II)

Frequent Pattern Mining

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Introduction to Machine Learning

Induction of Association Rules: Apriori Implementation

Data Mining. Decision Tree. Hamid Beigy. Sharif University of Technology. Fall 1396

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others

Association Rule Mining: FP-Growth

ARTIFICIAL INTELLIGENCE (CS 370D)

9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)

Data Mining. Part 1. Introduction. 1.3 Input. Fall Instructor: Dr. Masoud Yaghini. Input

ISSUES IN DECISION TREE LEARNING

Advanced learning algorithms

Lecture 5 of 42. Decision Trees, Occam s Razor, and Overfitting

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Hash-Based Improvements to A-Priori. Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms

Data Mining Practical Machine Learning Tools and Techniques

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory

OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING

Association rule mining

AUTOMATIC classification has been a goal for machine

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.

Subgroup Mining (with a brief intro to Data Mining)

Outline. RainForest A Framework for Fast Decision Tree Construction of Large Datasets. Introduction. Introduction. Introduction (cont d)

Data Mining. 1.3 Input. Fall Instructor: Dr. Masoud Yaghini. Chapter 3: Input

Improvements to A-Priori. Bloom Filters Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results

MACHINE LEARNING Example: Google search

Data Mining Clustering

Advance Association Analysis

Data Mining Practical Machine Learning Tools and Techniques

Data Mining. Covering algorithms. Covering approach At each stage you identify a rule that covers some of instances. Fig. 4.

Chapter 4: Association analysis:

Association Analysis. CSE 352 Lecture Notes. Professor Anita Wasilewska

Interestingness Measurements

Association Rules. A. Bellaachia Page: 1

Tillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen , , MA:8. 1 Search (JM): 11 points

Data Analytics and Boolean Algebras

Prediction. What is Prediction. Simple methods for Prediction. Classification by decision tree induction. Classification and regression evaluation

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Chapter 4: Algorithms CS 795

Production rule is an important element in the expert system. By interview with

Decision Tree Learning

Market basket analysis

CHAPTER 8. ITEMSET MINING 226

Learning Rules. Learning Rules from Decision Trees

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Association Rules. Berlin Chen References:

Data Representation Information Retrieval and Data Mining. Prof. Matteo Matteucci

Data Mining and Machine Learning: Techniques and Algorithms

CSCE 478/878 Lecture 6: Bayesian Learning and Graphical Models. Stephen Scott. Introduction. Outline. Bayes Theorem. Formulas

WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov

Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.

Nominal Data. May not have a numerical representation Distance measures might not make sense. PR and ANN

Association Rule Discovery

Mining Frequent Patterns without Candidate Generation

Data Mining and Analytics

Transcription:

Association Rules Charles Sutton Data Mining and Exploration Spring 2012 Based on slides by Chris Williams and Amos Storkey

The Goal Find patterns : local regularities that occur more often than you would expect. Examples: If a person buys wine at a supermarket, they also buy cheese. (confidence: 20%) If a person likes Lord of the Rings and Star Wars, they like Star Trek (confidence: 90%) Look like they could be used for classification, but There is not a single class label in mind. They can predict any attribute or a set of attributes. They are unsupervised Not intended to be used together as a set Often mined from very large data sets

Example Data Market basket analysis, e.g., supermarket Item Chicken Onion Rocket Caviar Haggis Transactions trip to market 1 1 1 1 1 1 1 1 1 1 1 1 1 1.... These are databases that companies have already.

Other Examples Collaborative-filtering type data: e.g., Films a person has watched Rows: patients, columns: medical tests (Cabena et al, 1998) Survey data (Impact Resources, Inc., Columbus OH, 1987) Feature Demographic # Values Type 1 Sex 2 Categorical 2 Marital status 5 Categorical 3 Age 7 Ordinal 4 Education 6 Ordinal 5 Occupation 9 Categorical 6 Income 9 Ordinal 7 Years in Bay Area 5 Ordinal 8 Dual incomes 3 Categorical 9 Number in household 9 Ordinal 10 Number of children 9 Ordinal 11 Householder status 3 Categorical 12 Type of home 5 Categorical 13 Ethnic classification 8 Categorical 14 Language in home 3 Categorical [ ]

Toy Example Day Outlook Temperature Humidity Wind PlayTennis D1 Sunny Hot High False No D2 Sunny Hot High True No D3 Overcast Hot High False Yes D4 Rain Mild High False Yes D5 Rain Cool Normal False Yes D6 Rain Cool Normal True No D7 Overcast Cool Normal True Yes D8 Sunny Mild High False No D9 Sunny Cool Normal False Yes D10 Rain Mild Normal False Yes D11 Sunny Mild Normal True Yes D12 Overcast Mild High True Yes D13 Overcast Hot Normal False Yes D14 Rain Mild High True No

Itemsets, Coverage, etc Call each column an attribute A 1,A 2,...A m An item set is a set of attribute value pairs (A i1 = a j1 ) (A i2 = a j2 )...(A ik = a jk ) Example: In the Play Tennis data Humidity = Normal Play = Yes Windy = False) The support of an item set is its frequency in the data set Example: support ( Humidity = Normal Play = Yes Windy = False) ) = 4 The confidence of an association rule if Y=y then Z=z is Example: P(Z = z Y = y) P(Windy = False Play = Yes Humidity = Normal) = 4/7

Item sets to rules First: We will find frequent item sets Then: We convert them to rules An itemset of size k can give rise to 2k -1 rules Example: itemset Windy=False, Play=Yes, Humidity=Normal Results in 7 rules including: IF Windy=False and Humidity=Normal THEN Play=Yes (4/4) IF Play=Yes THEN Humidity=Normal and Windy=False (4/9) IF True THEN Windy=False and Play=Yes and Humidity=Normal (4/14) We keep rules only whose confidence is greater than a threshold

Finding Frequent Itemsets Task: Find all item sets with support Insight: A large set can be no more frequent than its subsets, e.g., support(wind = False) support(wind = False, Outlook = Sunny) So search through itemsets in order of number of items An efficient algorithm for this is APRIORI (Agarwal and Srikant, 1994; Mannila et al, 1994)

APRIORI Algorithm (for binary variables) i = 1 C i = {{A} A is a variable} while C i is not empty database pass: for each set in C i test if it is frequent let L i be collection of frequent sets from C i candidate formation: let C i+1 be those sets of size i + 1 all of whose subsets are frequent end while

Single database pass is linear in C i n, make a pass for each i until C i is empty Candidate formation Find all pairs of sets {U, V } from L i such that U size i + 1 and test if this union is really a potential candidate. O( L i 3 ) V has Example: 5 three-item sets (ABC), (ABD), (ACD), (ACE), (BCD) Candidate four-item sets (ABCD) ok (ACDE) not ok because (CDE) is not present above

Comments Some association rules will be trivial, some interesting. Need to sort through them Example: pregnant => female (confidence: 1) Also can miss interesting but rare rules Example: vodka --> caviar (low support) Really this is a type of exploratory data analysis For rule A -->B, can be useful to compare P(B A) to P(B) APRIORI can be generalised to structures like subsequences and subtrees