Data Mining Algorithms

Similar documents
Jarek Szlichta

Chapter 7: Frequent Itemsets and Association Rules

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Association Rule Mining. Entscheidungsunterstützungssysteme

Data Mining Clustering

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Association Rule Discovery

Machine Learning: Symbolische Ansätze

COMS 4721: Machine Learning for Data Science Lecture 23, 4/20/2017

Frequent Pattern Mining

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Association mining rules

Association Rule Discovery

Association Rule Mining. Introduction 46. Study core 46

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Machine Learning - Clustering. CS102 Fall 2017

An Introduction to Data Mining

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

Classification by Association

Thanks to the advances of data processing technologies, a lot of data can be collected and stored in databases efficiently New challenges: with a

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Jeffrey D. Ullman Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

We will be releasing HW1 today It is due in 2 weeks (1/25 at 23:59pm) The homework is long

Data warehouse and Data Mining

High dim. data. Graph data. Infinite data. Machine learning. Apps. Locality sensitive hashing. Filtering data streams.

Nesnelerin İnternetinde Veri Analizi

An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth

Chapter 4 Data Mining A Short Introduction

Supervised and Unsupervised Learning (II)

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008

Big Data Analytics CSCI 4030

Mining Association Rules in Large Databases

Chapter 4: Association analysis:

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Association rule mining

Association Rule with Frequent Pattern Growth. Algorithm for Frequent Item Sets Mining

ASSOCIATION RULE MINING: MARKET BASKET ANALYSIS OF A GROCERY STORE

Improved Frequent Pattern Mining Algorithm with Indexing

Association Rules Apriori Algorithm

2. Discovery of Association Rules

Decision Support Systems

Interestingness Measurements

Association Rules Outline

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Association Rules Apriori Algorithm

13 Frequent Itemsets and Bloom Filters

Outline. Project Update Data Mining: Answers without Queries. Principles of Information and Database Management 198:336 Week 12 Apr 25 Matthew Stone

Association Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.

CS 6093 Lecture 7 Spring 2011 Basic Data Mining. Cong Yu 03/21/2011

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Mining Association Rules in Large Databases

Association Rules. CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University. Slides adapted from lectures by Jeff Ullman

Fundamental Data Mining Algorithms

Association Pattern Mining. Lijun Zhang

Tutorial on Association Rule Mining

Association Rules. Juliana Freire. Modified from Jeff Ullman, Jure Lescovek, Bing Liu, Jiawei Han

Efficient Frequent Itemset Mining Mechanism Using Support Count

Data Mining Course Overview

Pattern Discovery Using Apriori and Ch-Search Algorithm

Chapter 7: Frequent Itemsets and Association Rules

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

BCB 713 Module Spring 2011

Association Rules. Berlin Chen References:

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

Frequent Item Sets & Association Rules

Production rule is an important element in the expert system. By interview with

A Comparative Study of Association Mining Algorithms for Market Basket Analysis

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

Unstructured Data. CS102 Winter 2019

Association Rule Learning

COMP Associa0on Rules

Fall 2018 CSE 482 Big Data Analysis: Exam 1 Total: 36 (+3 bonus points)

Interestingness Measurements

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Chapter 6: Association Rules

Knowledge Discovery in Databases

Data Mining Concepts & Tasks

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

signicantly higher than it would be if items were placed at random into baskets. For example, we

Knowledge Discovery & Data Mining

DATA MINING APRIORI ALGORITHM IMPLEMENTATION USING R

ISM 50 - Business Information Systems

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2018

Enhanced Outlier Detection Method Using Association Rule Mining Technique

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

CS246: Mining Massive Datasets Jure Leskovec, Stanford University.

Association Analysis: Basic Concepts and Algorithms

Data Mining Concepts

Lecture notes for April 6, 2005

ITEM ARRANGEMENT PATTERN IN WAREHOUSE USING APRIORI ALGORITHM (GIANT KAPASAN CASE STUDY)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

Frequent Pattern Mining

Association Technique in Data Mining and Its Applications

Transcription:

Algorithms Fall 2017

Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Looking for patterns in data Machine Learning Using data to make inferences or predictions Data Visualization Graphical depiction of data Data Collection and Preparation

Looking for patterns in data Similar to unsupervised machine learning Popularity predates popularity of machine learning Data mining often associated with specific data types and patterns We will focus on market-basket data Widely applicable (despite the name) And two types of data mining patterns Frequent item-sets Association rules

Other Data and Patterns Other types of data Graphs/networks Streams Text ( text mining ) Specific techniques for each one Other patterns Similar items Structural patterns in large graphs/networks Clusters, anomalies

(In)Famous Early Success Stories Victoria s Secret Walmart Beer & Diapers

Market-Basket Data Originated with retail data Each shopper buys market basket of groceries Mine data for patterns in buying habits General definition Domain of items Transaction - one or more items occurring together Dataset set of transactions (usually large)

Market-Basket Examples Items Groceries Online goods University courses University students Movies Symptoms Menu items Words Transaction Grocery cart Virtual shopping cart Student transcript Party Person Patient Restaurant customer Document

Algorithms Frequent Item-Sets sets of items that occur frequently together in transactions Groceries bought together Courses taken by same students Students going to parties together Movies watched by same people Association Rules When certain items occur together, another item frequently occurs with them Shoppers who buy phone + charger also buy case Students who take Databases also take Machine Learning Diners who order curry and rice also order bread

Frequent Item-Sets Sets of items that occur frequently together in transactions Ø How large is a set? Ø What does frequently mean?

Frequent Item-Sets Sets of items that occur frequently together in transactions Ø How large is a set? Usually specify a minimum min-set-size Possibly also a maximum max-set-size Ø What does frequently mean? Notion of support

Support Support for a set of items S in a dataset of transactions is the fraction of the transactions containing S: # of transactions containing S total # of transactions Specify support-threshold for frequent item-sets Only return sets where support > support-threshold

Your Turn Transactions: T1: milk, eggs, juice T2: milk, juice, cookies T3: eggs, chips T4: milk, eggs T5: milk, juice, cookies, chips What are the frequent item-sets if: min-set-size = 2 (no max-set-size) support-threshold = 0.3 Support: # of transactions containing S total # of transactions

Computing Frequent Item-Sets Apriori algorithm Efficiency relies on the following property: If S is a frequent item-set satisfying support-threshold t, then every subset of S is also a frequent item-set satisfying support-threshold t. Or the contrapositive: If S is not a frequent item-set satisfying support-threshold t, then no superset of S can be a frequent item-set satisfying support-threshold t.

Association Rules When a set of items S occurs together, another item i frequently occurs with them S à i Ø How large is a set? Ø What does occurs together mean? Ø What does frequently occurs with them mean?

Association Rules When a set of items S occurs together, another item i frequently occurs with them S à i Ø How large is a set? Usually specify a minimum min-set-size for S Possibly also a maximum max-set-size for S Ø What does occurs together mean? Ø What does frequently occurs with them mean?

Association Rules When a set of items S occurs together, another item i frequently occurs with them S à i Ø How large is a set? Usually specify a minimum min-set-size for S Possibly also a maximum max-set-size for S Ø What does occurs together mean? Notion of support Ø What does frequently occurs with them mean? Notion of confidence

Support and Confidence Support for association rule S à i in a dataset of transactions is fraction of transactions containing S: # of transactions containing S total # of transactions Confidence for association rule S à i in a dataset of transactions is the fraction of transactions containing S that also contain i: # of transactions containing S and i # of transactions containing S

Support and Confidence Specify support-threshold and confidence-threshold for association rules Only return rules where: support > support-threshold and confidence > confidence-threshold

Your Turn Transactions: T1: milk, eggs, juice T2: milk, juice, cookies T3: eggs, chips T4: milk, eggs T5: milk, juice, cookies, chips Support: # of transactions containing S total # of transactions What are the association rules S à i if: min-set-size = 1 (no max-set-size) support-threshold = 0.5 confidence-threshold = 0.5 Confidence: # of transactions containing S and i # of transactions containing S

Computing Association Rules 1. Use frequent item-sets to find left-hand sides S satisfying support threshold 2. Then extend to find right-hand sides S à i satisfying confidence threshold NOT a property: If S à i is an association rule satisfying support-threshold t and confidence-threshold c, and S Í S, then S à i is an an association rule satisfying support-threshold t and confidence-threshold c. Why Not?

Association Rules: Lift Association rule S à i might have high confidence because item i appears frequently, not because it s associated with S. Confidence Lift for association rule S à i in a dataset of transactions is the fraction of transactions containing S that also contain i :, divided by the overall frequency of i : #trans # containing of transactions S and containing i #trans Scontaining and i i #trans containing # of transactions S containing total S#trans

Lift: Examples Transactions: T1: milk, eggs, juice Lift = 1: no association T2: milk, juice, cookies Lift > 1: association T3: eggs, chips Lift < 1: anti-association T4: milk, eggs T5: milk, juice, cookies, chips juice à cookies Lift = (2/3) (2/5) = 10/6 = 1.67 eggs à milk Lift = (2/3) (4/5) = 10/12 = 0.83 Lift: #trans containing S and i #trans containing i #trans containing S total #trans

Algorithms Fall 2017