Carnegie Mellon Univ., Dept. of Computer Science, 15-415/615 DB Applications. Data Warehousing / Data Mining: detailed outline.
Carnegie Mellon Univ., Dept. of Computer Science
15-415/615 DB Applications
Data Warehousing / Data Mining (R&G, ch. 25 and 26)
C. Faloutsos and A. Pavlo (CMU SCS)

Data mining - detailed outline
  Problem
  Getting the data: Data Warehouses, DataCubes, OLAP
  Supervised learning: decision trees
  Unsupervised learning
    association rules

Problem
  Given: multiple data sources
  Find: patterns (classifiers, rules, clusters, outliers, ...)
  Example sources, at sites PGH, NY and SF:
    sales(pid, cid, date, $price)
    customers(cid, age, income, ...)

Data Warehousing
  First step: collect the data in a single place (= Data Warehouse)
  How?
  How often?
  How about discrepancies / non-homogeneities?
Data Warehousing
  First step: collect the data in a single place (= Data Warehouse)
  How? A: Triggers / materialized views
  How often? A: [Art!]
  How about discrepancies / non-homogeneities? A: Wrappers / mediators

Data Warehousing
  Step 2: collect counts (DataCubes / OLAP). E.g.:

OLAP
  Problem: "is it true that shirts in large sizes sell better in dark colors?"
  [Figure: table of shirt sales counts by color and size]
  color, size: DIMENSIONS
  count: MEASURE
[Figures: the table of counts by color and size, extended step by step with row, column and grand totals. color, size: DIMENSIONS; count: MEASURE]
DataCube
  [Figure: the complete table of counts and totals, i.e. the DataCube]
  color, size: DIMENSIONS
  count: MEASURE

SQL query to generate the DataCube, naively (and painfully):
  select size, color, count(*)
  from sales where pid = 'shirt'
  group by size, color

  select size, count(*)
  from sales where pid = 'shirt'
  group by size
  ...

SQL query to generate the DataCube, with the 'cube by' keyword:
  select size, color, count(*)
  from sales where pid = 'shirt'
  cube by size, color

DataCube issues:
  Q1: How to store them (and/or materialize portions on demand)?
  Q2: Which operations to allow?
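The naive approach above (one group-by query per subset of the dimensions) can be sketched in Python with the standard sqlite3 module. This is only an illustrative sketch: the toy sales table with color and size columns, the helper name data_cube, and the sample rows are assumptions made for the example, not part of the slides. SQLite has no cube operator, so the loop plays the role of the 'cube by' keyword; systems that support it usually spell it GROUP BY CUBE(size, color).

```python
import sqlite3
from itertools import combinations

def data_cube(conn, table, dims, where="1=1"):
    """Hypothetical helper: run one GROUP BY per subset of dims.

    Returns {grouping_columns: {column_values: count}}.
    """
    cube = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            cols = ", ".join(group)
            select_cols = cols + ", " if group else ""
            sql = f"SELECT {select_cols}COUNT(*) FROM {table} WHERE {where}"
            if group:
                sql += f" GROUP BY {cols}"
            # Last column of each row is the count; the rest identify the cell.
            cube[group] = {tuple(row[:-1]): row[-1] for row in conn.execute(sql)}
    return cube

# Toy data (assumed): a sales table that carries color and size directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (pid TEXT, color TEXT, size TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("shirt", "dark", "L"), ("shirt", "dark", "L"), ("shirt", "light", "L"),
    ("shirt", "dark", "S"), ("pants", "dark", "L"),
])

cube = data_cube(conn, "sales", ["color", "size"], where="pid = 'shirt'")
for group, cells in cube.items():
    print(group, cells)
```

With d dimensions this issues 2**d queries, which is why the single-statement cube syntax (and careful storage of the result) matters.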
DataCube issues:
  Q1: How to store them (and/or materialize portions on demand)? A: ROLAP / MOLAP
  Q2: Which operations to allow? A: roll-up, drill-down, slice, dice
  [More details: book by Han and Kamber]

Q1: How to store a datacube?
  A1: Relational (ROLAP)
  A2: Multidimensional (MOLAP)
  A3: Hybrid (HOLAP)
Pros/Cons: ROLAP strong points (DSS, Metacube):
  use existing RDBMS technology
  scale up better with dimensionality

Pros/Cons: MOLAP strong points (EssBase / hyperion.com):
  faster indexing
  (careful with: high dimensionality; sparseness)

HOLAP (MS SQL Server OLAP Services):
  detail data in ROLAP; summaries in MOLAP

Q1: How to store a datacube?
Q2: What operations should we support?
Q2: What operations should we support?
  Roll-up
  Drill-down
  Slice
Q2: What operations should we support?
  Roll-up
  Drill-down
  Slice
  Dice
  (Pivot/rotate; drill-across; drill-through; top N; moving averages, etc.)

D/W - OLAP: Conclusions
  D/W: copy (summarized) data, then analyze
  OLAP concepts:
    DataCube
    R/M/HOLAP servers
    'dimensions'; 'measures'
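The four basic cube operations named above can be illustrated on a tiny in-memory cube. The sketch below uses the usual textbook meaning of each operation, since the slides only show them pictorially; the cell layout (a dict keyed by (color, size)), the function names, and the counts are all assumptions made for illustration.

```python
from collections import defaultdict

DIMS = ("color", "size")
# Base cells of a toy cube: (color, size) -> count. Values are made up.
cells = {("dark", "L"): 2, ("dark", "S"): 1, ("light", "L"): 1, ("light", "S"): 3}

def roll_up(cells, dims, keep):
    """Aggregate away every dimension not listed in keep."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, count in cells.items():
        out[tuple(key[i] for i in idx)] += count
    return dict(out)

def slice_(cells, dims, dim, value):
    """Fix one dimension to a single value and drop it from the key."""
    i = dims.index(dim)
    return {k[:i] + k[i + 1:]: c for k, c in cells.items() if k[i] == value}

def dice(cells, dims, allowed):
    """Keep a sub-cube: each listed dimension is restricted to a set of values."""
    return {k: c for k, c in cells.items()
            if all(k[dims.index(d)] in vals for d, vals in allowed.items())}

by_color = roll_up(cells, DIMS, ("color",))      # totals per color
grand_total = roll_up(cells, DIMS, ())           # a single cell
large_only = slice_(cells, DIMS, "size", "L")    # counts per color, size = L
dark_cube = dice(cells, DIMS, {"color": {"dark"}, "size": {"L", "S"}})
```

Drill-down is the inverse of roll-up: going from by_color back to the finer cells, which requires that the detailed counts are still stored (or can be recomputed).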
Decision trees - Problem

Decision trees
  Pictorially, we have:
    [Figure: labeled points plotted on num. attr#1 (e.g., age) vs. num. attr#2 (e.g., chol-level), plus unlabeled points marked '?']
  and we want to label '?'
  so we build a decision tree:
    [Figure: the same plot, partitioned into regions; the axis tick 40 is marked on chol-level]
Decision trees
  so we build a decision tree:
    age < 50? (Y / N)
    chol. < 40? (Y / N)
    ...

Outline
  Problem
  Getting the data: Data Warehouses, DataCubes, OLAP
  Supervised learning: decision trees
    problem
    approach
    scalability enhancements
  Unsupervised learning
    association rules
    (clustering)

Decision trees
  Typically, two steps:
    tree building
    tree pruning (for overtraining / overfitting)

How?
  [Figure: labeled points on num. attr#1 (e.g., age) vs. num. attr#2 (e.g., chol-level)]
How?
  A: Partition, recursively. Pseudocode:
    Partition(Dataset S)
      if all points in S have the same label then return
      evaluate splits along each attribute A
      pick the best split, to divide S into S1 and S2
      Partition(S1); Partition(S2)

  Q1: how to introduce splits along attribute A_i?
  Q2: how to evaluate a split?

  Q1: how to introduce splits along attribute A_i?
  A1: for numerical attributes: binary split, or multiple split
      for categorical attributes: compute all subsets (expensive!), or use a greedy algorithm
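The recursive Partition pseudocode above can be turned into a small runnable sketch. Several choices here are assumptions made only for illustration: numerical attributes with binary splits of the form 'attribute < threshold', thresholds at midpoints between consecutive values, and the Gini index as the split-evaluation measure. The function names and the toy (age, chol-level) data are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum of squared class fractions; 0 means all labels agree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, labels):
    """Evaluate binary splits along each attribute; return (score, attr, threshold)."""
    best = None
    n = len(points)
    for a in range(len(points[0])):
        values = sorted({p[a] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[a] < t]
            right = [l for p, l in zip(points, labels) if p[a] >= t]
            score = (len(left) * gini_impurity(left)
                     + len(right) * gini_impurity(right)) / n
            if best is None or score < best[0]:
                best = (score, a, t)
    return best

def partition(points, labels):
    """Recursive partitioning; returns a label (leaf) or (attr, threshold, left, right)."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(points, labels)
    if split is None:  # identical points with conflicting labels: majority vote
        return Counter(labels).most_common(1)[0][0]
    _, a, t = split
    left = [(p, l) for p, l in zip(points, labels) if p[a] < t]
    right = [(p, l) for p, l in zip(points, labels) if p[a] >= t]
    return (a, t,
            partition([p for p, _ in left], [l for _, l in left]),
            partition([p for p, _ in right], [l for _, l in right]))

def classify(tree, point):
    """Walk the tree; assumes class labels are not tuples."""
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if point[a] < t else right
    return tree

# Toy training set (made up): (age, chol-level) -> class label
points = [(30, 20), (35, 60), (45, 30), (55, 25), (60, 70), (65, 45), (70, 20)]
labels = ["-", "-", "-", "-", "+", "+", "-"]
tree = partition(points, labels)
print(tree, classify(tree, (62, 80)))
```

This grows the tree until every leaf is pure, so it overfits by construction; pruning is a separate step.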
Q1: how to introduce splits along attribute A_i?
Q2: how to evaluate a split?
  A: by how close to uniform each subset is (i.e., how strongly one class label dominates it)
  i.e., we need a measure of uniformity, where p+ and p- are the fractions of the two class labels in the subset:
    entropy: H(p+, p-) = - p+ log2(p+) - p- log2(p-)
      [Figure: H(p+, p-) plotted against p+; maximum value 1]
    Any other measure?
    gini index: 1 - (p+)^2 - (p-)^2
      [Figure: entropy and gini index plotted against p+]
    (How about multiple labels?)
Intuition:
  entropy: #bits to encode the class label
  gini: classification error, if we randomly guess '+' with prob. p+

Thus, we choose the split that reduces entropy / classification error the most. E.g.:
  [Figure: labeled points on num. attr#1 (e.g., age) vs. num. attr#2 (e.g., chol-level), with a candidate split]
  Before the split: we need
    (n+ + n-) * H(p+, p-) = (7+6) * H(7/13, 6/13)
  bits total, to encode all the class labels
  After the split we need:
    0 bits for the first half, and
    (2+6) * H(2/8, 6/8) bits for the second half

Tree pruning
  What for?
  [Figure: the plot partitioned into many small regions]
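The bit counts in the split example (7 and 6 labels before the split; a pure half plus a half with 2 and 6 labels after it) can be checked numerically. The sketch below is illustrative: the function names are assumptions, and both measures are written for any number of class fractions, which is one natural reading of the "multiple labels" question.

```python
from math import log2

def entropy(probs):
    """H(p1, ..., pk) in bits; terms with p = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index: 1 - sum of squared class fractions."""
    return 1.0 - sum(p * p for p in probs)

def bits(counts):
    """Total bits to encode all class labels of a subset: n * H(fractions)."""
    n = sum(counts)
    return n * entropy([c / n for c in counts])

before = bits([7, 6])                 # (7+6) * H(7/13, 6/13)
after = bits([5, 0]) + bits([2, 6])   # pure half costs 0 bits
print(f"before: {before:.2f} bits, after: {after:.2f} bits, "
      f"saved: {before - after:.2f} bits")
```

Under these definitions the split brings the cost from roughly 12.94 bits down to roughly 6.49 bits; a competing split would be compared on the same quantity.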
Tree pruning
  Shortcut for scalability: DYNAMIC pruning:
    stop expanding the tree if a node is 'reasonably' homogeneous
      ad hoc threshold [Agrawal, vldb92]
      (Minimum Description Length (MDL) criterion (SLIQ) [Mehta, edbt96])

Tree pruning
  Q: How to do it?
  A1: use a training and a testing set; prune nodes whose removal improves classification on the testing set. (Drawbacks?)
  (A2: or, rely on MDL (= Minimum Description Length))

Scalability enhancements
  Interval Classifier [Agrawal, vldb92]: dynamic pruning
  SLIQ: dynamic pruning with MDL; vertical partitioning of the file (but the label column has to fit in core)
  SPRINT: even more clever partitioning
Conclusions for classifiers
  Classification through trees
  Building phase: splitting policies
  Pruning phase (to avoid overfitting)
  For scalability:
    dynamic pruning
    clever data partitioning

Association rules - idea [Agrawal, SIGMOD93]
  Consider the 'market basket' case:
    (milk, bread)
    (milk)
    (milk, chocolate)
    (milk, bread)
  Find 'interesting things', e.g., rules of the form:
    milk, bread -> chocolate | 90%

Association rules - idea
  In general, for a given rule
    Ij, Ik, ..., Im -> Ix | (s, c)
  c = 'confidence': how often people buy Ix, given that they have bought Ij, ..., Im
  s = 'support': how often people buy Ij, ..., Im, Ix
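Support and confidence, as defined above, can be computed directly on the four example baskets. This is a minimal sketch: the function names are hypothetical, and both measures are expressed as fractions of baskets (the slides also use absolute counts for support).

```python
baskets = [
    {"milk", "bread"},
    {"milk"},
    {"milk", "chocolate"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs.

    Assumes lhs occurs in at least one basket (otherwise this divides by zero).
    """
    return support(set(lhs) | {rhs}, baskets) / support(lhs, baskets)

print(support({"milk", "bread"}, baskets))     # s for the itemset {milk, bread}
print(confidence({"bread"}, "milk", baskets))  # c for bread -> milk
print(confidence({"milk"}, "bread", baskets))  # c for milk -> bread
```

Note that the two directions of a rule share the same support but can have very different confidence.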
Association rules - idea
  Problem definition:
    given
      a set of 'market baskets' (= binary matrix, of N rows/baskets and M columns/products)
      min-support 's' and
      min-confidence 'c'
    find
      all the rules with higher support and confidence

Association rules - idea
  Closely related concept: 'large itemset'
    Ij, Ik, ..., Im, Ix
  is a 'large itemset', if it appears more than 'min-support' times
  Observation: once we have a 'large itemset', we can find out the qualifying rules easily (how?)
  Thus, let's focus on how to find 'large itemsets'

Association rules - idea
  Naive solution: scan the database once; keep 2**|I| counters
  Drawback? 2**1000 is prohibitive...
  Improvement? scan the db |I| times, looking for 1-, 2-, etc. itemsets
  E.g., consider |I| = 3 items only (A, B, C).
Association rules - idea
  [Figure: the itemsets over A, B, C arranged in levels: singletons A, B, C counted in the first pass, then pairs A,B / A,C / B,C; min-sup: 10]

Association rules - idea
  Anti-monotonicity property:
    if an itemset fails to be 'large', so will every superset of it (hence all supersets can be pruned)
  Sketch of the (famous!) 'a-priori' algorithm:
    Let L(i-1) be the set of large itemsets with i-1 elements
    Let C(i) be the set of candidate itemsets (of size i)

Association rules - idea
  Compute L(1), by scanning the database.
  repeat, for i = 2, 3, ...:
    'join' L(i-1) with itself, to generate C(i)
      two itemsets can be joined, if they agree on their first i-2 elements
    prune the itemsets of C(i) (how?)
    scan the db, finding the counts of the C(i) itemsets; set this to be L(i)
    unless L(i) is empty, repeat the loop
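The a-priori loop sketched above (join, prune, count) can be written out as follows. This is a hedged sketch rather than the algorithm as published: itemsets are kept as sorted tuples, min-support is an absolute count, and the threshold test uses >= (switch to > to match the "more than" wording). The pruning step shown is the standard one that follows from anti-monotonicity: drop a candidate if any of its (i-1)-subsets is not large. A second helper shows one common way to derive rules from the large itemsets. All names and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori(baskets, minsup):
    """Return {itemset (sorted tuple): count} for all large itemsets."""
    baskets = [frozenset(b) for b in baskets]
    counts = {}
    for b in baskets:                      # first pass: count single items
        for item in b:
            counts[(item,)] = counts.get((item,), 0) + 1
    large = {s: c for s, c in counts.items() if c >= minsup}   # L(1)
    result = dict(large)
    i = 2
    while large:
        prev = sorted(large)
        # Join: two (i-1)-itemsets that agree on their first i-2 elements.
        candidates = {a + (b[-1],) for a, b in combinations(prev, 2)
                      if a[:-1] == b[:-1]}
        # Prune: every (i-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(sub in large for sub in combinations(c, i - 1))}
        # Count: one scan of the baskets for the surviving candidates.
        counts = {c: sum(set(c) <= b for b in baskets) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= minsup}   # L(i)
        result.update(large)
        i += 1
    return result

def rules(large, minconf):
    """Rules lhs -> x with a single item on the right-hand side."""
    out = []
    for itemset, count in large.items():
        if len(itemset) < 2:
            continue
        for x in itemset:
            lhs = tuple(item for item in itemset if item != x)
            conf = count / large[lhs]   # lhs is large by anti-monotonicity
            if conf >= minconf:
                out.append((lhs, x, conf))
    return sorted(out)

baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
large = apriori(baskets, minsup=3)
print(large)
print(rules(large, minconf=0.9))
```

In this toy run the candidate (A, B, C) is never counted: its subset (B, C) is not large, so the prune step removes it before the database scan.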
Association rules - Conclusions
  Association rules: a great tool to find patterns
    easy to understand its output
    fine-tuned algorithms exist

Overall Conclusions
  Data Mining = "Big Data" Analytics = Business Intelligence:
    of high commercial, government and research interest
  DM = DB + ML + Stat + Sys
  Data warehousing / OLAP: to get the data
  Tree classifiers (SLIQ, SPRINT)
  Association Rules - 'a-priori' algorithm
  (clustering: BIRCH, CURE, OPTICS)

Reading material
  Agrawal, R., T. Imielinski, A. Swami, 'Mining Association Rules between Sets of Items in Large Databases', SIGMOD 1993.
  M. Mehta, R. Agrawal and J. Rissanen, 'SLIQ: A Fast Scalable Classifier for Data Mining', Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.

Additional references
  Agrawal, R., S. Ghosh, et al. (Aug. 23-27, 1992). An Interval Classifier for Database Mining Applications. VLDB Conf. Proc., Vancouver, BC, Canada.
  Jiawei Han and Micheline Kamber, Data Mining, Morgan Kaufmann, 2001.
CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination
More informationOLAP Systems and Multidimensional Expressions
OLAP Systems and Multidimensional Expressions Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology)
More informationClassification by Association
Classification by Association Cse352 Ar*ficial Intelligence Professor Anita Wasilewska Generating Classification Rules by Association When mining associa&on rules for use in classifica&on we are only interested
More informationCS 1655 / Spring 2013! Secure Data Management and Web Applications
CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered
More informationDEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA
More informationAlgorithm for Efficient Multilevel Association Rule Mining
Algorithm for Efficient Multilevel Association Rule Mining Pratima Gautam Department of computer Applications MANIT, Bhopal Abstract over the years, a variety of algorithms for finding frequent item sets
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision
More informationAssociation Rules Outline
Association Rules Outline Goal: Provide an overview of basic Association Rule mining techniques Association Rules Problem Overview Large/Frequent itemsets Association Rules Algorithms Apriori Sampling
More informationData Mining: Mining Association Rules. Definitions. .. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Mining Association Rules Definitions Market Baskets. Consider a set I = {i 1,...,i m }. We call the elements of I, items.
More informationOn-Line Analytical Processing (OLAP) Traditional OLTP
On-Line Analytical Processing (OLAP) CSE 6331 / CSE 6362 Data Mining Fall 1999 Diane J. Cook Traditional OLTP DBMS used for on-line transaction processing (OLTP) order entry: pull up order xx-yy-zz and
More informationAssociation Analysis. CSE 352 Lecture Notes. Professor Anita Wasilewska
Association Analysis CSE 352 Lecture Notes Professor Anita Wasilewska Association Rules Mining An Introduction This is an intuitive (more or less ) introduction It contains explanation of the main ideas:
More informationMultidimensional Queries
Multidimensional Queries Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationInterestingness Measurements
Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationMining Quantitative Association Rules on Overlapped Intervals
Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,
More informationInterestingness Measurements
Interestingness Measurements Objective measures Two popular measurements: support and confidence Subjective measures [Silberschatz & Tuzhilin, KDD95] A rule (pattern) is interesting if it is unexpected
More informationPerformance Based Study of Association Rule Algorithms On Voter DB
Performance Based Study of Association Rule Algorithms On Voter DB K.Padmavathi 1, R.Aruna Kirithika 2 1 Department of BCA, St.Joseph s College, Thiruvalluvar University, Cuddalore, Tamil Nadu, India,
More informationModel for Load Balancing on Processors in Parallel Mining of Frequent Itemsets
American Journal of Applied Sciences 2 (5): 926-931, 2005 ISSN 1546-9239 Science Publications, 2005 Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets 1 Ravindra Patel, 2 S.S.
More informationA Review on Cluster Based Approach in Data Mining
A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily
More informationData Mining. Vera Goebel. Department of Informatics, University of Oslo
Data Mining Vera Goebel Department of Informatics, University of Oslo 2012 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD
More informationCS Machine Learning
CS 60050 Machine Learning Decision Tree Classifier Slides taken from course materials of Tan, Steinbach, Kumar 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K
More informationWarehousing. Data Mining
On Line Application Processing Warehousing Data Cubes Data Mining 1 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more timeconsuming,
More information