Chapter 5, Data Cube Computation

Size: px
Start display at page:

Download "Chapter 5, Data Cube Computation"

Transcription

1 CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Young-Rae Cho Associate Professor Department of Computer Science Baylor University A Roadmap for Data Cube Computation Full Cube Full materialization Materializing all the cells of all of the cuboids for a given data cube Issues in time and space Iceberg Cube Partial materialization Materializing the cells of only interesting cuboids Materializing only the cells in a cuboid whose measure value is above the minimum threshold Closed Cube Materializing only closed cells 1

2 Typical Computation Process Example all time item location supplier time,item time,location item,location location,supplier time,supplier item,supplier time,item,location time,location,supplier time,item,supplier item,location,supplier time, item, location, supplier Computation Techniques Aggregating Aggregating from the smallest child cuboid Caching Caching the result of a cuboid for the computation of other cuboids to reduce disk I/O Sorting, Hashing and Grouping Sorting, hashing, and grouping operations are applied to a dimension in order to reorder and cluster Pruning A priori pruning the cells with lower support than minimum threshold 2

3 CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Multi-way Array Aggregation BUC: Iceberg Cube Computation Star-Cubing Shell Fragment Cube Computation Multi-Way Array Aggregation Features Array-based bottom-up approach Uses multi-dimensional chunks all No direct tuple comparisons Simultaneous aggregation on multiple dimensions A B C Intermediate aggregate values are re-used for computing ancestor cuboids AB AC BC Full materialization (No iceberg optimization, No Apriori pruning) ABC 3

4 Aggregation Strategy Partitions array into chunks Chunk: a small sub-cube which fits in memory Data addressing Uses chunk id and offset Multi-way Aggregation Computes aggregates in multi-way Visits chunks in the order to minimize memory access to minimize memory space B C c c c c0 60 b3 B b b b a0 a1 a2 a3 A Aggregation Process Example c c c c0 60 b3 B b b b a0 a1 a2 a3 4

5 Aggregation Process Example - Continued c c c c0 60 b3 B b b b a0 a1 a2 a3 Optimization of Memory Usage What is the best traversing order? Depends on the memory space required Example Suppose the data size on each dimension A, B and C is 40, 400 and 4000, respectively. Minimum memory required when traversing the order, 1,2,3,4,5,, 64 Total memory required is a2 a a0 a1 A B C c c c c0 60 b3 B b b b0 5

6 Summary of Multi-Way Method Cuboids should be sorted and computed according to the data size on each dimension Keeps the smallest plane in the main memory, fetches and computes only one chunk at a time for the largest plane Problems Full materialization extremely large storage is required Computes well only for a small number of dimensions ( high dimensional data partial materialization ) Reference Zhao, Y., Deshpande, P.M. and Naughton, J.F., An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, In Proceedings of SIGMOD (1997) CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Multi-way Array Aggregation BUC: Iceberg Cube Computation Star-Cubing Shell Fragment Cube Computation 6

7 Bottom-Up Computation (BUC) Characteristics Top-down approach Partial materialization (iceberg cube computation) Divides dimensions into partitions 1 all and facilitates iceberg pruning No simultaneous aggregation 2 A 10 B 14 C 16 D 3 AB 7 AC 9 AD 11 BC 13 BD 15 CD 4 ABC 6 ABD 8 ACD 12 BCD 5 ABCD Iceberg Pruning Process Partitioning Sorts data values Partitions into blocks that fit in memory Apriori Pruning For each block If it does not satisfy min_sup, its descendants are pruned If it satisfies min_sup, materialization and a recursive call including the next dimension 7

8 Apriori Pruning Example Description in SQL Iceberg cube computation Example compute cube iceberg_cube as select A, B, C, D, count(*) from R cube by A, B, C, D having count(*) min_sup (,,, ) 0-D cuboid ( a1,,, ) ( a1, b1,, ) ( a1, b1, c1, ) ( a1, b1, c1, d1 ) ( a1, b1, c2, ) ( a1, b2,, ) 1-D cuboid 2-D cuboid 3-D cuboid pruning 3-D cuboid pruning Summary of BUC Method Computation of sparse data cubes Problems Sensitive to the order of dimensions The most discriminating dimension should be used first Dimensions should be in the order of decreasing cardinality Dimensions should be in the order of increasing maximum number of duplicates Reference Beyer, K. and Ramakrishnan, R., Bottom-Up Computation of Sparse and Iceberg CUBEs, In Proceedings of SIGMOD (1999) 8

9 CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Multi-way Array Aggregation BUC: Iceberg Cube Computation Star-Cubing Shell Fragment Cube Computation Characteristics of Star-Cubing Characteristics Integrated method of top-down and bottom-up cube computation Explores both multidimensional aggregation (as Multi-way) and apriori pruning (as BUC) all Explores shared dimensions e.g., dimension A is a shared A/A B/B C/C D dimension of ACD and AD e.g., ABD/AB means ABD has AB/AB AC/AC AD/A BC/BC BD/B CD the shared dimension AB ABC/ABC ABD/AB ACD/A BCD ABCD 9

10 Iceberg Pruning Strategy Iceberg Pruning in a Shared Dimension If AB is a shared dimension of ABD, then the cuboid AB is computed simultaneously with ABD If the aggregate on a shared dimension does not satisfy the iceberg condition (min_sup), then all the cells extended from this shared dimension cannot satisfy the condition either If we can compute the shared dimensions before the actual cuboid, we can apply Apriori pruning Pruning while aggregating simultaneously on multiple dimensions Cuboid Trees Cuboid Trees Tree structure to represent cuboids Base cuboid tree, 3-D cuboid trees, 2-D cuboid trees, Each level represents a dimension Each node represents an attribute Each node includes the attribute value, aggregate value, descendant(s) a1: 30 root: 100 a2: 20 a3: 20 a4: 20 The path from the root to a leaf represents a tuple Example b1: 10 b2: 10 b3: 10 count(a1,b1,*,*) = 10 c1: 5 c2: 5 count(a1,b1,c1,*) = 5 d1: 2 d2: 3 10

11 Star Nodes Star Nodes If the single dimensional aggregate does not satisfy min_sup, no need to consider the node in the iceberg cube computation The nodes are replaced by * A B C D count Example (min_sup = 2) a1 b1 * * 1 a1 b1 * * 1 A B C D count a1 * * * 1 a1 b1 c1 d1 1 a2 * c3 d4 1 a1 b1 c4 d3 1 a2 * c3 d4 1 a1 b2 c2 d2 1 a2 b3 c3 d4 1 a2 b4 c3 d4 1 A B C D count a1 b1 * * 2 a1 * * * 1 a2 * c3 d4 2 Star Tree Structure Star Tree A cuboid tree that is compressed using star nodes Star Tree Construction Uses the compressed table Keeps the star table for lookup of star nodes Lossless compression from the original cuboid tree root:5 a1:3 a2:2 b*:1 b1:2 b*:2 c*:1 d*:1 c*:2 d*:2 c3:2 d4:2 Star Table b2 * b3 * b4 * c1 * c2 * d1 *... 11

12 Multi-Way Star-Tree Aggregation Aggregation DFS (depth-first-search) from the root of a star tree Creates star trees for the cuboids on the next level root:5 1 BCD:5 1 a1cd/a1:3 2 a1b*d/a1b*:1 3 a1b*c*/a1b*c*:1 4 a1:32 a2:2 b*:1 3 c*:1 4 d*:15 b*:13 b1:2 b*:2 c*:1 4 d*:1 5 c*:14 c*:2 c3:2 d*:1 5 d*:15 d*:2 d4:2 Base Tree BCD Tree ACD/A Tree ABD/AB Tree ABC/ABC Tree Pruning Prunes if the aggregates do not satisfy min_sup Prunes if all the nodes in the generated tree are star nodes Implementation of Star Cubing Method Multi-way aggregation & iceberg pruning all A/A C/C D/D BCD:3 b*:3 b1:2 AC/AC AD/A pruning BC/BC pruning BD/B CD c*:1 c3:2 c*:2 d*:1 d4:2 d*:2 pruning ABC/ABC pruning ABD/AB ACD/A BCD root:5 a1:3 a2:2 b*:1 b1:2 b*:2 ABCD c*:1 c*:2 c3:2 d*:1 d*:2 d4:2 12

13 Summary of Star Cubing Problems Sensitive to the order of dimensions The order of decreasing cardinality Reference Xin, D., Han, J., Li, X. and Wah, B.W., Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration, In Proceedings of VLDB (2003) CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Multi-way Array Aggregation BUC: Iceberg Cube Computation Star-Cubing Shell Fragment Cube Computation 13

14 Shell Fragment Cube Computation Motivations The computation and storage of iceberg cube are still costly for high dimensional data ( the curse of dimensionality ) Hard to determine an appropriate iceberg threshold No update in an iceberg cube Features Reduces a high dimensional cube into a set of lower dimensional cubes Lossless reduction Online re-construction of high-dimensional data cube Fragmentation Strategy Observation OLAP occurs only on a small subset of dimensions at a time Fragmentation Partitions the set of dimensions into shell fragments Computes data cubes for each shell fragment ( 20 3-D data cube computation is much better than 1 60-D data cube.) Retains inverted indices or value-list indices Semi-Online Computation Given the pre-computed fragment cubes, dynamically compute cube cells of the high-dimensional cube online 14

15 Inverted Index Example TID A B C D E 1 a1 b1 c1 d1 e1 2 a1 b2 c1 d2 e1 3 a1 b2 c1 d1 e2 4 a2 b1 c1 d1 e2 5 a2 b1 c1 d1 e3 Divides the 5 dimensions into 2 shell fragments, (A, B, C) and (D, E) Builds the inverted index (1-D inverted index) Value TID List Size a a b b c d d2 2 1 e e e3 5 1 Cube Computation Process Generalizes the 1-D inverted indices to multi-dimensional inverted indices Computes cuboids using the inverted indices Example Shell fragment cube ABC contains 7 cuboids, A, B, C, AB, AC, BC, ABC The cuboid AB is computed using the 2-D inverted index below Cell Intersection TID List Size (a1,b1) {1,2,3} {1,4,5} {1} 1 (a1,b2) {1,2,3} {2,3} {2,3} 2 (a2,b1) {4,5} {1,4,5} {4,5} 2 (a2,b2) {4,5} {2,3} {} 0 15

16 Online Query Computation Process Given shell fragment cubes Divides the query into shell fragments Fetches the corresponding TID list for each fragment from the cubes Intersects the TID lists from each fragment to construct instantiated base table Computes the online cube using the base table with any data cube computation algorithm Computes the query Shell Fragment Cube Process ABCDEF ABC Cube Dimensions DEF Cube D Cuboid EF Cuboid DE Cuboid Cell T-ID List d1 e1 {1, 3, 8, 9} d1 e2 {2, 4, 6, 7} d2 e1 {5, 10} 16

17 Online Query Process ABCDEFGHI J KLMN Instantiated Base Table Online Cube Summary of Shell Fragment Cube Advantages Significant increase of the speed of data cube computation Significant decrease of the storage usage Various applications in very high dimensional data Problems Tradeoffs between the preprocessing time to construct a data cube and the online computation time for queries Tradeoffs between the storage for a data cube and the memory for online computation Reference Li, X., Han, J. and Gonzalez, H., High-Dimensional OLAP: A Minimal Cubing Approach, In Proceedings of VLDB (2004) 17

18 Questions? Lecture Slides are found on the Course Website, 18

Data Cube Technology

Data Cube Technology Data Cube Technology Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl

More information

Data Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids

Data Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids Chapter 5: Data Cube Technology Data Cube Technology Data Cube Computation: Basic Concepts Data Cube Computation Methods Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Lecture 3 Efficient Cube Computation CITS3401 CITS5504 Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement:

More information

Efficient Computation of Data Cubes. Network Database Lab

Efficient Computation of Data Cubes. Network Database Lab Efficient Computation of Data Cubes Network Database Lab Outlines Introduction Some CUBE Algorithms ArrayCube PartitionedCube and MemoryCube Bottom-Up Cube (BUC) Conclusions References Network Database

More information

2 CONTENTS

2 CONTENTS Contents 4 Data Cube Computation and Data Generalization 3 4.1 Efficient Methods for Data Cube Computation............................. 3 4.1.1 A Road Map for Materialization of Different Kinds of Cubes.................

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 5

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 5 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 5 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

International Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-1 E-ISSN:

International Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-1 E-ISSN: International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-6, Issue-1 E-ISSN: 2347-2693 Precomputing Shell Fragments for OLAP using Inverted Index Data Structure D. Datta

More information

Different Cube Computation Approaches: Survey Paper

Different Cube Computation Approaches: Survey Paper Different Cube Computation Approaches: Survey Paper Dhanshri S. Lad #, Rasika P. Saste * # M.Tech. Student, * M.Tech. Student Department of CSE, Rajarambapu Institute of Technology, Islampur(Sangli), MS,

More information

Data Mining: Concepts and Techniques. Chapter 4

Data Mining: Concepts and Techniques. Chapter 4 Data Mining: Concepts and Techniques Chapter 4 Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.edu/~hanj 2006 Jiawei Han and Micheline Kamber, All rights

More information

EFFICIENT computation of data cubes has been one of the

EFFICIENT computation of data cubes has been one of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 1, JANUARY 2007 111 Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach Dong Xin, Student Member, IEEE,

More information

Data Mining: Concepts and Techniques. Example of Star Schema. Multidimensional Data. A Short Recap on Data Cubes Data Cube: A Lattice of Cuboids

Data Mining: Concepts and Techniques. Example of Star Schema. Multidimensional Data. A Short Recap on Data Cubes Data Cube: A Lattice of Cuboids Data Mining: Concepts and Techniques Sales Data Cube: A Short Recap on Data Cubes Data Cube: A Lattice of Cuboids all 0-D(apex) cuboid Chapter 4 time item location supplier -D cuboids Jiawei Han Department

More information

Data Mining Part 3. Associations Rules

Data Mining Part 3. Associations Rules Data Mining Part 3. Associations Rules 3.2 Efficient Frequent Itemset Mining Methods Fall 2009 Instructor: Dr. Masoud Yaghini Outline Apriori Algorithm Generating Association Rules from Frequent Itemsets

More information

Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding

Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding LienHua Pauline Chou and Xiuzhen Zhang School of Computer Science and Information Technology RMIT University, Melbourne, VIC., Australia,

More information

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the Chapter 6: What Is Frequent ent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc) that occurs frequently in a data set frequent itemsets and association rule

More information

Chapter 13, Sequence Data Mining

Chapter 13, Sequence Data Mining CSI 4352, Introduction to Data Mining Chapter 13, Sequence Data Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University Topics Single Sequence Mining Frequent sequence

More information

Chapter 4: Mining Frequent Patterns, Associations and Correlations

Chapter 4: Mining Frequent Patterns, Associations and Correlations Chapter 4: Mining Frequent Patterns, Associations and Correlations 4.1 Basic Concepts 4.2 Frequent Itemset Mining Methods 4.3 Which Patterns Are Interesting? Pattern Evaluation Methods 4.4 Summary Frequent

More information

Computing Data Cubes Using Massively Parallel Processors

Computing Data Cubes Using Massively Parallel Processors Computing Data Cubes Using Massively Parallel Processors Hongjun Lu Xiaohui Huang Zhixian Li {luhj,huangxia,lizhixia}@iscs.nus.edu.sg Department of Information Systems and Computer Science National University

More information

An Empirical Comparison of Methods for Iceberg-CUBE Construction. Leah Findlater and Howard J. Hamilton Technical Report CS August, 2000

An Empirical Comparison of Methods for Iceberg-CUBE Construction. Leah Findlater and Howard J. Hamilton Technical Report CS August, 2000 An Empirical Comparison of Methods for Iceberg-CUBE Construction Leah Findlater and Howard J. Hamilton Technical Report CS-2-6 August, 2 Copyright 2, Leah Findlater and Howard J. Hamilton Department of

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Frequent Pattern Mining and Association Analysis Cengiz Gunay Partial slide credits: Li Xiong, Jiawei Han and Micheline Kamber George Kollios 1 Mining Frequent Patterns,

More information

Chapter 4, Data Warehouse and OLAP Operations

Chapter 4, Data Warehouse and OLAP Operations CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining...3 Frequent Pattern Mining Frequent Patterns The Apriori Algorithm The FP-growth Algorithm Sequential Pattern Mining Summary 44 / 193 Netflix Prize Frequent Pattern Mining Frequent

More information

PnP: Parallel And External Memory Iceberg Cube Computation

PnP: Parallel And External Memory Iceberg Cube Computation : Parallel And External Memory Iceberg Cube Computation Ying Chen Dalhousie University Halifax, Canada ychen@cs.dal.ca Frank Dehne Griffith University Brisbane, Australia www.dehne.net Todd Eavis Concordia

More information

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke

Apriori Algorithm. 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Apriori Algorithm For a given set of transactions, the main aim of Association Rule Mining is to find rules that will predict the occurrence of an item based on the occurrences of the other items in the

More information

Multi-Cube Computation

Multi-Cube Computation Multi-Cube Computation Jeffrey Xu Yu Department of Sys. Eng. and Eng. Management The Chinese University of Hong Kong Hong Kong, China yu@se.cuhk.edu.hk Hongjun Lu Department of Computer Science Hong Kong

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2013 " An second class in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

ECT7110 Introduction to Data Warehousing

ECT7110 Introduction to Data Warehousing ECT7110 Introduction to Data Warehousing Prof. Wai Lam ECT7110 Introduction to Data Warehousing 1 What is Data Warehouse? Defined in many different ways, but not rigorously. A decision support database

More information

CS 6093 Lecture 7 Spring 2011 Basic Data Mining. Cong Yu 03/21/2011

CS 6093 Lecture 7 Spring 2011 Basic Data Mining. Cong Yu 03/21/2011 CS 6093 Lecture 7 Spring 2011 Basic Data Mining Cong Yu 03/21/2011 Announcements No regular office hour next Monday (March 28 th ) Office hour next week will be on Tuesday through Thursday by appointment

More information

Jarek Szlichta Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques

Jarek Szlichta  Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques Jarek Szlichta http://data.science.uoit.ca/ Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques Frequent Itemset Mining Methods Apriori Which Patterns Are

More information

Data Warehousing 資料倉儲

Data Warehousing 資料倉儲 Data Warehousing 資料倉儲 Data Cube Computation and Data Generation 992DW05 MI4 Tue. 8,9 (15:10-17:00) L413 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University 淡江大學資訊管理學系

More information

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar

Frequent Pattern Mining. Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Frequent Pattern Mining Based on: Introduction to Data Mining by Tan, Steinbach, Kumar Item sets A New Type of Data Some notation: All possible items: Database: T is a bag of transactions Transaction transaction

More information

ECLT 5810 Introduction to Data Warehousing

ECLT 5810 Introduction to Data Warehousing ECLT 5810 Introduction to Data Warehousing Prof. Wai Lam ECLT 5810 Introduction to Data Warehousing 1 What is Data Warehouse? Provides tools for business executives Systematically organize and understand

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 16: Association Rules Jan-Willem van de Meent (credit: Yijun Zhao, Yi Wang, Tan et al., Leskovec et al.) Apriori: Summary All items Count

More information

An Empirical Comparison of Methods for Iceberg-CUBE Construction

An Empirical Comparison of Methods for Iceberg-CUBE Construction From: FLAIRS-01 Proceedings. Copyright 2001, AAAI (www.aaai.org). All rights reserved. An Empirical Comparison of Methods for Iceberg-CUBE Construction Leah Findlater and Howard J. Hamilton Department

More information

Data Mining for Knowledge Management. Association Rules

Data Mining for Knowledge Management. Association Rules 1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad

More information

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking Dong Xin Zheng Shao Jiawei Han Hongyan Liu University of Illinois at Urbana-Champaign, Urbana, IL 680, USA October 6, 005 Abstract

More information

Effectiveness of Freq Pat Mining

Effectiveness of Freq Pat Mining Effectiveness of Freq Pat Mining Too many patterns! A pattern a 1 a 2 a n contains 2 n -1 subpatterns Understanding many patterns is difficult or even impossible for human users Non-focused mining A manager

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

ANU MLSS 2010: Data Mining. Part 2: Association rule mining

ANU MLSS 2010: Data Mining. Part 2: Association rule mining ANU MLSS 2010: Data Mining Part 2: Association rule mining Lecture outline What is association mining? Market basket analysis and association rule examples Basic concepts and formalism Basic rule measurements

More information

Dta Mining and Data Warehousing

Dta Mining and Data Warehousing CSCI6405 Fall 2003 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: q.gao@dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:

More information

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking Dong Xin Zheng Shao Jiawei Han Hongyan Liu University of Illinois at Urbana-Champaign, Urbana, IL 6, USA Tsinghua University,

More information

Frequent Pattern Mining

Frequent Pattern Mining Frequent Pattern Mining How Many Words Is a Picture Worth? E. Aiden and J-B Michel: Uncharted. Reverhead Books, 2013 Jian Pei: CMPT 741/459 Frequent Pattern Mining (1) 2 Burnt or Burned? E. Aiden and J-B

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Roadmap. PCY Algorithm

Roadmap. PCY Algorithm 1 Roadmap Frequent Patterns A-Priori Algorithm Improvements to A-Priori Park-Chen-Yu Algorithm Multistage Algorithm Approximate Algorithms Compacting Results Data Mining for Knowledge Management 50 PCY

More information

CS490D: Introduction to Data Mining Chris Clifton

CS490D: Introduction to Data Mining Chris Clifton CS490D: Introduction to Data Mining Chris Clifton January 16, 2004 Data Warehousing Data Warehousing and OLAP Technology for Data Mining What is a data warehouse? A multi-dimensional data model Data warehouse

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data Mining: Concepts and Techniques. Chapter 5. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques Chapter 5 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road

More information

BCB 713 Module Spring 2011

BCB 713 Module Spring 2011 Association Rule Mining COMP 790-90 Seminar BCB 713 Module Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Outline What is association rule mining? Methods for association rule mining Extensions

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 20 Data Mining and Mining Association Rules Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Reading Material Optional Reading:

More information

Association Rule Mining: FP-Growth

Association Rule Mining: FP-Growth Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong We have already learned the Apriori algorithm for association rule mining. In this lecture, we will discuss a faster

More information

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB)

Association rules. Marco Saerens (UCL), with Christine Decaestecker (ULB) Association rules Marco Saerens (UCL), with Christine Decaestecker (ULB) 1 Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004),

More information

Quotient Cube: How to Summarize the Semantics of a Data Cube

Quotient Cube: How to Summarize the Semantics of a Data Cube Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)

More information

Lecture 2 Data Cube Basics

Lecture 2 Data Cube Basics CompSci 590.6 Understanding Data: Theory and Applica>ons Lecture 2 Data Cube Basics Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu 1 Today s Papers 1. Gray- Chaudhuri- Bosworth- Layman- Reichart- Venkatrao-

More information

Performance and Scalability: Apriori Implementa6on

Performance and Scalability: Apriori Implementa6on Performance and Scalability: Apriori Implementa6on Apriori R. Agrawal and R. Srikant. Fast algorithms for mining associa6on rules. VLDB, 487 499, 1994 Reducing Number of Comparisons Candidate coun6ng:

More information

Chapter 4: Association analysis:

Chapter 4: Association analysis: Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their day-to-day operations, huge amounts of customer purchase data are collected daily

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013-2017 Han, Kamber & Pei. All

More information

Using Tiling to Scale Parallel Data Cube Construction

Using Tiling to Scale Parallel Data Cube Construction Using Tiling to Scale Parallel Data Cube Construction Ruoming in Karthik Vaidyanathan Ge Yang Gagan Agrawal Department of Computer Science and Engineering Ohio State University, Columbus OH 43210 jinr,vaidyana,yangg,agrawal

More information

Information Sciences

Information Sciences Information Sciences 181 (2011) 2626 2655 Contents lists available at ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins Multidimensional cyclic graph approach: Representing

More information

Count of Group-bys Sizes < X 400. Sum of Group-bys Sizes < X Group-by Size / Input Size

Count of Group-bys Sizes < X 400. Sum of Group-bys Sizes < X Group-by Size / Input Size Bottom-Up Computation of Sparse and Iceberg CUBEs Kevin Beyer Computer Sciences Department University of Wisconsin { Madison beyer@cs.wisc.edu Raghu Ramakrishnan Computer Sciences Department University

More information

Chapter 7: Frequent Itemsets and Association Rules

Chapter 7: Frequent Itemsets and Association Rules Chapter 7: Frequent Itemsets and Association Rules Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 VII.1&2 1 Motivational Example Assume you run an on-line

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 25 Data Mining and Mining Association Rules Instructor: Sudeepa Roy Due CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Announcements

More information

Association Rules. A. Bellaachia Page: 1

Association Rules. A. Bellaachia Page: 1 Association Rules 1. Objectives... 2 2. Definitions... 2 3. Type of Association Rules... 7 4. Frequent Itemset generation... 9 5. Apriori Algorithm: Mining Single-Dimension Boolean AR 13 5.1. Join Step:...

More information

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders Data Warehousing ETL Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders 1 Overview Picture other sources Metadata Monitor & Integrator OLAP Server Analysis Operational DBs Extract Transform Load

More information

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L

Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Frequent Pattern Mining S L I D E S B Y : S H R E E J A S W A L Topics to be covered Market Basket Analysis, Frequent Itemsets, Closed Itemsets, and Association Rules; Frequent Pattern Mining, Efficient

More information

Map-Reduce for Cube Computation

Map-Reduce for Cube Computation 299 Map-Reduce for Cube Computation Prof. Pramod Patil 1, Prini Kotian 2, Aishwarya Gaonkar 3, Sachin Wani 4, Pramod Gaikwad 5 Department of Computer Science, Dr.D.Y.Patil Institute of Engineering and

More information

Mining Association Rules in Large Databases

Mining Association Rules in Large Databases Mining Association Rules in Large Databases Association rules Given a set of transactions D, find rules that will predict the occurrence of an item (or a set of items) based on the occurrences of other

More information

CHAPTER 8. ITEMSET MINING 226

CHAPTER 8. ITEMSET MINING 226 CHAPTER 8. ITEMSET MINING 226 Chapter 8 Itemset Mining In many applications one is interested in how often two or more objectsofinterest co-occur. For example, consider a popular web site, which logs all

More information

Data Warehousing & On-Line Analytical Processing

Data Warehousing & On-Line Analytical Processing Data Warehousing & On-Line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl

More information

PARSIMONY: An Infrastructure for Parallel Multidimensional Analysis and Data Mining

PARSIMONY: An Infrastructure for Parallel Multidimensional Analysis and Data Mining Journal of Parallel and Distributed Computing 61, 285321 (2001) doi:10.1006jpdc.2000.1691, available online at http:www.idealibrary.com on PARSIMONY: An Infrastructure for Parallel Multidimensional Analysis

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/19/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

Association Pattern Mining. Lijun Zhang

Association Pattern Mining. Lijun Zhang Association Pattern Mining Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction The Frequent Pattern Mining Model Association Rule Generation Framework Frequent Itemset Mining Algorithms

More information

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged.

Market baskets Frequent itemsets FP growth. Data mining. Frequent itemset Association&decision rule mining. University of Szeged. Frequent itemset Association&decision rule mining University of Szeged What frequent itemsets could be used for? Features/observations frequently co-occurring in some database can gain us useful insights

More information

Specifying logic functions

Specifying logic functions CSE4: Components and Design Techniques for Digital Systems Specifying logic functions Instructor: Mohsen Imani Slides from: Prof.Tajana Simunic and Dr.Pietro Mercati We have seen various concepts: Last

More information

OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING

OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING OPTIMISING ASSOCIATION RULE ALGORITHMS USING ITEMSET ORDERING ES200 Peterhouse College, Cambridge Frans Coenen, Paul Leng and Graham Goulbourne The Department of Computer Science The University of Liverpool

More information

Sequential PAttern Mining using A Bitmap Representation

Sequential PAttern Mining using A Bitmap Representation Sequential PAttern Mining using A Bitmap Representation Jay Ayres, Jason Flannick, Johannes Gehrke, and Tomi Yiu Dept. of Computer Science Cornell University ABSTRACT We introduce a new algorithm for mining

More information

Data Warehousing & On-line Analytical Processing

Data Warehousing & On-line Analytical Processing Data Warehousing & On-line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ Chapter 4: Data Warehousing and On-line

More information

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations

Basic Concepts: Association Rules. What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations What Is Frequent Pattern Analysis? COMP 465: Data Mining Mining Frequent Patterns, Associations and Correlations Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and

More information

Fundamental Data Mining Algorithms

Fundamental Data Mining Algorithms 2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data

More information

Range CUBE: Efficient Cube Computation by Exploiting Data Correlation

Range CUBE: Efficient Cube Computation by Exploiting Data Correlation Range CUBE: Efficient Cube Computation by Exploiting Data Correlation Ying Feng Divyakant Agrawal Amr El Abbadi Ahmed Metwally Department of Computer Science University of California, Santa Barbara Email:

More information

A Simple and Efficient Method for Computing Data Cubes

A Simple and Efficient Method for Computing Data Cubes A Simple and Efficient Method for Computing Data Cubes Viet Phan-Luong Université Aix-Marseille LIF - UMR CNRS 6166 Marseille, France Email: viet.phanluong@lif.univ-mrs.fr Abstract Based on a construction

More information

Association Rules Extraction with MINE RULE Operator

Association Rules Extraction with MINE RULE Operator Association Rules Extraction with MINE RULE Operator Marco Botta, Rosa Meo, Cinzia Malangone 1 Introduction In this document, the algorithms adopted for the implementation of the MINE RULE core operator

More information

CS 412 Intro. to Data Mining

CS 412 Intro. to Data Mining CS 412 Intro. to Data Mining Chapter 4. Data Warehousing and On-line Analytical Processing Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 2 3 Chapter 4: Data Warehousing and

More information

Product presentations can be more intelligently planned

Product presentations can be more intelligently planned Association Rules Lecture /DMBI/IKI8303T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, Objectives Introduction What is Association Mining? Mining Association Rules

More information

Data Warehousing & OLAP

Data Warehousing & OLAP Data Warehousing & OLAP Data Mining: Concepts and Techniques Chapter 3 Jiawei Han and An Introduction to Database Systems C.J.Date, Eighth Eddition, Addidon Wesley, 4 1 What is Data Warehousing? What is

More information

Data Cube Materialization Using Map Reduce

Data Cube Materialization Using Map Reduce Data Cube Materialization Using Map Reduce Kawhale Rohitkumar 1, Sarita Patil 2 Student, Dept. of Computer Engineering, G.H Raisoni College of Engineering and Management, Pune, SavitribaiPhule Pune University,

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory

Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home

More information

Cube-Lifecycle Management and Applications

Cube-Lifecycle Management and Applications Cube-Lifecycle Management and Applications Konstantinos Morfonios National and Kapodistrian University of Athens, Department of Informatics and Telecommunications, University Campus, 15784 Athens, Greece

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

Multi-Dimensional Partitioning in BUC for Data Cubes

Multi-Dimensional Partitioning in BUC for Data Cubes San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 Multi-Dimensional Partitioning in BUC for Data Cubes Kenneth Yeung San Jose State University Follow

More information

MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining

MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei 2012 Han, Kamber &

More information

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining

Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes

More information

RELATIONAL OPERATORS #1

RELATIONAL OPERATORS #1 RELATIONAL OPERATORS #1 CS 564- Spring 2018 ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Algorithms for relational operators: select project 2 ARCHITECTURE OF A DBMS query

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Processing of Very Large Data

Processing of Very Large Data Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first

More information

Communication and Memory Optimal Parallel Data Cube Construction

Communication and Memory Optimal Parallel Data Cube Construction Communication and Memory Optimal Parallel Data Cube Construction Ruoming Jin Ge Yang Karthik Vaidyanathan Gagan Agrawal Department of Computer and Information Sciences Ohio State University, Columbus OH

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

DKT 122/3 DIGITAL SYSTEM 1

DKT 122/3 DIGITAL SYSTEM 1 Company LOGO DKT 122/3 DIGITAL SYSTEM 1 BOOLEAN ALGEBRA (PART 2) Boolean Algebra Contents Boolean Operations & Expression Laws & Rules of Boolean algebra DeMorgan s Theorems Boolean analysis of logic circuits

More information

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Datenbanksysteme II: Caching and File Structures. Ulf Leser Datenbanksysteme II: Caching and File Structures Ulf Leser Content of this Lecture Caching Overview Accessing data Cache replacement strategies Prefetching File structure Index Files Ulf Leser: Implementation

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}

More information