Chapter 5, Data Cube Computation
CSI 4352, Introduction to Data Mining
Chapter 5, Data Cube Computation
Young-Rae Cho, Associate Professor
Department of Computer Science, Baylor University

A Roadmap for Data Cube Computation
- Full Cube
  - Full materialization: materializing all the cells of all the cuboids of a given data cube
  - Issues in time and space
- Iceberg Cube
  - Partial materialization: materializing the cells of only the interesting cuboids
  - Materializing only the cells of a cuboid whose measure value is above the minimum threshold
- Closed Cube
  - Materializing only closed cells
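The difference between full and iceberg materialization can be illustrated with a deliberately naive sketch. The toy rows, the function names, and the use of count as the measure are assumptions made for illustration, and the iceberg condition is taken here as count >= min_sup.

```python
from collections import Counter
from itertools import combinations

def full_cube(rows, n_dims):
    """Count every cell of every cuboid; '*' marks an aggregated dimension."""
    cube = Counter()
    for row in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                cell = tuple(row[i] if i in kept else "*" for i in range(n_dims))
                cube[cell] += 1
    return cube

def iceberg_cube(rows, n_dims, min_sup):
    """Keep only the cells whose count reaches min_sup."""
    return {cell: c for cell, c in full_cube(rows, n_dims).items() if c >= min_sup}

# Hypothetical 3-D base table.
rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"), ("a2", "b1", "c1")]
full = full_cube(rows, 3)                    # 20 cells over 2^3 = 8 cuboids
iceberg = iceberg_cube(rows, 3, min_sup=2)   # only 7 cells survive
```

This version computes the whole cube before filtering, which is exactly the cost that the iceberg algorithms later in the chapter try to avoid.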
Typical Computation Process
Example: the lattice of cuboids for the dimensions time, item, location, supplier
- 0-D (apex) cuboid: all
- 1-D cuboids: time; item; location; supplier
- 2-D cuboids: (time, item); (time, location); (item, location); (location, supplier); (time, supplier); (item, supplier)
- 3-D cuboids: (time, item, location); (time, location, supplier); (time, item, supplier); (item, location, supplier)
- 4-D (base) cuboid: (time, item, location, supplier)

Computation Techniques
- Aggregating: aggregate from the smallest child cuboid
- Caching: cache the result of a cuboid for the computation of other cuboids, to reduce disk I/O
- Sorting, hashing, and grouping: these operations are applied to a dimension in order to reorder and cluster the data
- Pruning: Apriori pruning of the cells whose support is lower than the minimum threshold
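As a rough sketch of the lattice and of the "aggregate from the smallest child" technique, the following enumerates the 16 cuboids of the example and rolls a coarser cuboid up from the smallest already-computed cuboid that covers it. The helper names and the cell counts in `computed` are hypothetical.

```python
from collections import Counter
from itertools import combinations

DIMS = ("time", "item", "location", "supplier")

def lattice(dims):
    """Group all 2^n cuboids by their number of dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

def aggregate(cells, source_dims, target_dims):
    """Roll a computed cuboid (dict: value tuple -> count) up to target_dims."""
    keep = [source_dims.index(d) for d in target_dims]
    out = Counter()
    for values, count in cells.items():
        out[tuple(values[i] for i in keep)] += count
    return dict(out)

def smallest_source(computed, target_dims):
    """Pick the computed cuboid with the fewest cells that covers target_dims."""
    candidates = [d for d in computed if set(target_dims) < set(d)]
    return min(candidates, key=lambda d: len(computed[d]))

levels = lattice(DIMS)

# Two already-computed 2-D cuboids with made-up counts.
computed = {
    ("time", "item"): {("Q1", "tv"): 3, ("Q1", "pc"): 2, ("Q2", "tv"): 4},
    ("time", "location"): {("Q1", "NY"): 5, ("Q2", "NY"): 4},
}
source = smallest_source(computed, ("time",))          # the 2-cell cuboid
time_cuboid = aggregate(computed[source], source, ("time",))
```

Choosing the source with fewer cells means fewer additions for the same result, which is the point of the technique.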
Outline
- Multi-way Array Aggregation
- BUC: Iceberg Cube Computation
- Star-Cubing
- Shell Fragment Cube Computation

Multi-Way Array Aggregation
Features
- Array-based bottom-up approach
- Uses multi-dimensional chunks
- No direct tuple comparisons
- Simultaneous aggregation on multiple dimensions
- Intermediate aggregate values are re-used for computing ancestor cuboids
- Full materialization (no iceberg optimization, no Apriori pruning)
[Figure: cuboid lattice for dimensions A, B, C: all; A, B, C; AB, AC, BC; ABC]
Aggregation Strategy
- Partitions the array into chunks
  - Chunk: a small sub-cube that fits in memory
- Data addressing
  - Uses chunk id and offset
- Multi-way aggregation
  - Computes aggregates for multiple cuboids simultaneously
  - Visits chunks in an order that minimizes memory accesses and memory space
[Figure: a 3-D array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 numbered chunks]

Aggregation Process Example
[Figure: the same chunked array]
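A small sketch of chunk addressing and simultaneous aggregation follows. It assumes a toy 4 x 4 x 4 dense array split into 2 x 2 x 2 chunks, with dimension A varying fastest in both the chunk id and the offset; these layout choices and all names are illustrative rather than taken from the slides.

```python
from collections import defaultdict
from itertools import product

SIZE = 4    # cells per dimension (hypothetical toy array)
CHUNK = 2   # cells per chunk edge, giving 2 x 2 x 2 = 8 chunks
PER_DIM = SIZE // CHUNK

def address(a, b, c):
    """Map a cell coordinate to (chunk id, offset inside the chunk)."""
    chunk_id = (a // CHUNK) + PER_DIM * (b // CHUNK) + PER_DIM * PER_DIM * (c // CHUNK)
    offset = (a % CHUNK) + CHUNK * (b % CHUNK) + CHUNK * CHUNK * (c % CHUNK)
    return chunk_id, offset

def multiway_aggregate(chunks):
    """Scan each chunk once and update the AB, AC and BC planes together."""
    ab, ac, bc = defaultdict(int), defaultdict(int), defaultdict(int)
    for chunk_id in sorted(chunks):
        ca = chunk_id % PER_DIM
        cb = (chunk_id // PER_DIM) % PER_DIM
        cc = chunk_id // (PER_DIM * PER_DIM)
        for offset, value in enumerate(chunks[chunk_id]):
            a = ca * CHUNK + offset % CHUNK
            b = cb * CHUNK + (offset // CHUNK) % CHUNK
            c = cc * CHUNK + offset // (CHUNK * CHUNK)
            ab[a, b] += value   # aggregate away C
            ac[a, c] += value   # aggregate away B
            bc[b, c] += value   # aggregate away A
    return ab, ac, bc

# Chunked storage for a dense array whose cell value is a + 10*b + 100*c.
chunks = defaultdict(lambda: [0] * CHUNK ** 3)
for a, b, c in product(range(SIZE), repeat=3):
    chunk_id, offset = address(a, b, c)
    chunks[chunk_id][offset] = a + 10 * b + 100 * c

ab, ac, bc = multiway_aggregate(chunks)
```

Each base cell is read exactly once and contributes to three 2-D cuboids in the same pass, with no tuple comparisons involved.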
Aggregation Process Example (continued)
[Figure: the same chunked array]

Optimization of Memory Usage
- What is the best traversal order? It depends on the memory space required.
- Example
  - Suppose the data sizes of dimensions A, B, and C are 40, 400, and 4000, respectively.
  - Minimum memory required when traversing the chunks in the order 1, 2, 3, 4, 5, ..., 64
  - Total memory required is [value missing in the source]
[Figure: the chunked array over A, B, and C]
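The memory requirement in this example can be worked out with a short calculation. The sketch below assumes that each dimension is split into 4 equal partitions (64 chunks in total) and that chunk order 1, 2, 3, ..., 64 scans A fastest, then B, then C; neither assumption is stated explicitly in the text above, and the function name is made up.

```python
def min_memory(sizes, chunk_sizes, scan_order):
    """Memory units needed to hold the three 2-D planes while scanning chunks.

    scan_order lists the dimensions from fastest- to slowest-varying.
    """
    fast, mid, slow = scan_order
    # Plane without the fastest dimension: one chunk is finished after each run along `fast`.
    one_chunk = chunk_sizes[mid] * chunk_sizes[slow]
    # Plane without the middle dimension: one row of chunks spanning `fast` is kept.
    one_row = sizes[fast] * chunk_sizes[slow]
    # Plane without the slowest dimension: must be held whole until the scan ends.
    whole_plane = sizes[fast] * sizes[mid]
    return one_chunk + one_row + whole_plane

sizes = {"A": 40, "B": 400, "C": 4000}
chunk_sizes = {d: n // 4 for d, n in sizes.items()}   # assumed: 4 partitions per dimension

# A fastest: 100*1000 (one BC chunk) + 40*1000 (one AC row) + 40*400 (whole AB plane)
in_order = min_memory(sizes, chunk_sizes, ("A", "B", "C"))   # 156000
# C fastest: 10*100 + 4000*10 + 4000*400
reversed_order = min_memory(sizes, chunk_sizes, ("C", "B", "A"))   # 1641000
```

Under these assumptions, keeping the smallest plane (AB) fully in memory and the largest plane (BC) one chunk at a time needs roughly a tenth of the memory of the reverse order.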
Summary of Multi-Way Method
- Cuboids should be sorted and computed according to the data size of each dimension
- Keeps the smallest plane in main memory; fetches and computes only one chunk at a time for the largest plane
Problems
- Full materialization: extremely large storage is required
- Performs well only for a small number of dimensions (for high-dimensional data, use partial materialization)
Reference
- Zhao, Y., Deshpande, P.M. and Naughton, J.F., An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, In Proceedings of SIGMOD (1997)
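Bottom-Up Computation (BUC)
Characteristics
- Top-down approach in the lattice as drawn here: starts at the apex cuboid (all) and works toward the base cuboid. The name "bottom-up" comes from the original paper, which draws the lattice with the apex cuboid at the bottom.
- Partial materialization (iceberg cube computation)
- Divides dimensions into partitions and facilitates iceberg pruning
- No simultaneous aggregation
Processing order of the cuboids (4-D example):
1 all; 2 A; 3 AB; 4 ABC; 5 ABCD; 6 ABD; 7 AC; 8 ACD; 9 AD; 10 B; 11 BC; 12 BCD; 13 BD; 14 C; 15 CD; 16 D

Iceberg Pruning Process
- Partitioning
  - Sorts data values
  - Partitions into blocks that fit in memory
- Apriori pruning, for each block:
  - If it does not satisfy min_sup, its descendants are pruned
  - If it satisfies min_sup, the cell is materialized and a recursive call is made that includes the next dimension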
Apriori Pruning Example
Description in SQL of an iceberg cube computation:
  compute cube iceberg_cube as
  select A, B, C, D, count(*)
  from R
  cube by A, B, C, D
  having count(*) >= min_sup
Example (cells in the order they are visited)
  (*, *, *, *)        0-D cuboid
  (a1, *, *, *)       1-D cuboid
  (a1, b1, *, *)      2-D cuboid
  (a1, b1, c1, *)     3-D cuboid
  (a1, b1, c1, d1)    4-D cuboid
  (a1, b1, c2, *)     3-D cuboid
  (a1, b2, *, *)      2-D cuboid
A cell that fails min_sup is pruned together with all of its descendants (marked "pruning" in the figure).

Summary of BUC Method
- Suited to the computation of sparse data cubes
Problems
- Sensitive to the order of dimensions
  - The most discriminating dimension should be used first
  - Dimensions should be in the order of decreasing cardinality
  - Dimensions should be in the order of increasing maximum number of duplicates
Reference
- Beyer, K. and Ramakrishnan, R., Bottom-Up Computation of Sparse and Iceberg CUBEs, In Proceedings of SIGMOD (1999)
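The partition-and-prune recursion described for BUC can be sketched in a few lines. This is a simplified in-memory illustration with count as the measure, not the authors' implementation; the function signature and the sample rows are hypothetical.

```python
def buc(rows, n_dims, min_sup, start=0, cell=None, out=None):
    """Recursive BUC sketch: emit iceberg cells (count >= min_sup) depth-first."""
    if cell is None:                          # top-level call
        cell, out = ["*"] * n_dims, {}
        if len(rows) < min_sup:
            return out
    out[tuple(cell)] = len(rows)              # materialize the current cell
    for d in range(start, n_dims):
        partitions = {}
        for row in rows:                      # partition the block on dimension d
            partitions.setdefault(row[d], []).append(row)
        for value in sorted(partitions):
            block = partitions[value]
            if len(block) >= min_sup:         # Apriori pruning: failing blocks are skipped,
                cell[d] = value               # so none of their descendants is ever visited
                buc(block, n_dims, min_sup, d + 1, cell, out)
        cell[d] = "*"                         # restore before moving to the next dimension
    return out

# Hypothetical base table R over dimensions A, B, C, D.
R = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b1", "c2", "d1"),
    ("a1", "b2", "c1", "d1"),
    ("a2", "b1", "c1", "d1"),
]
cube = buc(R, 4, min_sup=2)
```

The cells appear in `cube` in visiting order, starting with the apex cell, then (a1, *, *, *), (a1, b1, *, *), (a1, b1, c1, *), which mirrors the traversal in the example above.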
Characteristics of Star-Cubing
- Integrated method of top-down and bottom-up cube computation
- Explores both multidimensional aggregation (as in Multi-way) and Apriori pruning (as in BUC)
- Explores shared dimensions
  - e.g., dimension A is a shared dimension of ACD and AD
  - e.g., ABD/AB means that ABD has the shared dimension AB
[Figure: cuboid lattice annotated with shared dimensions: all; A/A, B/B, C/C, D; AB/AB, AC/AC, AD/A, BC/BC, BD/B, CD; ABC/ABC, ABD/AB, ACD/A, BCD; ABCD]
Iceberg Pruning Strategy
Iceberg pruning in a shared dimension
- If AB is a shared dimension of ABD, then the cuboid AB is computed simultaneously with ABD
- If the aggregate on a shared dimension does not satisfy the iceberg condition (min_sup), then none of the cells extended from this shared dimension can satisfy the condition either
- If we can compute the shared dimensions before the actual cuboid, we can apply Apriori pruning
- Pruning is applied while aggregating simultaneously on multiple dimensions

Cuboid Trees
- Tree structure to represent cuboids
  - Base cuboid tree, 3-D cuboid trees, 2-D cuboid trees, ...
- Each level represents a dimension
- Each node represents an attribute value
- Each node includes the attribute value, aggregate value, and descendant(s)
- The path from the root to a leaf represents a tuple
Example
[Figure: base cuboid tree with root: 100; children a1: 30, a2: 20, a3: 20, a4: 20; under a1: b1: 10, b2: 10, b3: 10; under b1: c1: 5, c2: 5; under c1: d1: 2, d2: 3]
- count(a1, b1, *, *) = 10
- count(a1, b1, c1, *) = 5
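A cuboid tree of this kind can be sketched as a simple prefix tree. The class and function names are hypothetical, and the four input tuples are made up so that the tree reproduces the two counts quoted in the example.

```python
class Node:
    """One node of a cuboid tree: attribute value, aggregate count, children."""
    def __init__(self, value):
        self.value = value
        self.count = 0
        self.children = {}

def build_cuboid_tree(rows):
    """Insert (tuple, count) pairs; each tree level corresponds to one dimension."""
    root = Node("root")
    for values, count in rows:
        node = root
        node.count += count
        for v in values:
            if v not in node.children:
                node.children[v] = Node(v)
            node = node.children[v]
            node.count += count
    return root

def prefix_count(root, prefix):
    """Count of a prefix cell such as ('a1', 'b1'), i.e. (a1, b1, *, *)."""
    node = root
    for v in prefix:
        node = node.children.get(v)
        if node is None:
            return 0
    return node.count

# Hypothetical tuples over A, B, C, D with their counts.
rows = [
    (("a1", "b1", "c1", "d1"), 2),
    (("a1", "b1", "c1", "d2"), 3),
    (("a1", "b1", "c2", "d1"), 5),
    (("a1", "b2", "c1", "d1"), 10),
]
tree = build_cuboid_tree(rows)
```

Because every node stores the aggregate of its subtree, `prefix_count(tree, ("a1", "b1"))` answers count(a1, b1, *, *) by following a single path.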
Star Nodes
- If a single-dimensional aggregate does not satisfy min_sup, there is no need to consider the node in the iceberg cube computation
- Such nodes are replaced by *
Example (min_sup = 2)

Base table:
A   B   C   D   count
a1  b1  c1  d1  1
a1  b1  c4  d3  1
a1  b2  c2  d2  1
a2  b3  c3  d4  1
a2  b4  c3  d4  1

After replacing star nodes:
A   B   C   D   count
a1  b1  *   *   1
a1  b1  *   *   1
a1  *   *   *   1
a2  *   c3  d4  1
a2  *   c3  d4  1

Compressed table:
A   B   C   D   count
a1  b1  *   *   2
a1  *   *   *   1
a2  *   c3  d4  2

Star Tree Structure
- Star tree: a cuboid tree that is compressed using star nodes
- Star tree construction
  - Uses the compressed table
  - Keeps the star table for lookup of star nodes
  - Lossless compression of the original cuboid tree
Star tree for the compressed table:
root:5
  a1:3
    b*:1
      c*:1
        d*:1
    b1:2
      c*:2
        d*:2
  a2:2
    b*:2
      c3:2
        d4:2
Star table: b2: *, b3: *, b4: *, c1: *, c2: *, d1: *, ...
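The star-node reduction in this example can be reproduced with a short sketch. The function name and the representation of the star table as a set of (dimension index, value) pairs are illustrative choices; the input rows are the five base tuples of the example.

```python
from collections import Counter

def star_reduce(rows, min_sup):
    """Replace values whose 1-D count is below min_sup by '*' and merge equal rows."""
    n_dims = len(rows[0])
    one_d = [Counter(row[d] for row in rows) for d in range(n_dims)]
    # Star table: (dimension index, value) pairs that fail min_sup on their own.
    star_table = {
        (d, v) for d in range(n_dims) for v, c in one_d[d].items() if c < min_sup
    }
    compressed = Counter(
        tuple("*" if (d, v) in star_table else v for d, v in enumerate(row))
        for row in rows
    )
    return star_table, dict(compressed)

base = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c4", "d3"),
    ("a1", "b2", "c2", "d2"),
    ("a2", "b3", "c3", "d4"),
    ("a2", "b4", "c3", "d4"),
]
star_table, compressed = star_reduce(base, min_sup=2)
# compressed == {(a1, b1, *, *): 2, (a1, *, *, *): 1, (a2, *, c3, d4): 2}
```

Inserting the three compressed rows into a prefix tree then yields the star tree, with only three root-to-leaf paths instead of five.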
Multi-Way Star-Tree Aggregation
- Aggregation
  - DFS (depth-first search) from the root of a star tree
  - Creates star trees for the cuboids on the next level
[Figure: the base tree and the child trees created during the DFS (BCD tree, ACD/A tree, ABD/AB tree, ABC/ABC tree), with nodes numbered by visiting step]
- Pruning
  - Prunes if the aggregates do not satisfy min_sup
  - Prunes if all the nodes in the generated tree are star nodes

Implementation of Star Cubing Method
- Multi-way aggregation and iceberg pruning
[Figure: shared-dimension lattice with pruned cuboids marked, shown alongside the base star tree and the BCD tree]
Summary of Star Cubing
Problems
- Sensitive to the order of dimensions
  - Dimensions should be in the order of decreasing cardinality
Reference
- Xin, D., Han, J., Li, X. and Wah, B.W., Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration, In Proceedings of VLDB (2003)
Shell Fragment Cube Computation
Motivations
- The computation and storage of an iceberg cube are still costly for high-dimensional data (the curse of dimensionality)
- It is hard to determine an appropriate iceberg threshold
- An iceberg cube cannot be updated incrementally
Features
- Reduces a high-dimensional cube into a set of lower-dimensional cubes
- Lossless reduction
- Online reconstruction of the high-dimensional data cube

Fragmentation Strategy
- Observation
  - OLAP occurs only on a small subset of dimensions at a time
- Fragmentation
  - Partitions the set of dimensions into shell fragments
  - Computes data cubes for each shell fragment (computing twenty 3-D data cubes is much cheaper than computing one 60-D data cube)
  - Retains inverted indices or value-list indices
- Semi-online computation
  - Given the pre-computed fragment cubes, dynamically computes cube cells of the high-dimensional cube online
Inverted Index
Example
TID  A   B   C   D   E
1    a1  b1  c1  d1  e1
2    a1  b2  c1  d2  e1
3    a1  b2  c1  d1  e2
4    a2  b1  c1  d1  e2
5    a2  b1  c1  d1  e3
- Divides the 5 dimensions into 2 shell fragments, (A, B, C) and (D, E)
- Builds the inverted index (1-D inverted index)
Value  TID list     Size
a1     {1,2,3}      3
a2     {4,5}        2
b1     {1,4,5}      3
b2     {2,3}        2
c1     {1,2,3,4,5}  5
d1     {1,3,4,5}    4
d2     {2}          1
e1     {1,2}        2
e2     {3,4}        2
e3     {5}          1

Cube Computation Process
- Generalizes the 1-D inverted indices to multi-dimensional inverted indices
- Computes cuboids using the inverted indices
Example
- Shell fragment cube ABC contains 7 cuboids: A, B, C, AB, AC, BC, ABC
- The cuboid AB is computed using the 2-D inverted index below
Cell     Intersection        TID list  Size
(a1,b1)  {1,2,3} ∩ {1,4,5}   {1}       1
(a1,b2)  {1,2,3} ∩ {2,3}     {2,3}     2
(a2,b1)  {4,5} ∩ {1,4,5}     {4,5}     2
(a2,b2)  {4,5} ∩ {2,3}       {}        0
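The construction of the 1-D inverted index and of the 2-D index for cuboid AB can be sketched as follows, using the five tuples of the example. The function names are hypothetical, and the sketch relies on the attribute values being unique across dimensions, as they are here.

```python
from itertools import product

# TID -> (A, B, C, D, E), the example table.
TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(table):
    """1-D inverted index: attribute value -> set of TIDs containing it."""
    index = {}
    for tid, row in table.items():
        for value in row:
            index.setdefault(value, set()).add(tid)
    return index

def cuboid_cells(index, *value_lists):
    """Multi-dimensional index: intersect the TID lists of every value combination."""
    return {
        cell: set.intersection(*(index[v] for v in cell))
        for cell in product(*value_lists)
    }

index = build_inverted_index(TABLE)
ab = cuboid_cells(index, ["a1", "a2"], ["b1", "b2"])
sizes = {cell: len(tids) for cell, tids in ab.items()}
```

The size of each intersected TID list is the count measure of the cell, so no access to the base table is needed once the index exists.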
Online Query Computation Process
Given the shell fragment cubes:
- Divides the query into shell fragments
- Fetches the corresponding TID list for each fragment from the cubes
- Intersects the TID lists from each fragment to construct the instantiated base table
- Computes the online cube from the base table with any data cube computation algorithm
- Computes the query

Shell Fragment Cube Process
[Figure: dimensions ABCDEF split into an ABC cube and a DEF cube; the DEF cube contains cuboids such as D, EF, and DE]
DE cuboid (excerpt):
Cell    TID list
d1 e1   {1, 3, 8, 9}
d1 e2   {2, 4, 6, 7}
d2 e1   {5, 10}
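The online query steps can be sketched on the five-tuple table with fragments (A, B, C) and (D, E). In this simplified illustration the per-fragment TID list is obtained by intersecting 1-D lists, which stands in for a lookup in a precomputed fragment cube; the function names and the sample query are assumptions.

```python
from collections import Counter

DIMS = ("A", "B", "C", "D", "E")
FRAGMENTS = (("A", "B", "C"), ("D", "E"))
TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

# 1-D inverted index keyed by (dimension, value).
INDEX = {}
for tid, row in TABLE.items():
    for dim, value in zip(DIMS, row):
        INDEX.setdefault((dim, value), set()).add(tid)

def fragment_tids(fragment, query):
    """TID list of the fragment cell fixed by the query (all TIDs if nothing is fixed)."""
    tids = set(TABLE)
    for dim in fragment:
        if dim in query:
            tids &= INDEX.get((dim, query[dim]), set())
    return tids

def answer(query, group_by):
    """Intersect fragment TID lists, instantiate the base table, then aggregate."""
    tids = set(TABLE)
    for fragment in FRAGMENTS:
        tids &= fragment_tids(fragment, query)
    base_table = [TABLE[tid] for tid in sorted(tids)]    # instantiated base table
    pos = DIMS.index(group_by)
    return dict(Counter(row[pos] for row in base_table))

# Query: B = b1 and D = d1, with counts grouped by A.
result = answer({"B": "b1", "D": "d1"}, group_by="A")
```

Only the tuples that survive the intersection (TIDs 1, 4, and 5 here) are materialized, so the online step works on a small instantiated table rather than on the full high-dimensional cube.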
Online Query Process
[Figure: dimensions ABCDEFGHIJKLMN; instantiated base table; online cube]

Summary of Shell Fragment Cube
Advantages
- Significantly faster data cube computation
- Significantly lower storage usage
- Various applications in very high-dimensional data
Problems
- Tradeoff between the preprocessing time to construct a data cube and the online computation time for queries
- Tradeoff between the storage for a data cube and the memory for online computation
Reference
- Li, X., Han, J. and Gonzalez, H., High-Dimensional OLAP: A Minimal Cubing Approach, In Proceedings of VLDB (2004)
Questions?
Lecture slides are found on the course website.
Data Warehousing & OLAP Data Mining: Concepts and Techniques Chapter 3 Jiawei Han and An Introduction to Database Systems C.J.Date, Eighth Eddition, Addidon Wesley, 4 1 What is Data Warehousing? What is
More informationData Cube Materialization Using Map Reduce
Data Cube Materialization Using Map Reduce Kawhale Rohitkumar 1, Sarita Patil 2 Student, Dept. of Computer Engineering, G.H Raisoni College of Engineering and Management, Pune, SavitribaiPhule Pune University,
More informationData Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems
Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check
More informationTutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining. Gyozo Gidofalvi Uppsala Database Laboratory
Tutorial on Assignment 3 in Data Mining 2009 Frequent Itemset and Association Rule Mining Gyozo Gidofalvi Uppsala Database Laboratory Announcements Updated material for assignment 3 on the lab course home
More informationCube-Lifecycle Management and Applications
Cube-Lifecycle Management and Applications Konstantinos Morfonios National and Kapodistrian University of Athens, Department of Informatics and Telecommunications, University Campus, 15784 Athens, Greece
More informationData Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems
Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,
More informationMulti-Dimensional Partitioning in BUC for Data Cubes
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research 2008 Multi-Dimensional Partitioning in BUC for Data Cubes Kenneth Yeung San Jose State University Follow
More informationMSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining
MSCIT 5210/MSCBD 5002: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei 2012 Han, Kamber &
More informationData Mining in Bioinformatics Day 5: Frequent Subgraph Mining
Data Mining in Bioinformatics Day 5: Frequent Subgraph Mining Chloé-Agathe Azencott & Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institutes
More informationRELATIONAL OPERATORS #1
RELATIONAL OPERATORS #1 CS 564- Spring 2018 ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Algorithms for relational operators: select project 2 ARCHITECTURE OF A DBMS query
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationProcessing of Very Large Data
Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first
More informationCommunication and Memory Optimal Parallel Data Cube Construction
Communication and Memory Optimal Parallel Data Cube Construction Ruoming Jin Ge Yang Karthik Vaidyanathan Gagan Agrawal Department of Computer and Information Sciences Ohio State University, Columbus OH
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationDKT 122/3 DIGITAL SYSTEM 1
Company LOGO DKT 122/3 DIGITAL SYSTEM 1 BOOLEAN ALGEBRA (PART 2) Boolean Algebra Contents Boolean Operations & Expression Laws & Rules of Boolean algebra DeMorgan s Theorems Boolean analysis of logic circuits
More informationDatenbanksysteme II: Caching and File Structures. Ulf Leser
Datenbanksysteme II: Caching and File Structures Ulf Leser Content of this Lecture Caching Overview Accessing data Cache replacement strategies Prefetching File structure Index Files Ulf Leser: Implementation
More informationAssociation Rule Mining
Association Rule Mining Generating assoc. rules from frequent itemsets Assume that we have discovered the frequent itemsets and their support How do we generate association rules? Frequent itemsets: {1}
More information