DW Performance Optimization (II)


1 DW Performance Optimization (II)

2 Overview
Data Cube in ROLAP and MOLAP
ROLAP Technique(s): Efficient Data Cube Computation
MOLAP Technique(s): Prefix Sum Array, Multiway Augmented Tree
Aalborg University 8 - DWML course

3 Data Cube
Data cube queries compute aggregates over fact tables at different granularities.
CUBE BY: Product, Time, Location
Aggregation function: SUM(Sales)
Cube table (i.e., results): columns Prod., Time, Loc., Sales
[Figure: 3-D cube with axes Product (TV, PC, VCR, sum), Time (by quarter, sum), and Location (U.S.A, Canada, Mexico, sum)]

4 ROLAP vs MOLAP
ROLAP data cube: stored in a relational table. Good for sparse data cubes. Compared on: scalability, storage use, response time.
MOLAP data cube: stored in special multidimensional data structures. Good for dense data cubes. Compared on: storage use (foreign keys are not needed), response time, scalability.
[Figure: the same two-dimensional cube shown as a ROLAP table (two dimension columns and a SUM column) and as a MOLAP array indexed by the two dimensions]

5 ROLAP Technique(s)

6 ROLAP data cube computation
Problem: How do we efficiently compute a data cube from a fact table?
Constraints/background:
The fact table is huge, e.g., a sales fact table measured in terabytes.
Main memory is relatively small, e.g., measured in gigabytes.
The whole fact table CANNOT fit in memory, so we need methods such as external-memory sorting (e.g., external mergesort) and data partitioning.
We focus on I/O cost because I/O time >> CPU time.

7 Computing Data Cube
Dimensions: Product (P), Time (T), Location (L)
Data cube and the lattice model: this cube is a lattice with 8 nodes, which correspond to the 8 differently colored parts of the cube.
Lattice nodes: PTL; PL, PT, TL; P, T, L; none.
How do we compute this cube from the fact table? Each node is a GROUP BY query:
White: GROUP BY product, time, location
Light green: GROUP BY product, location
Gray: GROUP BY location
Computation cost: 8 external sorts over the fact table.
[Figure: cube with axes Product (TV, PC, VCR, sum), Time (by quarter, sum), and Location (U.S.A, Canada, Mexico, sum)]
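As a rough illustration of the naive strategy counted above, the following Python sketch evaluates every lattice node as an independent GROUP BY. The function name `naive_cube` and the toy rows are assumptions made for this example, and in-memory dictionaries stand in for the external sorts a real system would need.

```python
from collections import defaultdict
from itertools import combinations


def naive_cube(rows, dims, measure):
    """Evaluate every GROUP BY of the lattice independently (one pass each)."""
    cube = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:  # a full pass over the fact table per lattice node
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical toy fact table; the values are illustrative only.
facts = [
    {"product": "TV", "time": "Q1", "location": "Canada", "sales": 10},
    {"product": "TV", "time": "Q2", "location": "Mexico", "sales": 20},
    {"product": "PC", "time": "Q1", "location": "Canada", "sales": 5},
]
cube = naive_cube(facts, ("product", "time", "location"), "sales")
print(len(cube))             # 8 lattice nodes
print(cube[("location",)])   # {('Canada',): 15, ('Mexico',): 20}
print(cube[()])              # {(): 35}
```

With three dimensions the loop body runs 8 times, which is exactly the cost the slide attributes to the unshared approach.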

8 Computing Data Cube
Computation sharing along a lattice path, e.g., the path PTL, PT, P, none.
Along a path, the GROUP BY is performed for the first node; the results of the other nodes on the path can be obtained at the same time.
Remaining paths to consider: PL, L and TL, T.
Cost: one external sort over the fact table per path, i.e., the number of nodes in the middle layer of the lattice.
[Figure: lattice with nodes PTL; PL, PT, TL; P, T, L; none]
[Table: base table sorted by PTL, with columns Prod., Time, Loc., Sales; rows with the same PTL, the same PT, and the same P values are adjacent]
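The sharing idea can be sketched in Python as a single pass over input that is already sorted by (P, T, L). The function name `aggregate_path` and the sample rows are assumptions for illustration; the sketch keeps one running sum per path node and flushes a group whenever its key prefix changes.

```python
def aggregate_path(sorted_rows, n_dims):
    """One pass over rows sorted by (d1..dn, value).

    results[k] holds the SUM for the GROUP BY on the first k dimensions,
    so results[n] is the finest node and results[0] is the node 'none'.
    """
    results = [[] for _ in range(n_dims + 1)]
    current = [None] * (n_dims + 1)  # key of the open group at each level
    running = [0] * (n_dims + 1)     # running sum of the open group
    started = False
    for *dims, value in sorted_rows:
        for k in range(n_dims + 1):
            key = tuple(dims[:k])
            if started and key != current[k]:
                results[k].append((current[k], running[k]))  # group is complete
                running[k] = 0
            current[k] = key
            running[k] += value
        started = True
    if started:
        for k in range(n_dims + 1):
            results[k].append((current[k], running[k]))
    return results


# Hypothetical rows (Prod., Time, Loc., Sales); sorting stands in for the external sort.
rows = sorted([
    ("PC", "Q1", "Canada", 1),
    ("TV", "Q1", "Canada", 4),
    ("PC", "Q1", "Mexico", 2),
    ("PC", "Q2", "Canada", 3),
])
results = aggregate_path(rows, 3)
print(results[2])  # [(('PC', 'Q1'), 3), (('PC', 'Q2'), 3), (('TV', 'Q1'), 4)]
print(results[1])  # [(('PC',), 6), (('TV',), 4)]
print(results[0])  # [((), 10)]
```

Only a constant amount of state per path node is kept, which is why one sort can serve the whole path.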

9 Sparseness
Sparse relation / base table:
A large number of CUBE BY attributes (i.e., a large lattice).
Large domains for the CUBE BY attributes.
The base table size is a small fraction of the cross-product size of the attribute domains.
Existing methods are not efficient: even a single external sort is expensive, requiring multiple passes over the fact table.

10 Partitioned-Cube
Computing the data cube efficiently for sparse data ("Fast Computation of Sparse Datacubes", in VLDB 99).
Main memory has a fixed size, and we cannot read the whole fact table into it.
Partitioning is faster than external sorting.
Partitioned-Cube partitions the large relations into fragments that fit into memory.
It follows the recursive structure of datacubes: a sub-datacube is obtained by fixing each possible value of a CUBE BY attribute.

11 Partitioned-Cube (cont.)
Algorithm Partition-Cube(R, {B1, ..., Bm}, A, G)
Inputs: R: a set of tuples; {B1, ..., Bm}: CUBE BY attributes; A: measure value; G: aggregate function.
Outputs: F: finest-granularity datacube tuples; D: remaining datacube tuples (those at coarser granularities).
0: if (R fits in memory) then return Memory-Cube(R, {B1, ..., Bm}, A, G)
1: choose an attribute Bj among {B1, ..., Bm}, then scan R and partition on Bj into {R1, ..., Rn}
2: for (i = 1 to n) (Fi, Di) = Partition-Cube(Ri, {B1, ..., Bm}, A, G)
3: let F = union of the Fi's
4: let (F', D') = Partition-Cube(F, {B1, ..., Bj-1, Bj+1, ..., Bm}, A, G)
5: let D = union of F', D' and the Di's
6: return (F, D)
Note: n is bounded by min{ m, # slots in memory }.
[Example: relation R with attributes B1 (Country), B2 (Year) and measure A (Sales); G = SUM]
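A minimal in-memory Python sketch of this recursion follows. Several things here are assumptions made for illustration and are not from the slides: the names `memory_cube` and `partitioned_cube`, the `capacity` parameter (memory size counted in tuples), the `"*"` marker for an aggregated-away attribute, and the `fixed` argument, which records the attributes already partitioned on so that the sub-datacubes of a fragment only contain groupings that include them. The aggregate is hard-wired to SUM.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker for an attribute that has been aggregated away


def memory_cube(rows, attrs, fixed):
    """In-memory step: groupings that keep all `fixed` positions plus any
    subset of `attrs`. Returns (F, D): the finest grouping and the coarser ones."""
    finest, coarser = defaultdict(int), defaultdict(int)
    for k in range(len(attrs) + 1):
        for group in combinations(attrs, k):
            keep = set(group) | set(fixed)
            target = finest if k == len(attrs) else coarser
            for *dims, value in rows:
                key = tuple(d if i in keep else ALL for i, d in enumerate(dims))
                target[key] += value
    return dict(finest), dict(coarser)


def partitioned_cube(rows, attrs, capacity, fixed=()):
    if len(rows) <= capacity or not attrs:
        return memory_cube(rows, attrs, fixed)
    bj, rest = attrs[0], attrs[1:]
    fragments = defaultdict(list)
    for row in rows:                    # one scan: partition R on attribute bj
        fragments[row[bj]].append(row)
    F, D = {}, {}
    for frag in fragments.values():     # sub-datacubes whose groupings include bj
        Fi, Di = partitioned_cube(frag, rest, capacity, fixed + (bj,))
        F.update(Fi)
        D.update(Di)
    # Project bj out of F, then compute the groupings that do not include bj.
    projected = [key[:bj] + (ALL,) + key[bj + 1:] + (v,) for key, v in F.items()]
    F2, D2 = partitioned_cube(projected, rest, capacity, fixed)
    D.update(F2)
    D.update(D2)
    return F, D


# Hypothetical rows (Prod., Loc., Year, Sales); capacity=2 forces partitioning.
rows = [
    ("PC", "CA", 2001, 3),
    ("PC", "MX", 2001, 2),
    ("TV", "CA", 2002, 4),
    ("TV", "CA", 2001, 1),
    ("PC", "CA", 2002, 5),
]
F, D = partitioned_cube(rows, (0, 1, 2), capacity=2)
cube = {**F, **D}
print(cube[(ALL, ALL, ALL)])   # 15
print(cube[("PC", ALL, ALL)])  # 10
```

Reusing F as the input of the second recursive call is what lets the coarser groupings be computed from already aggregated tuples instead of from the base relation.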

12 Outline of Steps for the Example
Attributes: Country (B1), Year (B2)
Choose attribute B1 and partition the base table on B1.
Compute the cube B1B2.
Compute the other cubes B1* (excluding B1B2), i.e., cube B1.
Consider the cube B1B2 and project out the attribute B1. Remaining attribute(s): B2.
Compute the cube B2.
Compute the cube * (excluding B2), i.e., cube None.
[Figure: lattice with nodes B1B2, B1, B2, none; relation R with columns B1 (Country), B2 (Year), A (Sales)]

13 Partitioned-Cube (cont.)
STEP: Memory-Cube.
Applicable when the relation fits in main memory.
Read the input relation into memory.
Compute the datacube (in memory only) using the computation-sharing method on slide #8.

14 Partitioned-Cube (cont.)
STEP: Partition.
Select an attribute (say, Country).
Partition the large relation into fragments that fit into memory (assuming that the memory can hold 4 tuples in this example).
[Figure: relation R (Country, Year, Sales) partitioned on Country into two fragments R1 and R2]

15 Partitioned-Cube (cont.)
STEP: Process the tuples in fragment R1.
R1 now fits in main memory, so we execute the Memory-Cube step to compute its sub-datacubes.
Compute F1: GROUP BY Country, Year.
Compute D1: GROUP BY any other combination that includes Country, e.g., GROUP BY Country.
[Figure: R1 with its results F1 (GROUP BY Country, Year) and D1 (GROUP BY Country), each with columns Country, Year, Sales]

16 Partitioned-Cube (cont.)
STEP: Process the tuples in fragment R2.
In the same way, we compute F2 (GROUP BY Country, Year) and D2 (GROUP BY Country).
[Figure: R2 with its results F2 and D2, each with columns Country, Year, Sales]

17 Partitioned-Cube (cont.)
Step: F = F1 ∪ F2.
Step #4: project Country out of F (i.e., Country is no longer in the GROUP BY) and call Partition-Cube on the result to obtain F' and D'.
[Figure: F1 and F2 merged into F; F' is the result of GROUP BY Year and D' is the result of GROUP BY none, each with columns Country, Year, Sales]

18 Partitioned-Cube (cont.)
Step: D = (D' ∪ F') ∪ (union of all Di).
Step: return F, D.
[Figure: the final F (from F1 and F2) and the final D (from D', F', D1 and D2), each with columns Country, Year, Sales]

19 Partitioned-Cube (cont.)
Partitioning is applied recursively if there are more attributes than the two in this example.
This does not happen in this example, but it does in the next exercise.

20 Partitioned-Cube Exercise
Run the Partitioned-Cube algorithm on this example. Attributes: B1, B2, B3. Verify your final result against the fact table.
The following outline is given to you:
Choose attribute B1 and partition the base table on B1.
Compute the cube B1B2B3.
Compute the other cubes B1* (excluding B1B2B3), i.e., cubes B1B2, B1B3, B1.
Consider the cube B1B2B3 and project out the attribute B1. Remaining attribute(s): B2B3.
Choose attribute B2 and partition the cube B2B3 on B2.
Compute the cube B2B3.
Compute the other cubes B2* (excluding B2B3), i.e., cube B2.
Consider the cube B2B3 and project out the attribute B2. Remaining attribute(s): B3.
Compute the cube B3.
Compute the cube * (excluding B3), i.e., cube None.
[Table: relation R with columns B1 (Prod.), B2 (Loc.), B3 (Year), A (Sales); products include PC and TV; G = SUM]

21 MOLAP Technique(s)
Prefix Sum Array
Multiway Augmented Tree

22 Range Sum Query
Given a MOLAP data cube, specify a range for each (numeric) dimension and compute the SUM of the measure values in the selected cells.
Example: measure: salary; numeric attributes: age, time. Find the revenue from customers whose age lies in a given range, over a given range of years.
Brute-force approach: accumulate the SUM value while visiting the relevant cells. What happens if the query covers many cells in the cube?
Better solution: "Range Queries in OLAP Data Cubes", in ACM SIGMOD.
[Figure: MOLAP data cube with axes Age and Year]

23 Prefix Sum Array
Consider a 2-D array as the data cube for the moment, with Age as attribute v1 and Year as attribute v2 (integer index values).
Construct a prefix-sum array P: $P[i,j] = \sum_{v_1 \le i} \sum_{v_2 \le j} A[v_1, v_2]$.
P can be computed quickly by visiting its entries in lexicographic order and reusing previously computed values.
A range sum query has the form $\mathrm{RangeSum}([l_1,h_1],[l_2,h_2]) = \sum_{v_1=l_1}^{h_1} \sum_{v_2=l_2}^{h_2} A[v_1, v_2]$.
Using the prefix-sum array to answer queries fast: a query whose ranges both start at the first index is easy, because its answer is a single entry of P. A query whose ranges start elsewhere needs more work.
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2]
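The construction can be sketched in Python as follows. The function name `build_prefix_sum`, the use of 0-based indices, and the small array `A` are assumptions for illustration; each entry reuses three previously computed neighbours, which is the reuse mentioned above.

```python
def build_prefix_sum(A):
    """Build P with P[i][j] = sum of A[v1][v2] for all v1 <= i and v2 <= j."""
    rows, cols = len(A), len(A[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):            # lexicographic order: row by row,
        for j in range(cols):        # left to right within a row
            up = P[i - 1][j] if i > 0 else 0
            left = P[i][j - 1] if j > 0 else 0
            diag = P[i - 1][j - 1] if i > 0 and j > 0 else 0
            P[i][j] = A[i][j] + up + left - diag
    return P


# Hypothetical cube values, for illustration only.
A = [[3, 5, 1],
     [7, 3, 2],
     [2, 4, 2]]
P = build_prefix_sum(A)
print(P)  # [[3, 8, 9], [10, 18, 21], [12, 24, 29]]
```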

24 Query Processing
A range sum with arbitrary bounds can be rewritten as a signed sum of four range sums that all start at the first index, and each of those is a single entry of P:
$\mathrm{RangeSum}([l_1,h_1],[l_2,h_2]) = P[h_1,h_2] - P[l_1-1,h_2] - P[h_1,l_2-1] + P[l_1-1,l_2-1]$, where an entry of P with an index below the first index counts as 0.
Advantage: we only need to fetch 4 values, regardless of the ranges $[l_1,h_1]$ and $[l_2,h_2]$.
Can we discard the array A? Yes, because any entry A[i,j] is equivalent to RangeSum([i,i], [j,j]).
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2]
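A small Python sketch of the four-lookup query, assuming 0-based indices and a hypothetical 3 x 3 cube `A` whose prefix-sum array `P` is written out by hand; the function name `range_sum` is also an assumption.

```python
def range_sum(P, l1, h1, l2, h2):
    """Sum of A over [l1..h1] x [l2..h2] using at most four entries of P."""
    def p(i, j):
        # A prefix that ends before the first index is empty, so it sums to 0.
        return P[i][j] if i >= 0 and j >= 0 else 0

    return p(h1, h2) - p(l1 - 1, h2) - p(h1, l2 - 1) + p(l1 - 1, l2 - 1)


# Hypothetical cube A and its prefix-sum array P.
A = [[3, 5, 1],
     [7, 3, 2],
     [2, 4, 2]]
P = [[3, 8, 9],
     [10, 18, 21],
     [12, 24, 29]]

print(range_sum(P, 1, 2, 1, 2))  # 11 = 29 - 9 - 12 + 3
print(range_sum(P, 1, 1, 1, 1))  # 3, i.e. the single cell A[1][1]
```

The second call shows why A can be discarded: a single cell is just a range sum over a 1 x 1 range.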

25 Update / Maintenance
Insert a tuple that falls into cell (x, y) with measure value δ:
Increment A[x,y] by δ.
Increment P[i,j] by δ for every i ≥ x and j ≥ y.
If the tuple falls into the first cell, we need to increment the whole prefix-sum array!
Updates in a data warehouse are often done in batch, so P can be updated at a low amortized cost.
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2]
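The update cost can be made visible with a short Python sketch. The function name `insert_tuple`, the returned counter, and the sample arrays are assumptions for illustration.

```python
def insert_tuple(A, P, x, y, delta):
    """Apply an insertion into cell (x, y); returns how many entries of P changed."""
    A[x][y] += delta
    touched = 0
    for i in range(x, len(P)):          # every prefix that contains cell (x, y)
        for j in range(y, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched


# Hypothetical cube A and its prefix-sum array P (0-based indices).
A = [[3, 5, 1], [7, 3, 2], [2, 4, 2]]
P = [[3, 8, 9], [10, 18, 21], [12, 24, 29]]

cheap = insert_tuple(A, P, 2, 2, 5)   # last cell: only one entry of P changes
costly = insert_tuple(A, P, 0, 0, 1)  # first cell: the whole array changes
print(cheap, costly)  # 1 9
```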

26 Extension to Multi-dimensional Case
Suppose that there are d dimensions and the cube A has N entries.
$\mathrm{RangeSum}([l_1,h_1],[l_2,h_2],\ldots,[l_d,h_d])$ can be computed by a straightforward method that reads $\prod_{j=1}^{d} (h_j - l_j + 1)$ cell values.
The prefix-sum array P (of A) also has N entries.
Pre-computation time of P: O(dN).
There is no need to keep A afterwards.
Using the prefix-sum array P, we only need to access $2^d$ elements of P to compute RangeSum.
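The d-dimensional case can be sketched in plain Python with dictionaries keyed by index tuples. The names `build_prefix_sum_nd` and `range_sum_nd`, the 0-based indices, and the test cube are assumptions for illustration. The build makes one running-sum sweep per dimension (d sweeps over N cells), and the query combines the 2^d corner entries with alternating signs.

```python
from itertools import product


def build_prefix_sum_nd(A, shape):
    """A maps index tuples to values; missing cells count as 0."""
    cells = list(product(*(range(n) for n in shape)))  # lexicographic order
    P = {idx: A.get(idx, 0) for idx in cells}
    for axis in range(len(shape)):   # one running-sum sweep per dimension
        for idx in cells:
            if idx[axis] > 0:
                prev = idx[:axis] + (idx[axis] - 1,) + idx[axis + 1:]
                P[idx] += P[prev]    # prev was already updated in this sweep
    return P


def range_sum_nd(P, lows, highs):
    d = len(lows)
    total = 0
    for choice in product((0, 1), repeat=d):  # the 2^d corners
        corner = tuple(lows[k] - 1 if choice[k] else highs[k] for k in range(d))
        if min(corner) < 0:
            continue                          # empty prefix contributes 0
        sign = -1 if sum(choice) % 2 else 1
        total += sign * P[corner]
    return total


# Hypothetical 2 x 3 x 2 cube with values 1..12.
shape = (2, 3, 2)
A = {(i, j, k): i * 6 + j * 2 + k + 1
     for i, j, k in product(range(2), range(3), range(2))}
P = build_prefix_sum_nd(A, shape)
print(range_sum_nd(P, (0, 1, 0), (1, 2, 1)))  # 60
```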

27 Blocked Prefix Sum Array
A tradeoff between array space and query processing time; however, we now need to keep the original array A.
Define b as the length of a block (group) of cells.
Only keep an entry P[i,j] where ((i+1) mod b = 0 or i is the last index) and ((j+1) mod b = 0 or j is the last index).
Processing a range sum query: decompose the query range into an internal region and a border region.
The internal region can be processed with the prefix-sum technique above.
The border region is discussed on the next slide.
[Figure: cube A and its blocked prefix-sum array P, both indexed by v1 \ v2]

28 Blocked Prefix Sum Array
Decompose the query range into regions:
Internal region (dark gray): compute the sum using the blocked prefix-sum array P.
Border region (light gray): compute the sum by accessing cells in the original array A.
For each border region, we choose the cheaper way to compute its sum: visit the cells (of A) within the range, or visit the complement cells (of A) in the block and subtract their sum from the block total.
How does the update cost of the Blocked Prefix Sum Array compare to that of the Prefix Sum Array?
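To keep the idea readable, the following Python sketch simplifies the blocked scheme to one dimension, which is an assumption of this example and not the 2-D layout of the slides. The names `build_blocked`, `prefix`, and `range_sum_blocked` are also illustrative. Prefix sums are stored only at block ends, and a prefix that ends inside a block is completed from A in whichever direction touches fewer cells.

```python
def build_blocked(A, b):
    """Keep the prefix sum at index i only if (i + 1) % b == 0 or i is the last index."""
    stored, running = {}, 0
    for i, value in enumerate(A):
        running += value
        if (i + 1) % b == 0 or i == len(A) - 1:
            stored[i] = running
    return stored


def prefix(A, stored, b, i):
    """Sum of A[0..i], reading as few cells of A as possible."""
    if i < 0:
        return 0
    if i in stored:
        return stored[i]
    lo = (i + 1) // b * b - 1       # last stored index below i, or -1
    hi = min(lo + b, len(A) - 1)    # first stored index above i
    if i - lo <= hi - i:            # cheaper: add the cells after lo
        base = stored[lo] if lo >= 0 else 0
        return base + sum(A[lo + 1:i + 1])
    return stored[hi] - sum(A[i + 1:hi + 1])  # cheaper: subtract the complement


def range_sum_blocked(A, stored, b, l, h):
    return prefix(A, stored, b, h) - prefix(A, stored, b, l - 1)


# Hypothetical 1-D cube with block length b = 3.
A = [3, 5, 1, 7, 3, 2, 2, 4]
stored = build_blocked(A, 3)
print(stored)                                  # {2: 9, 5: 21, 7: 27}
print(range_sum_blocked(A, stored, 3, 1, 4))   # 16 = 5 + 1 + 7 + 3
```

Only 3 of the 8 prefix sums are stored here, at the price of reading a few cells of A per query.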

29 Range Max Query
A range max query has the form $\mathrm{RangeMax}([l_1,h_1],[l_2,h_2]) = \max_{l_1 \le v_1 \le h_1} \max_{l_2 \le v_2 \le h_2} A[v_1,v_2]$.
Can we build a prefix-max array and answer queries as we did for range sums?
No: the prefix property does not hold for range max queries. If the global maximum lies in the first cell, it is the prefix maximum at every position and hides the maximum of every local region.
[Figure: cube A and a candidate prefix-max array, both indexed by v1 \ v2]
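A tiny Python counterexample makes the failure concrete. The function name `prefix_max` and the two 2 x 2 cubes are assumptions for illustration: both cubes produce the same prefix-max array although one of their cells differs, so no query procedure over that array alone can recover the range max of that cell.

```python
def prefix_max(A):
    """M[i][j] = max of A[v1][v2] for all v1 <= i and v2 <= j."""
    rows, cols = len(A), len(A[0])
    M = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            best = A[i][j]
            if i > 0:
                best = max(best, M[i - 1][j])
            if j > 0:
                best = max(best, M[i][j - 1])
            M[i][j] = best
    return M


# Two hypothetical cubes that differ only in cell (1, 1).
A1 = [[9, 1], [1, 2]]
A2 = [[9, 1], [1, 5]]
print(prefix_max(A1))  # [[9, 9], [9, 9]]
print(prefix_max(A2))  # [[9, 9], [9, 9]]
```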

30 Multiway Augmented Tree
Multiway tree structure: a balanced tree in which each node stores its associated region and the maximum value in that region.
Branch-and-bound search for a RangeMax query:
Visit the root node and consider its subtrees that overlap the query range.
Visit the most promising subtree first, check its leaf nodes, and obtain the current maximum value.
Track back to the other branch: its stored maximum cannot beat the current maximum, so there is no need to visit its subtree.
Return the current maximum as the result.
[Figure: cube A indexed by v1 \ v2 and its tree; nodes are labeled with index patterns that use * as a wildcard, and each node shows the maximum of its region in parentheses]
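The search can be sketched in Python with a simple four-way tree. The `Node` class, the way regions are split in halves, and the sample cube are assumptions for illustration and need not match the branching used on the slide; the pruning rule is the branch-and-bound step described above.

```python
class Node:
    """Tree node covering rows r0..r1 and columns c0..c1 (inclusive)."""

    def __init__(self, A, r0, r1, c0, c1):
        self.region = (r0, r1, c0, c1)
        self.children = []
        if r0 == r1 and c0 == c1:
            self.max = A[r0][c0]          # leaf: a single cell
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for ra, rb in ((r0, rm), (rm + 1, r1)):
            for ca, cb in ((c0, cm), (cm + 1, c1)):
                if ra <= rb and ca <= cb:
                    self.children.append(Node(A, ra, rb, ca, cb))
        self.max = max(child.max for child in self.children)


def range_max(node, query, best=None):
    r0, r1, c0, c1 = node.region
    l1, h1, l2, h2 = query
    if r1 < l1 or r0 > h1 or c1 < l2 or c0 > h2:
        return best                       # region is outside the query
    if best is not None and node.max <= best:
        return best                       # bound: subtree cannot improve the answer
    if l1 <= r0 and r1 <= h1 and l2 <= c0 and c1 <= h2:
        return node.max                   # region is fully covered
    for child in sorted(node.children, key=lambda n: n.max, reverse=True):
        best = range_max(child, query, best)
    return best


# Hypothetical 4 x 4 cube.
A = [[3, 5, 1, 2],
     [7, 3, 2, 6],
     [2, 4, 2, 3],
     [3, 2, 1, 5]]
root = Node(A, 0, 3, 0, 3)
print(range_max(root, (1, 2, 1, 3)))  # 6
```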

31 Update Multiway Augmented Tree
Insertion: insert a tuple that falls into cell c with measure value δ. Set A[c] = max(δ, A[c]), compare the new value with the old value, and update the tree as described under "Tree update".
Deletion: delete a tuple from cell c with measure value δ. If δ equals A[c], we need to search the remaining tuples in that cell and recompute A[c]. Then compare the new value with the old value and update the tree as described under "Tree update".
Tree update: if the new value differs from the old value, locate the leaf node and propagate the change upwards through the tree.
Updates are efficient compared to the prefix-sum cube.
[Figure: cube A indexed by v1 \ v2 and its tree, with the maximum of each node in parentheses]
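The upward propagation can be sketched in Python as follows. The `Node` class with parent pointers, the `leaves` lookup table, and the function `set_cell` are assumptions for illustration; `set_cell` takes the already recomputed cell value (for an insertion that is max(δ, old value)) and stops climbing as soon as a node's maximum is unaffected.

```python
class Node:
    """Tree node with a parent pointer; leaves register themselves by cell."""

    def __init__(self, A, r0, r1, c0, c1, parent, leaves):
        self.parent, self.children = parent, []
        if r0 == r1 and c0 == c1:
            self.max = A[r0][c0]
            leaves[(r0, c0)] = self
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for ra, rb in ((r0, rm), (rm + 1, r1)):
            for ca, cb in ((c0, cm), (cm + 1, c1)):
                if ra <= rb and ca <= cb:
                    self.children.append(Node(A, ra, rb, ca, cb, self, leaves))
        self.max = max(child.max for child in self.children)


def set_cell(leaves, cell, new_value):
    """Store the new cell value; returns the number of nodes whose max changed."""
    node = leaves[cell]
    if node.max == new_value:
        return 0                      # nothing to propagate
    node.max = new_value
    changed = 1
    node = node.parent
    while node is not None:
        new_max = max(child.max for child in node.children)
        if new_max == node.max:
            break                     # ancestors are unaffected as well
        node.max = new_max
        changed += 1
        node = node.parent
    return changed


# Hypothetical 2 x 2 cube.
A = [[3, 5], [7, 2]]
leaves = {}
root = Node(A, 0, 1, 0, 1, None, leaves)
print(set_cell(leaves, (0, 0), max(9, A[0][0])), root.max)  # 2 9  (insertion)
print(set_cell(leaves, (1, 1), 4), root.max)                # 1 9  (root unaffected)
print(set_cell(leaves, (0, 0), 1), root.max)                # 2 7  (deletion lowers the max)
```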

32 Variant of the Tree
Like the previous multiway tree: it is still a balanced tree, and each node is associated with a region and the maximum value in that region.
The only difference is the way that the branches are divided.
The previous solution for RangeMax queries is still applicable to this tree.
[Figure: cube A indexed by v1 \ v2 and the variant tree; the root and its children are labeled with index ranges for the two dimensions, and each child shows the maximum of its region in parentheses]

33 Using the Tree for Range Sum Query
Can we apply the multiway augmented tree to answer range sum queries?
YES, if each node stores the sum of the values in its region. However, it is not as efficient as the prefix-sum array.
Processing a RangeSum query: visit only the relevant tree nodes and accumulate the sum.
Question: for a given relevant node, do we need to visit its subtree?
[Figure: cube A indexed by v1 \ v2 and its tree; nodes are labeled with index ranges for the two dimensions, and each node shows the sum of its region in parentheses]
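One way to process such a query is sketched below in Python. The tuple-based tree layout, the halving of regions, and the names `build` and `tree_range_sum` are assumptions for illustration. A node whose region is fully covered by the query contributes its stored sum directly, so its subtree is not visited.

```python
def build(A, r0, r1, c0, c1):
    """Returns a node as (region, sum of the region, children)."""
    if r0 == r1 and c0 == c1:
        return ((r0, r1, c0, c1), A[r0][c0], [])
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    kids = [build(A, ra, rb, ca, cb)
            for ra, rb in ((r0, rm), (rm + 1, r1))
            for ca, cb in ((c0, cm), (cm + 1, c1))
            if ra <= rb and ca <= cb]
    return ((r0, r1, c0, c1), sum(kid[1] for kid in kids), kids)


def tree_range_sum(node, l1, h1, l2, h2):
    (r0, r1, c0, c1), total, kids = node
    if r1 < l1 or r0 > h1 or c1 < l2 or c0 > h2:
        return 0          # irrelevant node
    if l1 <= r0 and r1 <= h1 and l2 <= c0 and c1 <= h2:
        return total      # fully covered: use the stored sum, skip the subtree
    return sum(tree_range_sum(kid, l1, h1, l2, h2) for kid in kids)


# Hypothetical 4 x 4 cube.
A = [[3, 5, 1, 2],
     [7, 3, 2, 6],
     [2, 4, 2, 3],
     [3, 2, 1, 5]]
root = build(A, 0, 3, 0, 3)
print(tree_range_sum(root, 0, 1, 0, 3))  # 29, the sum of the first two rows
```

The number of visited nodes grows with the border of the query range, which is why this is slower than the constant number of lookups in the prefix-sum array.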

34 Summary
Data Cube in ROLAP and MOLAP
ROLAP Technique(s): Efficient Data Cube Computation
MOLAP Technique(s): Prefix Sum Array, Multiway Augmented Tree


Sql Fact Constellation Schema In Data Warehouse With Example Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL

More information

ECT7110 Introduction to Data Warehousing

ECT7110 Introduction to Data Warehousing ECT7110 Introduction to Data Warehousing Prof. Wai Lam ECT7110 Introduction to Data Warehousing 1 What is Data Warehouse? Defined in many different ways, but not rigorously. A decision support database

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

Made available courtesy of Springer Verlag Germany: The original publication is available at

Made available courtesy of Springer Verlag Germany: The original publication is available at CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees By: Joachim Hammer, and Lixin Fu Joachim Hammer, Lixin Fu, CubiST ++ Evaluating Ad-Hoc CUBE Queries Using Statistics Trees, Distributed and

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem

More information

Evaluation of Top-k OLAP Queries Using Aggregate R trees

Evaluation of Top-k OLAP Queries Using Aggregate R trees Evaluation of Top-k OLAP Queries Using Aggregate R trees Nikos Mamoulis 1, Spiridon Bakiras 2, and Panos Kalnis 3 1 Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, nikos@cs.hku.hk
