DW Performance Optimization (II)
- Marjorie McLaughlin
- 5 years ago
Transcription
1 DW Performance Optimization (II)
2 Overview
- Data Cube in ROLAP and MOLAP
- ROLAP Technique(s)
  - Efficient Data Cube Computation
- MOLAP Technique(s)
  - Prefix Sum Array
  - Multiway Augmented Tree
Aalborg University 8 - DWML course
3 Data Cube
- Datacube queries compute aggregates over fact tables at different granularities
- CUBE BY: Product, Time, Location
- Aggregation Function: SUM(Sales)
- Cube table (i.e., results) with columns Prod., Time, Loc., Sales
[Figure: data cube with axes Product (TV, PC, VCR, sum), Time (quarters, sum), and Location (U.S.A, Canada, Mexico, sum)]
4 ROLAP vs MOLAP
- ROLAP data cube
  - Stored in a relational table
  - Good for sparse data cube?
  - Points of comparison: scalability, storage use, response time
- MOLAP data cube
  - Stored in special multidimensional data structures
  - Good for dense data cube?
  - Points of comparison: storage use (foreign keys not needed), response time, scalability
[Figure: the same cube stored as a ROLAP table (two dimension columns plus SUM) and as a two-dimensional MOLAP array]
5 ROLAP Technique(s)
6 ROLAP data cube computation
- Problem: How do we efficiently compute a data cube from a fact table?
- Constraints/background
  - The fact table is huge, e.g., a sales fact table on the order of a terabyte
  - Main memory is relatively small, e.g., on the order of a gigabyte
  - The whole fact table CANNOT fit in memory
  - We need to apply methods like external memory sorting and data partitioning
  - We focus on I/O cost because I/O time >> CPU time
[Figure: external mergesort]
7 Computing Data Cube
- Dimensions: Product (P), Time (T), Location (L)
- Data cube and the lattice model
  - This cube is a lattice with 8 nodes: PTL; PL, PT, TL; P, T, L; none
  - The nodes correspond to the 8 differently colored parts of the cube
- How do we compute this cube from the fact table?
  - Each node is a GROUP BY query
    - White: GROUP BY product, time, location
    - Light green: GROUP BY product, location
    - Gray: GROUP BY location
  - Computation cost: 8 external sorts over the fact table
[Figure: lattice of the 8 nodes next to the data cube with axes Product, Time, and Location]
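The lattice can be enumerated mechanically: every subset of the CUBE BY attributes is one node and one GROUP BY query. The following is a minimal sketch of that idea; the table name `sales_fact` and the column names are hypothetical and only serve the illustration.

```python
from itertools import combinations

DIMENSIONS = ("product", "time", "location")

def lattice_nodes(dimensions):
    """Return every subset of the dimensions, finest grouping first."""
    nodes = []
    for size in range(len(dimensions), -1, -1):
        nodes.extend(combinations(dimensions, size))
    return nodes

def group_by_sql(node, table="sales_fact", measure="sales"):
    """Render one lattice node as a GROUP BY query (names are hypothetical)."""
    if not node:
        return f"SELECT SUM({measure}) FROM {table}"
    cols = ", ".join(node)
    return f"SELECT {cols}, SUM({measure}) FROM {table} GROUP BY {cols}"

nodes = lattice_nodes(DIMENSIONS)
queries = [group_by_sql(node) for node in nodes]
```

With three dimensions this yields 2^3 = 8 queries, which is why the naive approach costs one sort per node.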
8 Computing Data Cube
- Computation sharing along a lattice path
  - E.g., path: PTL → PT → P → none
  - Along a path, GROUP BY is performed for the first node
  - Results of the other nodes can be obtained at the same time
- Remaining paths to consider
  - PL → L
  - TL → T
- Cost: 3 external sorts over the fact table, i.e., the number of nodes in the middle layer
[Figure: the lattice, and the base table (sorted by PTL; columns Prod., Time, Loc., Sales) with the runs of rows that share the same PTL, the same PT, and the same P values marked]
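Why one sort serves a whole path can be seen in a small sketch: once the rows are sorted by (P, T, L), every group of every prefix (PTL, PT, P, none) is a contiguous run, so one accumulator per level is enough. The function name and the sample rows below are assumptions made for illustration; they are not the data from the slide.

```python
def aggregate_along_path(sorted_rows, depth=3):
    """One pass over rows sorted by their first `depth` attributes.

    results[k] holds the GROUP BY on the first k attributes (k = 0 is 'none').
    """
    results = [[] for _ in range(depth + 1)]
    current = [None] * (depth + 1)   # key of the open group at each level
    totals = [0] * (depth + 1)       # running SUM of the open group
    for *dims, measure in sorted_rows:
        for k in range(depth + 1):
            key = tuple(dims[:k])
            if current[k] is not None and key != current[k]:
                results[k].append((current[k], totals[k]))  # close the run
                totals[k] = 0
            current[k] = key
            totals[k] += measure
    for k in range(depth + 1):
        if current[k] is not None:
            results[k].append((current[k], totals[k]))
    return results

# Illustrative rows: (product, time, location, sales)
rows = sorted([
    ("PC", "Q1", "Canada", 2),
    ("PC", "Q1", "Mexico", 3),
    ("PC", "Q2", "Canada", 5),
    ("TV", "Q1", "Canada", 7),
])
by_level = aggregate_along_path(rows)
```

Here `by_level[3]` is the PTL result, `by_level[2]` is PT, `by_level[1]` is P, and `by_level[0]` is the grand total, all produced by a single scan.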
9 Sparseness
- Sparse relation / base table
  - Large number of CUBE BY attributes (i.e., large lattice)
  - Large domains of the CUBE BY attributes
  - Base table size is a small fraction of the cross product size of the attribute domains
- Existing methods are not efficient
  - E.g., even a single external sorting operation is expensive, requiring multiple passes over the fact table
10 Partitioned-Cube
- Computing the data cube efficiently for sparse data
  - Fast Computation of Sparse Datacubes, in VLDB 1997
  - Main memory has a fixed size and we cannot read the whole fact table into main memory
  - Partitioning is faster than using external sorting
- Partitioned-Cube
  - Partition the large relations into fragments that can fit into memory
  - It follows the recursive structure of datacubes
  - A sub-datacube is obtained by fixing each possible value of a CUBE BY attribute
11 Partitioned-Cube (cont.)
Algorithm Partition-Cube(R, {B1, ..., Bm}, A, G)
- Inputs
  - R: a set of tuples
  - {B1, ..., Bm}: CUBE BY attributes
  - A: measure value
  - G: aggregate function
- Outputs
  - F: finest granularity datacube tuples
  - D: remaining tuples (those in which at least one CUBE BY attribute is aggregated away)
- Steps
  0: if (R fits in memory) then return Memory-Cube(R, {B1, ..., Bm}, A, G)
  1: choose an attribute Bj among {B1, ..., Bm}, then scan R and partition on Bj into {R1, ..., Rn}
  2: for (i = 1 to n) (Fi, Di) = Partition-Cube(Ri, {B1, ..., Bm}, A, G)
  3: let F = union of the Fi's
  4: let (F', D') = Partition-Cube(F, {B1, ..., Bj-1, Bj+1, ..., Bm}, A, G)
  5: let D = union of F', D' and the Di's
  6: return (F, D)
- Note: n ≤ min{ m, # slots in memory }
[Example: relation R with B1 = Country, B2 = Year, A = Sales, G = SUM]
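The recursion can be sketched in a few lines of Python. This is an illustrative adaptation, not the authors' implementation: memory is simulated by a tuple-count limit, attributes are tuple positions, the marker `ALL`, the helper `group_by`, and the extra `required` parameter are assumptions of the sketch (`required` makes explicit which attributes are fixed by an enclosing partition, so that each fragment only yields groupings that contain the partitioning attribute). The sample rows are made up.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker for an attribute that has been aggregated away

def group_by(rows, keep):
    """SUM the measure, keeping only the attribute positions in `keep`."""
    totals = defaultdict(int)
    for key, value in rows:
        projected = tuple(v if i in keep else ALL for i, v in enumerate(key))
        totals[projected] += value
    return list(totals.items())

def memory_cube(rows, attrs, required):
    """Step 0: compute all groupings in memory; returns (F, D)."""
    finest = group_by(rows, set(attrs) | set(required))
    rest = []
    for size in range(len(attrs)):               # proper subsets of attrs
        for subset in combinations(attrs, size):
            rest.extend(group_by(finest, set(subset) | set(required)))
    return finest, rest

def partitioned_cube(rows, attrs, required=(), memory_limit=4):
    if len(rows) <= memory_limit or not attrs:   # step 0
        return memory_cube(rows, attrs, required)
    bj, others = attrs[0], attrs[1:]             # step 1: pick Bj, partition
    fragments = defaultdict(list)
    for key, value in rows:
        fragments[key[bj]].append((key, value))
    finest, rest = [], []
    for fragment in fragments.values():          # step 2: recurse per fragment
        f_i, d_i = partitioned_cube(fragment, others,
                                    tuple(required) + (bj,), memory_limit)
        finest.extend(f_i)                       # step 3: F = union of Fi
        rest.extend(d_i)
    # step 4: drop Bj and cube the finest-granularity result
    f_prime, d_prime = partitioned_cube(finest, others, required, memory_limit)
    return finest, rest + f_prime + d_prime      # steps 5 and 6

# Illustrative relation: ((country, year), sales)
R = [(("DK", 2001), 5), (("DK", 2002), 3), (("DK", 2001), 1),
     (("SE", 2001), 8), (("SE", 2002), 2), (("NO", 2002), 4)]
F, D = partitioned_cube(R, attrs=(0, 1))
cube = dict(F + D)
```

Computing the groupings without Bj from F rather than from R relies on the aggregate being distributive, which holds for SUM.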
12 Outline of Steps for the Example
- Attributes: Country (B1), Year (B2)
- Choose attribute B1, partition the base table on B1
  - Compute the cube B1 B2
  - Compute other cubes B1 * (excluding B1 B2), i.e., cube B1
- Consider the cube B1 B2, project out the attribute B1
  - Remaining attribute(s): B2
  - Compute the cube B2
  - Compute the cube * (excluding B2), i.e., cube None
[Figure: lattice B1 B2; B1, B2; none, and relation R with columns Country (B1), Year (B2), Sales (A)]
13 Partitioned-Cube (cont.)
- STEP #0. Memory-Cube
  - Applicable when the relation fits in main memory
  - Read the input relation into memory
  - Compute the datacube (in memory only) by using the computation sharing method on slide #8
14 Partitioned-Cube (cont.)
- STEP #1
  - Select an attribute (say, Country)
  - Partition the large relation into fragments that fit into memory (assuming the memory can hold 4 tuples in this example)
[Figure: relation R (columns Country, Year, Sales) partitioned on Country into fragments R1 and R2]
15 Partitioned-Cube (cont.)
- STEP #2.1: Process the tuples in R1
  - Now R1 fits in main memory
  - We execute step #0 to compute sub-datacubes
  - Compute F1: GROUP BY Country, Year
  - Compute D1: GROUP BY any other combination with Country
    - E.g., GROUP BY Country
[Figure: fragment R1 with its results F1 (GROUP BY Country, Year) and D1 (GROUP BY Country); columns Country, Year, Sales]
16 Partitioned-Cube (cont.)
- STEP #2.2: Process the tuples in R2
  - In the same way, we compute F2 and D2
[Figure: fragment R2 with its results F2 (GROUP BY Country, Year) and D2 (GROUP BY Country); columns Country, Year, Sales]
17 Partitioned-Cube (cont.)
- Step #3: F = F1 ∪ F2
- Step #4:
  - Set Country to the aggregated ("all") value in F (i.e., Country is not in the GROUP BY)
  - Call Partition-Cube on F to obtain F' and D'
[Figure: F1 and F2 merged into F; F' is the result of GROUP BY Year and D' the result of GROUP BY none; columns Country, Year, Sales]
18 Partitioned-Cube (cont.)
- Step #5: D = (D' ∪ F') ∪ D1 ∪ D2
- Step #6: return F, D
[Figure: the returned tables F and D, with D assembled from F', D', D1, and D2; columns Country, Year, Sales]
19 Partitioned-Cube (cont.)
- Recursively execute the partitioning step if there are more than two attributes
  - Not in this example but in the next exercise
20 Partitioned-Cube Exercise
- Run the Partitioned-Cube algorithm on this example
  - Attributes: B1, B2, B3
  - Verify your final result by using the fact table
- The following outline is given to you
  - Choose attribute B1, partition the base table on B1
    - Compute the cube B1 B2 B3
    - Compute other cubes B1 * (excluding B1 B2 B3), i.e., cubes B1 B2, B1 B3, B1
  - Consider the cube B1 B2 B3, project out the attribute B1
    - Remaining attribute(s): B2 B3
  - Choose attribute B2, partition the cube B2 B3 on B2
    - Compute the cube B2 B3
    - Compute other cubes B2 * (excluding B2 B3), i.e., cube B2
  - Consider the cube B2 B3, project out the attribute B2
    - Remaining attribute(s): B3
    - Compute the cube B3
    - Compute the cube * (excluding B3), i.e., cube None
[Table: relation R with columns Prod. (B1), Loc. (B2), Year (B3), Sales (A); G = SUM]
21 MOLAP Technique(s)
- Prefix Sum Array
- Multiway Augmented Tree
22 Range Sum Query
- Range Sum Query
  - Given a MOLAP data cube
  - Specify a range for each (numeric) dimension
  - Compute the SUM of the values in these ranges
- Example
  - Measure: salary
  - Numeric attributes: age, time
  - Find the revenue from customers in a given age range, within a given range of years
- Brute-force approach
  - Accumulate the SUM value while visiting relevant cells
  - What happens if the query covers many cells in the cube?
- Better solution
  - Range Queries in OLAP Data Cubes, in ACM SIGMOD
[Figure: MOLAP data cube with axes Age and Year]
23 Prefix Sum Array
- Consider a 2D array as a data cube for the moment
  - Age as attribute v1, Year as attribute v2 (values starting from 0)
- Construct a prefix sum array P
  - P[i,j] = Σ_{v1=0..i} Σ_{v2=0..j} A[v1,v2]
  - Fast computation of P by visiting its entries in lexicographic order and reusing previous values
- A range sum query is of the form RangeSum([l1,h1], [l2,h2]) = Σ_{v1=l1..h1} Σ_{v2=l2..h2} A[v1,v2]
- Using the prefix sum array to answer queries fast
  - A range that starts at index 0 in both dimensions is easy: the answer is a single entry of P
  - A range that starts elsewhere needs more work
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2]
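As a minimal sketch of the "reuse previous values" step: visiting cells in lexicographic order, each entry of P is the cell value plus the entries above and to the left, minus the entry diagonally above-left, which would otherwise be counted twice. The array values below are illustrative, not the ones from the slide.

```python
def build_prefix_sum(A):
    """Build P with P[i][j] = sum of A[0..i][0..j], in O(N) for N cells."""
    rows, cols = len(A), len(A[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            above = P[i - 1][j] if i > 0 else 0
            left = P[i][j - 1] if j > 0 else 0
            diagonal = P[i - 1][j - 1] if i > 0 and j > 0 else 0
            P[i][j] = A[i][j] + above + left - diagonal
    return P

# Illustrative cube A
A = [[3, 5, 1],
     [7, 3, 2],
     [2, 4, 2]]
P = build_prefix_sum(A)
```

The last entry of P is the sum of the whole cube, and any query anchored at (0,0) is answered by a single lookup.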
24 Query Processing
- A RangeSum whose range does not start at index 0 can be rewritten as a signed combination of four RangeSum queries that do start at index 0
  - RangeSum([l1,h1], [l2,h2]) = P[h1,h2] - P[l1-1,h2] - P[h1,l2-1] + P[l1-1,l2-1]
  - A term with an index of -1 counts as 0
- Advantage
  - We only need to fetch 4 values of P, regardless of the range [l1,h1], [l2,h2]
- Can we discard the array A? Why?
  - Any entry A[i,j] is equivalent to RangeSum([i,i], [j,j])
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2, with the four regions of the rewritten query marked]
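The four-lookup rewrite is an inclusion-exclusion over the corners of the query rectangle. A small sketch, with an illustrative array and hypothetical helper names:

```python
from itertools import accumulate

def build_prefix_sum(A):
    """P[i][j] = sum of A[0..i][0..j], built row by row."""
    P, above = [], [0] * len(A[0])
    for row in A:
        above = [r + a for r, a in zip(accumulate(row), above)]
        P.append(above)
    return P

def range_sum(P, l1, h1, l2, h2):
    """RangeSum([l1,h1],[l2,h2]) from at most four entries of P."""
    def p(i, j):
        return P[i][j] if i >= 0 and j >= 0 else 0  # empty prefix sums to 0
    return p(h1, h2) - p(l1 - 1, h2) - p(h1, l2 - 1) + p(l1 - 1, l2 - 1)

# Illustrative cube A
A = [[3, 5, 1],
     [7, 3, 2],
     [2, 4, 2]]
P = build_prefix_sum(A)
inner = range_sum(P, 1, 2, 1, 2)    # cells A[1..2][1..2]
single = range_sum(P, 1, 1, 1, 1)   # recovers A[1][1] without storing A
```

The second call shows why A can be discarded: a single cell is just a range sum over a 1x1 range.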
25 Update / Maintenance
- Insert a tuple at cell (x,y) with measure value δ
  - Increment the value of A[x,y] by δ
  - Increment the value of P[i,j] by δ, for every i ≥ x and j ≥ y
- If we insert a tuple at (0,0), then we need to increment the whole prefix-sum array!
- Updates in a data warehouse are often done in batch
  - P can be updated at low amortized cost
[Figure: cube A and its prefix-sum array P, both indexed by v1 \ v2, with the entries affected by the insert marked]
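The update cost depends on where the tuple lands, which a short sketch makes concrete. The function name `apply_insert` and the array values are assumptions for illustration; the function returns how many entries of P had to change.

```python
def apply_insert(A, P, x, y, delta):
    """Add delta to A[x][y] and to every P[i][j] with i >= x and j >= y."""
    A[x][y] += delta
    touched = 0
    for i in range(x, len(P)):
        for j in range(y, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched

# Illustrative cube A and its prefix-sum array P
A = [[3, 5, 1],
     [7, 3, 2],
     [2, 4, 2]]
P = [[3, 8, 9],
     [10, 18, 21],
     [12, 24, 29]]
touched_corner = apply_insert(A, P, 2, 2, 4)   # last cell: one entry changes
touched_origin = apply_insert(A, P, 0, 0, 1)   # first cell: every entry changes
```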
26 Extension to Multi-dimensional Case
- Suppose that there are d dimensions and the cube A has N entries
- RangeSum([l1,h1], [l2,h2], ..., [ld,hd])
  - Can be computed by a straightforward method, using Π_{j=1..d} (hj - lj + 1) cell values
- Prefix-sum array P (of A) also has N entries
  - Pre-computation time of P: O(dN)
  - No need to keep A afterwards
- By using the prefix-sum array P, we only need to access 2^d elements of P to compute RangeSum
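A hedged sketch of the d-dimensional case: P is built by sweeping one dimension at a time (d passes over N cells), and a query adds or subtracts one entry of P per corner of the query box. The dictionary representation, the function names, and the sample cube are assumptions of this sketch.

```python
from itertools import product

def build_prefix(A, shape):
    """A maps every index tuple of a dense cube to a value; O(d * N) work."""
    P = dict(A)
    for dim in range(len(shape)):
        # product() yields indices in lexicographic order, so the
        # predecessor along `dim` is always finished before it is reused.
        for idx in product(*(range(n) for n in shape)):
            if idx[dim] > 0:
                prev = idx[:dim] + (idx[dim] - 1,) + idx[dim + 1:]
                P[idx] += P[prev]
    return P

def range_sum(P, lows, highs):
    """Combine the 2^d corner entries of P with alternating signs."""
    total = 0
    for corner in product((0, 1), repeat=len(lows)):
        idx = tuple(l - 1 if c else h for c, l, h in zip(corner, lows, highs))
        if min(idx) < 0:
            continue  # a prefix that ends before index 0 is empty
        sign = -1 if sum(corner) % 2 else 1
        total += sign * P[idx]
    return total

# Illustrative 2 x 3 x 2 cube with values 1..12
shape = (2, 3, 2)
A = {idx: idx[0] * 6 + idx[1] * 2 + idx[2] + 1
     for idx in product(*(range(n) for n in shape))}
P = build_prefix(A, shape)
answer = range_sum(P, lows=(0, 1, 0), highs=(1, 2, 1))
```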
27 Blocked Prefix Sum Array
- Tradeoff between array space and query processing time
  - But we now need to keep the original array A
- Blocked Prefix Sum Array
  - Define b as the length of a group of cells
  - Only keep each entry P[i,j] where
    - (i+1) mod b = 0 or i is the last index, and
    - (j+1) mod b = 0 or j is the last index
- Processing a range sum query
  - Decompose the range into an internal region and a boundary region
  - The internal region can be processed by the above techniques
  - The border region will be discussed in the next slide
[Figure: cube A and its blocked prefix-sum array P, both indexed by v1 \ v2]
28 Blocked Prefix Sum Array
- Decompose the query range into regions
  - Internal region (dark gray)
    - Compute the sum using the blocked prefix sum array P
  - Border region (light gray)
    - Compute the sum by accessing cells in the original array A
- For each border cell, we choose the cheaper way to compute its sum
  - Visit the cells (of A) within the range, or
  - Visit the complement cells (of A) in a block
- How about the update cost of the Blocked Prefix Sum Array, compared to the Prefix Sum Array?
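The idea can be illustrated in one dimension, which is a deliberate simplification of the 2D case on the slides. Only every b-th prefix sum (and the last one) is stored; a prefix that ends inside a block is completed either by adding the cells after the previous stored entry or by subtracting the complement cells up to the next stored entry, whichever touches fewer cells of A. Function names and data are assumptions of the sketch.

```python
def build_blocked(A, b):
    """Keep P[i] only where (i + 1) mod b == 0 or i is the last index."""
    stored, running = {}, 0
    for i, value in enumerate(A):
        running += value
        if (i + 1) % b == 0 or i == len(A) - 1:
            stored[i] = running
    return stored

def prefix(A, stored, b, x):
    """Sum of A[0..x] from one stored entry plus a few cells of A."""
    if x < 0:
        return 0
    below = (x + 1) // b * b - 1          # last stored index <= x, or -1
    above = min(below + b, len(A) - 1)    # first stored index >= x
    if x - below <= above - x:            # cheaper: add cells inside the range
        base = stored[below] if below >= 0 else 0
        return base + sum(A[below + 1:x + 1])
    return stored[above] - sum(A[x + 1:above + 1])  # cheaper: subtract complement

def range_sum(A, stored, b, low, high):
    return prefix(A, stored, b, high) - prefix(A, stored, b, low - 1)

# Illustrative data with block length b = 3
A = [3, 5, 1, 7, 3, 2, 2, 4]
stored = build_blocked(A, 3)
answer = range_sum(A, stored, 3, 1, 4)
```

The stored array shrinks by roughly a factor of b per dimension, at the price of reading some cells of A at the border of the query.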
29 Range Max Query
- A range max query is of the form RangeMax([l1,h1], [l2,h2]) = max_{v1=l1..h1} max_{v2=l2..h2} A[v1,v2]
- Can we build a prefix max array?
  - Consider two RangeMax queries over different ranges
  - The prefix property does not hold for the range max query
  - If the global maximum value is at (0,0), then it overwrites the other maximum values in any local region
[Figure: cube A and an attempted prefix-max array, both indexed by v1 \ v2]
30 Multiway Augmented Tree
- Multiway tree structure
  - Balanced tree
  - Each node stores its associated region and the maximum value in that region
- Branch-and-bound search for a RangeMax query
  - Visit the root node
  - Consider its subtrees
  - Visit the more promising subtree first
  - Check its leaf nodes and obtain the maximum value 5
  - Track back to the other branch
  - No need to visit its subtree
  - Return 5 as the result
[Figure: cube A indexed by v1 \ v2 and the tree built over it, each node labeled with its region and, in parentheses, its maximum value]
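A small sketch of branch-and-bound on such a tree follows. It assumes a quadtree-style split (each node halves both dimensions), which is one possible way to divide the regions and not necessarily the one on the slide; the class and function names and the array values are illustrative.

```python
class Node:
    """Tree node storing a region (r1, r2, c1, c2) and the max inside it."""
    def __init__(self, A, r1, r2, c1, c2):
        self.region = (r1, r2, c1, c2)
        self.children = []
        if r1 == r2 and c1 == c2:                 # leaf: a single cell
            self.max = A[r1][c1]
            return
        rm, cm = (r1 + r2) // 2, (c1 + c2) // 2
        row_parts = [(r1, rm), (rm + 1, r2)] if r1 < r2 else [(r1, r2)]
        col_parts = [(c1, cm), (cm + 1, c2)] if c1 < c2 else [(c1, c2)]
        self.children = [Node(A, a, b, c, d)
                         for a, b in row_parts for c, d in col_parts]
        self.max = max(child.max for child in self.children)

def range_max(node, query, best=None):
    r1, r2, c1, c2 = node.region
    l1, h1, l2, h2 = query
    if r2 < l1 or r1 > h1 or c2 < l2 or c1 > h2:
        return best                               # no overlap with the query
    if best is not None and node.max <= best:
        return best                               # bound: cannot improve
    if l1 <= r1 and r2 <= h1 and l2 <= c1 and c2 <= h2:
        return node.max                           # fully covered: use stored max
    # branch: try the most promising child first
    for child in sorted(node.children, key=lambda n: n.max, reverse=True):
        best = range_max(child, query, best)
    return best

# Illustrative cube A
A = [[3, 5, 1, 2],
     [7, 3, 2, 6],
     [2, 4, 2, 3],
     [1, 9, 8, 5]]
root = Node(A, 0, 3, 0, 3)
answer = range_max(root, (0, 1, 1, 2))   # rows 0..1, columns 1..2
```

Visiting the child with the largest stored maximum first tends to raise the bound early, so more sibling subtrees can be skipped.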
31 Update Multiway Augmented Tree
- Insertion
  - Insert a tuple at cell (x,y) with measure value δ
  - A[x,y] = max(δ, A[x,y])
  - Compare the new value with the old value, and update the tree as described below
- Deletion
  - Delete a tuple at cell (x,y) with measure δ
  - If δ equals A[x,y], then we need to search the tuples in that cell and update A[x,y]
  - Compare the new value with the old value, and update the tree as described below
- Tree update
  - If the new value is different from the old value, then we locate the leaf node and propagate the changes up the tree
  - Efficient update when compared to the prefix-sum array
[Figure: cube A indexed by v1 \ v2 and its tree, each node labeled with its region and, in parentheses, its maximum value]
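The propagation step can be sketched on a simplified tree: one dimension, fanout 2, stored in an array so that the parent of node i is node i // 2. These are assumptions of the sketch, not the structure from the slide. The update stops as soon as an ancestor's maximum is unchanged, and the method returns how many nodes it rewrote.

```python
class MaxTree:
    """Array-based balanced tree; leaves hold cell values, inner nodes the max."""
    def __init__(self, values):
        self.n = len(values)
        self.tree = [float("-inf")] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def root_max(self):
        return self.tree[1]

    def update(self, pos, value):
        """Set one cell and propagate upwards while ancestors change."""
        i = pos + self.n
        self.tree[i] = value
        rewritten = 1
        while i > 1:
            i //= 2
            new_max = max(self.tree[2 * i], self.tree[2 * i + 1])
            if new_max == self.tree[i]:
                break                      # ancestors above are unaffected
            self.tree[i] = new_max
            rewritten += 1
        return rewritten

# Illustrative cell values
tree = MaxTree([4, 9, 5, 1])
# Insertion of a tuple with measure 7 into cell 2: new value is max(7, 5)
after_insert = tree.update(2, max(7, 5))
root_after_insert = tree.root_max()
# Deletion that lowers cell 1 to a recomputed value of 3
after_delete = tree.update(1, 3)
root_after_delete = tree.root_max()
```

At most one node per tree level is rewritten, in contrast to the prefix-sum array, where one insert can touch every entry.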
32 Variant of the Tree
- Like the previous multiway tree
  - Still a balanced tree
  - Each node is associated with a region and the maximum value in that region
- The only difference: the way that the branches are divided
- The previous solution for RangeMax queries is still applicable on this tree
[Figure: cube A indexed by v1 \ v2 and the variant tree, each node labeled with its region as a pair of ranges and, in parentheses, its maximum value]
33 Using the Tree for Range Sum Query
- Can we apply the multiway augmented tree for answering range sum queries?
  - YES, if for each node we store the sum of the values in its region
  - But it is not as efficient as the prefix-sum array
- Processing a RangeSum query
  - Visit only the relevant tree nodes and accumulate the sum result
  - Question: do we need to visit the subtree of a node whose region lies entirely inside the query range?
[Figure: cube A indexed by v1 \ v2 and the tree, each node labeled with its region as a pair of ranges and, in parentheses, the sum of its region]
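A sketch of the sum-augmented tree, again assuming a quadtree-style split and illustrative data: when a node's region lies entirely inside the query range, its stored sum is used directly and its subtree is not visited.

```python
def build(A, r1, r2, c1, c2):
    """Build a node holding its region, the sum of that region, and children."""
    if r1 == r2 and c1 == c2:
        return {"region": (r1, r2, c1, c2), "sum": A[r1][c1], "children": []}
    rm, cm = (r1 + r2) // 2, (c1 + c2) // 2
    row_parts = [(r1, rm), (rm + 1, r2)] if r1 < r2 else [(r1, r2)]
    col_parts = [(c1, cm), (cm + 1, c2)] if c1 < c2 else [(c1, c2)]
    children = [build(A, a, b, c, d)
                for a, b in row_parts for c, d in col_parts]
    return {"region": (r1, r2, c1, c2),
            "sum": sum(child["sum"] for child in children),
            "children": children}

def range_sum(node, l1, h1, l2, h2):
    r1, r2, c1, c2 = node["region"]
    if r2 < l1 or r1 > h1 or c2 < l2 or c1 > h2:
        return 0                       # irrelevant node: skip it
    if l1 <= r1 and r2 <= h1 and l2 <= c1 and c2 <= h2:
        return node["sum"]             # fully covered: no need to descend
    return sum(range_sum(child, l1, h1, l2, h2) for child in node["children"])

# Illustrative cube A
A = [[3, 5, 1, 2],
     [7, 3, 2, 6],
     [2, 4, 2, 3],
     [1, 9, 8, 5]]
root = build(A, 0, 3, 0, 3)
answer = range_sum(root, 0, 2, 0, 1)   # rows 0..2, columns 0..1
```

The number of visited nodes grows with the tree height and the border of the query, whereas the prefix-sum array always needs a constant number of lookups.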
34 Summary
- Data Cube in ROLAP and MOLAP
- ROLAP Technique(s)
  - Efficient Data Cube Computation
- MOLAP Technique(s)
  - Prefix Sum Array
  - Multiway Augmented Tree
More informationECLT 5810 Introduction to Data Warehousing
ECLT 5810 Introduction to Data Warehousing Prof. Wai Lam ECLT 5810 Introduction to Data Warehousing 1 What is Data Warehouse? Provides tools for business executives Systematically organize and understand
More informationB-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree
B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deletion in a B-tree Disk Storage Data is stored on disk (i.e., secondary memory) in blocks. A block is
More informationCHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)
CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination
More informationAdvanced Data Management Technologies
ADMT 2018/19 Unit 5 J. Gamper 1/48 Advanced Data Management Technologies Unit 5 Logical Design and DW Applications J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:
More informationDecision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1
Decision Support Chapter 25 CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support
More informationData mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.
Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and A. Pavlo Data mining detailed outline
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationChapter 4, Data Warehouse and OLAP Operations
CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining
More informationOracle Essbase XOLAP and Teradata
Oracle Essbase XOLAP and Teradata Steve Kamyszek, Partner Integration Lab, Teradata Corporation 09.14 EB5844 ALLIANCE PARTNER Table of Contents 2 Scope 2 Overview 3 XOLAP Functional Summary 4 XOLAP in
More informationQuotient Cube: How to Summarize the Semantics of a Data Cube
Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)
More informationData Warehousing & Data Mining
Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,
More informationWhat is a Multi-way tree?
B-Tree Motivation for studying Multi-way and B-trees A disk access is very expensive compared to a typical computer instruction (mechanical limitations) -One disk access is worth about 200,000 instructions.
More informationEvaluating XPath Queries
Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But
More informationDta Mining and Data Warehousing
CSCI6405 Fall 2003 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: q.gao@dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:
More informationChapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"
Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!
More informationAdvanced Databases. Lecture 1- Query Processing. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Advanced Databases Lecture 1- Query Processing Masood Niazi Torshiz Islamic Azad university- Mashhad Branch www.mniazi.ir Overview Measures of Query Cost Selection Operation Sorting Join Operation Other
More informationData Cube Technology
Data Cube Technology Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl
More informationOBIEE Performance Improvement Tips and Techniques
OBIEE Performance Improvement Tips and Techniques Vivek Jain, Manager Deloitte Speaker Bio Manager with Deloitte Consulting, Information Management (BI/DW) Skills in OBIEE, OLAP, RTD, Spatial / MapViewer,
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationUnit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP
Unit 7: Basics in MS Power BI for Excel M7-5: OLAP Outline: Introduction Learning Objectives Content Exercise What is an OLAP Table Operations: Drill Down Operations: Roll Up Operations: Slice Operations:
More informationData Cubes in Dynamic Environments
Data Cubes in Dynamic Environments Steven P. Geffner Mirek Riedewald Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California, Santa Barbara, CA 9 Λ Abstract The data cube,
More informationData Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids
Chapter 5: Data Cube Technology Data Cube Technology Data Cube Computation: Basic Concepts Data Cube Computation Methods Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/
More informationIDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu
IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts Enn Õunapuu enn.ounapuu@ttu.ee Content Oveall approach Dimensional model Tabular model Overall approach Data modeling is a discipline that has been practiced
More informationBig Data 13. Data Warehousing
Ghislain Fourny Big Data 13. Data Warehousing fotoreactor / 123RF Stock Photo 2 The road to analytics Aurelio Scetta / 123RF Stock Photo 3 Another history of data management (T. Hofmann) 1970s 2000s Age
More informationFind the block in which the tuple should be! If there is free space, insert it! Otherwise, must create overflow pages!
Professor: Pete Keleher! keleher@cs.umd.edu! } Keep sorted by some search key! } Insertion! Find the block in which the tuple should be! If there is free space, insert it! Otherwise, must create overflow
More informationOn-Line Application Processing
On-Line Application Processing WAREHOUSING DATA CUBES DATA MINING 1 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming,
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationIndexing: Overview & Hashing. CS 377: Database Systems
Indexing: Overview & Hashing CS 377: Database Systems Recap: Data Storage Data items Records Memory DBMS Blocks blocks Files Different ways to organize files for better performance Disk Motivation for
More information