Computing Data Cubes Using Massively Parallel Processors

Hongjun Lu, Xiaohui Huang, Zhixian Li
Department of Information Systems and Computer Science, National University of Singapore

Abstract

To better support decision making, it has been proposed to extend SQL with data cube operations. Computing a data cube requires computing a number of interrelated group-bys, which is a rather expensive operation when databases are large. In this paper, we propose to couple a relational database management system with massively parallel processors (MPP) to facilitate on-line analytical processing. Extended SQL queries involving complex data analysis, such as data cube computation, are decomposed. Data retrieved from the database are pipelined to the MPP machine, where data cubes are computed in parallel. The system architecture and issues related to parallel computation of data cubes are described. A brute force parallel data cube processing algorithm was implemented, and the results of some preliminary experiments are presented.

1. Introduction

Aggregation is widely used in on-line analytical processing. In relational database systems, SQL supports a set of aggregate functions (SUM, COUNT, AVG, MAX, and MIN). Together with the GROUP BY operator, users can retrieve not only the data physically stored in a database but also summaries of such data. For example, with a relation SALES (date, product, customer, amount), the total sales for each product can be obtained by issuing a single SQL query:

SELECT product, SUM(amount)
FROM SALES
GROUP BY product

The semantics of the GROUP BY operator is to partition a relation (or sub-relation) into disjoint sets based on the values of the group-by attributes, the attributes specified in the GROUP BY clause. Aggregate functions are then applied to each such set. Although relational database systems and the SQL language have been widely used in business applications for the past decade, certain common forms of data analysis, such as histograms, roll-up totals and sub-totals for drill-downs, and cross tabulation, are difficult to express with these SQL aggregation constructs [GCB+97]. A new operator, CUBE BY, has recently been proposed to overcome these problems. The CUBE operator is the n-dimensional generalization of the group-by operator. It computes group-bys corresponding to all possible combinations of the attributes in the CUBE BY clause. For example, the query

SELECT date, product, customer, SUM(amount)
FROM SALES
CUBE BY date, product, customer

will produce the SUM of amount over all tuples in the database plus the results of 7 group-bys, i.e., (date, product, customer), (date, product), (date, customer), (product, customer), (date), (product) and (customer). To make these group-by results union compatible, empty attributes in group-bys are denoted by ALL. The CUBE BY operator can be implemented simply as a union of a series of group-bys. Obviously, this is a quite expensive approach, especially when the number of CUBE BY attributes and the size of the database are large.

In this paper, we report our study on using massively parallel processors to compute data cubes. It is the first phase of a project on parallel on-line analytical processing. We believe that designing and implementing a fully fledged parallel on-line analytical processing system is still not an easy decision for most organizations.

However, it is practical to couple a commercial relational DBMS with massively parallel processors (or a cluster of general-purpose processors) to form such a system. In such a system, the DBMS provides efficient storage and retrieval of large volumes of data, and the massively parallel processors provide the computing power required by analytical processing. As a result, queries involving complex data analysis over large volumes of data can be answered with reasonable response time.

The remainder of the paper is organized as follows. Section 2 describes a system architecture that uses massively parallel processors as an OLAP engine for a conventional DBMS. Issues related to parallel implementation of the CUBE BY operator are discussed in Section 3. Section 4 presents some preliminary experimental results. Section 5 concludes the paper.

Figure 2.1: Coupling an MPP with a DBMS to serve as an OLAP engine (components: Query Analyst, Cost Estimator, Query Dispatcher, Plan Generator, Execution Manager, Result Synthesizer, DBMS, MPP, Database).

2. Using MPP as an OLAP engine

The reference architecture of a system that couples an MPP with a DBMS as an OLAP engine is shown in Figure 2.1. The shaded portion is the middleware to be designed and implemented in our project. A user query is first analyzed by the Query Analyst to see whether the MPP should be invoked. If so, the query is decomposed into sub-queries. The Query Dispatcher sends the sub-queries to the DBMS and/or the Plan Generator, which is responsible for generating a parallel plan to be executed by the MPP. The DBMS retrieves the required data from the database and sends them to the MPP for further processing. The execution of the parallel plan is controlled and monitored by the Execution Manager. The Result Synthesizer assembles the final results of the query and delivers them to the user. To assist the Query Analyst in determining whether involving the MPP in processing a particular query is beneficial, a Cost Estimator module is included in the system to estimate the costs of different query plans.

The proposed architecture has a number of features. First, the coupling between the DBMS and the MPP is quite loose. It is fully controlled by options in the software, so users can easily cut off the coupling by turning the option off. Without coupling, user queries are directed to the DBMS as-is, with no overhead. Second, both the DBMS and the MPP are highly autonomous: no modification is required to the original hardware or software on either side.
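To make the dispatch decision concrete, the following is a minimal, hypothetical sketch of how the Cost Estimator and Query Dispatcher could interact: the coupled plan is chosen only when its estimated cost is lower than that of a DBMS-only plan. The cost model, names, and numbers are illustrative assumptions, not part of the actual system.

/* Hypothetical sketch of the middleware dispatch decision: the Cost
 * Estimator prices a DBMS-only plan against a coupled DBMS+MPP plan,
 * and the Query Dispatcher routes the query accordingly.  The cost
 * model and all figures below are illustrative assumptions. */
#include <stdio.h>

typedef struct {
    double retrieval_cost;   /* estimated cost of scanning the base data */
    double aggregation_cost; /* estimated cost of computing the cube */
    double transfer_cost;    /* estimated cost of shipping data to the MPP */
    int    num_nodes;        /* MPP nodes available to this query */
} QueryEstimate;

/* DBMS computes everything by itself. */
static double cost_dbms_only(const QueryEstimate *q)
{
    return q->retrieval_cost + q->aggregation_cost;
}

/* DBMS retrieves, MPP aggregates in parallel (ideal speed-up assumed). */
static double cost_coupled(const QueryEstimate *q)
{
    return q->retrieval_cost + q->transfer_cost
         + q->aggregation_cost / q->num_nodes;
}

int main(void)
{
    QueryEstimate q = { 100.0, 900.0, 50.0, 8 };   /* made-up estimates */
    if (cost_coupled(&q) < cost_dbms_only(&q))
        printf("dispatch: decompose the query and involve the MPP\n");
    else
        printf("dispatch: send the query to the DBMS as-is\n");
    return 0;
}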

It is obvious that the Query Analyst and the Plan Generator are two key components of the system. While the Query Analyst needs to rewrite a user query once it is determined that parallel processing is beneficial, the Plan Generator is responsible for generating query execution plans for parallel execution. A great deal of research has been reported in the field of parallel query processing and optimization [LOT94], and some of the results can be readily applied. Therefore, we concentrate on issues that have been less addressed, especially those particularly important to on-line analytical processing, such as the recently proposed CUBE BY operation.

3. Computing data cubes in parallel

A large amount of work has been done on parallel query processing in relational database systems [DeGr92, LOT94]. With the recent surge of research interest in OLAP and multidimensional databases [DEBu95], efficient implementation of the data cube operation has also attracted researchers' attention [AAD+96, DANR96, HaRU96]. However, relatively little work has been devoted to parallel processing of aggregates [ShNa94]. In this section, we discuss some interesting properties and issues related to data cube computation using parallel processors.

3.1 Data cube and cuboids

We adopt the notation used in [DANR96]. Let R be a relation with k+1 attributes X = {A1, A2, ..., Ak, V}. A cuboid on j attributes S = {Ai1, Ai2, ..., Aij} is defined as a group-by on attributes Ai1, Ai2, ..., Aij using an aggregate function F() applied to attribute V. This cuboid can be represented as a (k+1)-attribute relation by using the special value ALL for the remaining k-j attributes [GBLP96]. The CUBE on attribute set X is the union of the cuboids on all subsets of attributes of X. The cuboid on all attributes in X is called the base cuboid.

To compute the CUBE, we need to compute all the cuboids that together form the CUBE. Among those cuboids, the base cuboid has to be computed from the original relation. If the aggregate function is distributive [1], the other cuboids can be computed from previously computed cuboids. The aggregate functions supported by SQL are either distributive (SUM, COUNT, MAX, MIN) or, like AVG, computable from distributive functions, so that a cuboid on attribute set Si can be computed from any cuboid on attribute set Sj if Si ⊆ Sj. Figure 3.1 shows the lattice of cuboids for a relation with 4+1 attributes. The nodes in the lattice are the cuboids to be computed, and the edges indicate that a lower-level cuboid can be computed from an upper-level one. The numbers in brackets are sample sizes of the cuboids in terms of number of tuples.

Figure 3.1: Cuboids for a relation with 4+1 attributes (R = 500,000 tuples; sample cuboid sizes in tuples: (A,B,C,D) 49,998; (A,B,C) 10,000; (A,B,D) 5,000; (A,C,D) 2,500; (B,C,D) 1,000; (A,B) 1,000; (A,C) 500; (A,D) 250; (B,C) 200; (B,D) 100; (C,D) 50; (A) 50; (B) 20; (C) 10; (D) 5; () 1).

There are two basic approaches to computing a group-by: sorting and hashing [GRAE93]. Since hash-based approaches are usually suitable for parallel processing, we consider only hash-based methods in this study. The basic hash-based approach for computing a cuboid is rather straightforward. A hash table is built whose entries hold the distinct values of the group-by attributes together with the aggregation value. For each source tuple, a hash function is applied to the group-by attributes.
If the attribute values are not yet in the hash table, a new entry is created. The aggregate function is then applied and the result is used to update the entry.

[1] An aggregate function F() is distributive if there is a function G() such that F({ X_ij | i = 1, ..., I; j = 1, ..., J }) = G({ F({ X_ij | i = 1, ..., I }) | j = 1, ..., J }).
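As an illustration of this hash-based computation of a single cuboid, the following is a minimal sketch in C (the language the later prototype is written in). It assumes integer-coded attribute values, the SUM aggregate on V, and a fixed-size open-addressing table; the tuple layout, hash function, and sizes are illustrative assumptions rather than the actual implementation.

/* Minimal sketch of hash-based evaluation of one cuboid, here (A, B) with
 * SUM(V), assuming integer-coded attribute values and an in-memory table
 * large enough for all distinct (A, B) groups.  Sizes, the hash function,
 * and the tuple layout are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 4096        /* must exceed the number of distinct groups */

typedef struct { int a, b, c, d; double v; } Tuple;       /* source tuple */
typedef struct { int used; int a, b; double sum; } Entry; /* hash entry */

static Entry table_[TABLE_SIZE];

static unsigned hash_ab(int a, int b)
{
    return ((unsigned)a * 31u + (unsigned)b) % TABLE_SIZE;
}

/* Process one source tuple: probe, create a new entry if needed, update. */
static void aggregate(const Tuple *t)
{
    unsigned h = hash_ab(t->a, t->b);
    while (table_[h].used && !(table_[h].a == t->a && table_[h].b == t->b))
        h = (h + 1) % TABLE_SIZE;              /* linear probing */
    if (!table_[h].used) {                     /* new group: create entry */
        table_[h].used = 1;
        table_[h].a = t->a;
        table_[h].b = t->b;
        table_[h].sum = 0.0;
    }
    table_[h].sum += t->v;                     /* apply SUM */
}

int main(void)
{
    Tuple data[] = { {1,2,3,4, 10.0}, {1,2,9,9, 5.0}, {2,2,3,4, 7.5} };
    memset(table_, 0, sizeof table_);
    for (size_t i = 0; i < sizeof data / sizeof data[0]; i++)
        aggregate(&data[i]);
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table_[i].used)
            printf("A=%d B=%d SUM(V)=%.1f\n",
                   table_[i].a, table_[i].b, table_[i].sum);
    return 0;
}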

The description above assumes that the hash table resides in memory. If the available memory is smaller than required, a cuboid can be computed in a number of iterations: in each iteration, a sub-cuboid whose hash table fits in memory is computed.

Computing a cube requires computing a number of interrelated cuboids. Agrawal et al. summarized the optimization techniques that can be used [AAD+96]:

- Smallest-parent: computing a cuboid from the smallest previously computed cuboid. In Figure 3.1, solid edges indicate a cuboid and the smallest parent from which it can be computed.
- Cache-results: caching the results of a cuboid from which other cuboids are computed, to reduce disk I/Os.
- Amortize-scans: computing as many cuboids as possible at the same time to amortize disk reads.
- Share-sorts: sharing sorting costs across multiple cuboids when a sort-based method is used.
- Share-partitions: sharing the partitioning cost across multiple cuboids when hash-based algorithms are used.

A number of algorithms that incorporate some of the above techniques have also been proposed and studied.

3.2 Parallel evaluation of data cubes

In an MPP environment, it is easy to have a much larger aggregate memory than on a uniprocessor machine, which makes MPP machines attractive for computing data cubes. However, for most OLAP applications, the data to be aggregated is usually too large even for MPP machines. In particular, the processors are not dedicated database machines, and the memory available to database processes will still be limited. Careful memory management for multiple-cuboid evaluation is therefore still a key issue in parallel evaluation of data cubes. Furthermore, when more than one cuboid is to be computed using more than one processor, cuboid allocation is another issue.

Data partitioning strategy. The standard way to deal with limited memory in hash-based algorithms is to partition the data. When multiple interrelated cuboids are to be evaluated, the data can be partitioned based on either attributes or cuboids. With attribute-based partitioning, when the data is partitioned on some attribute, say A, all cuboids that contain A are partitioned on A and computed at the same time. In other words, a number of partial cuboids are computed concurrently in an iteration. With cuboid-based partitioning, the data are partitioned only if a cuboid is too large, and more than one cuboid is evaluated in the same iteration only when the available memory can accommodate more than one cuboid.

Cuboid allocation strategy. When more than one processor is available to evaluate an iteration that contains more than one cuboid, or partitions from more than one cuboid, a cuboid allocation strategy distributes the computation among the available nodes. There are two possible strategies. The concurrent allocation strategy allows all cuboids or cuboid partitions to be evaluated concurrently: a minimum number of nodes is allocated to evaluate each cuboid (or cuboid partition), based on the size of the cuboid and the size of the available memory. The sequential allocation strategy evaluates the cuboids sequentially and evenly distributes the computation of each cuboid (or cuboid partition) among all available nodes. The advantage of concurrent allocation, sketched below, is that each node maintains fewer hash tables, so fewer hash values are computed at each node. Under sequential allocation, a node may be required to evaluate all cuboids of the iteration, which means more hash tables and more hash operations.
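A minimal sketch of the concurrent (minimum-node) allocation decision follows, assuming cuboid sizes are estimated in tuples and each node contributes a fixed hash-table budget; the cuboid names, size estimates, and the 11,000-tuple budget are borrowed loosely from the experiments later in the paper and are illustrative only.

/* Sketch of concurrent (minimum-node) cuboid allocation: each cuboid in
 * the current iteration receives the smallest number of nodes whose
 * combined memory can hold its hash table.  Estimated sizes and the
 * per-node memory budget are illustrative assumptions. */
#include <stdio.h>

typedef struct { const char *name; long est_tuples; } Cuboid;

/* Minimum number of nodes needed so that the cuboid's hash table fits. */
static int min_nodes(long est_tuples, long tuples_per_node)
{
    return (int)((est_tuples + tuples_per_node - 1) / tuples_per_node);
}

int main(void)
{
    const long tuples_per_node = 11000;          /* per-node buffer, in tuples */
    Cuboid iteration[] = { {"ABCD", 49998}, {"ABC", 10000}, {"ABD", 5000} };
    int n = (int)(sizeof iteration / sizeof iteration[0]);
    int next_node = 0;                           /* next free node index */

    for (int i = 0; i < n; i++) {
        int need = min_nodes(iteration[i].est_tuples, tuples_per_node);
        printf("cuboid %-4s -> nodes %d..%d\n",
               iteration[i].name, next_node, next_node + need - 1);
        next_node += need;                       /* concurrent: disjoint node sets */
    }
    return 0;
}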
Sequential allocation, on the other hand, makes some optimization techniques, such as caching results, easier to incorporate into the processing algorithm.

3.3 A brute force evaluation algorithm

To gain some hands-on experience with parallel evaluation of data cubes, a brute force algorithm was implemented:

- One node is allocated as the execution coordinator. It reads the data, both the original relation and previously computed cuboids, and broadcasts the tuples over the network.

- A greedy algorithm is used to determine the cuboids, or portions of cuboids, to be evaluated during each iteration. The cuboid-based partitioning strategy is used. That is, given the aggregate memory of the participating processors, the algorithm finds the next set of cuboids based on estimates of their sizes. All cuboids are sorted on the sizes of their parents to form a list, and each iteration chooses the next set of cuboids that can fit in memory. Since all cuboids are computed from the same set of source tuples during an iteration, a cuboid may not be computed from its direct parent. The possible increase in CPU cost is estimated to determine whether it is beneficial to include a cuboid in the iteration.
- During each iteration, the cuboids to be evaluated are allocated to the available processors. Two cuboid allocation strategies were implemented. The minimum-node strategy allocates the minimum number of nodes to compute one cuboid, based on the sizes of the cuboids and the memory. The even-distribution strategy distributes a cuboid over all available nodes for parallel execution.
- The execution coordinator broadcasts the tuples over the network. Each participating node builds hash tables for the cuboid partitions to be evaluated locally. After receiving an input tuple, it applies the hash functions to determine whether the tuple should be evaluated locally. If so, the aggregate function is applied and the value is used to update the hash table. There is no data transfer among the participating nodes.
- When a node completes its computation, a message is sent to the coordinator. The completed cuboid is then available for evaluating other cuboids.

We call the algorithm a brute force algorithm, as it is a straightforward implementation of parallel data cube evaluation with little optimization incorporated. This is because the main objective of this first implementation was not to pursue high performance but to become familiar with the facilities provided by the AP3000 and the properties of data cube computation.

4. A preliminary performance study

The brute force algorithm described in the previous section was implemented in C. In this section, we report the results of some initial experiments. The experiments were conducted on a Fujitsu AP3000 with 32 nodes. The relation used for computing the data cube contains 500,000 tuples of 5 attributes, A, B, C, D and V, where A, B, C, and D are group-by attributes. The aggregate function SUM is applied to attribute V. The cardinalities of the four group-by attributes are 50, 20, 10, and 5, respectively, and the attribute values are uniformly distributed within their respective ranges. The sizes of the cuboids are those shown in Figure 3.1. To simulate memory constraints, the test program uses only a buffer area equivalent to a certain number of tuples for the cube computation.

For each experiment, the CPU time, the message time (i.e., the time used to receive broadcast data), and the output time (i.e., the time for writing result tuples to disk) were recorded. Because the memory allocated is smaller than the total size required, the computation requires a number of iterations. For each iteration, the longest time among all participating nodes is taken as the processing time of the iteration, and the total processing time is the sum of the processing times of all iterations. That is, the processing time for cost component o (o ∈ {CPU, message, output}) is

    T_o = Σ_{i=1}^{k} max(T_oi),

where k is the number of iterations and max(T_oi) is the maximum processing time among all participating nodes used to complete iteration i.
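The following small sketch shows how this total is computed from per-node, per-iteration measurements; the iteration and node counts and the timing values are illustrative, not measured data.

/* Sketch of the timing model: for each cost component, the total time is
 * the sum over iterations of the slowest participating node in that
 * iteration.  The measurements below are illustrative, not experimental
 * data. */
#include <stdio.h>

#define ITERATIONS 2
#define NODES      4

static double total_time(double t[ITERATIONS][NODES])
{
    double total = 0.0;
    for (int i = 0; i < ITERATIONS; i++) {
        double slowest = t[i][0];
        for (int n = 1; n < NODES; n++)
            if (t[i][n] > slowest)
                slowest = t[i][n];
        total += slowest;      /* an iteration ends when its last node finishes */
    }
    return total;
}

int main(void)
{
    double cpu[ITERATIONS][NODES] = { {4.1, 3.8, 4.5, 4.0},
                                      {2.2, 2.6, 2.4, 2.1} };
    printf("total CPU time = %.1f s\n", total_time(cpu));   /* 4.5 + 2.6 */
    return 0;
}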
We report here the results of three sets of experiments that investigated the effects of the number of nodes, the scheduling strategy, and the memory size on the cube processing time.

4.1 Experiment One

The first set of experiments studies the processing time as a function of the number of nodes used for computing the cube. The results are shown in Figure 4.1.

Figure 4.1: Time for computing the sample cube (processing time in seconds vs. number of nodes; CPU, message, and output time).

From Figure 4.1, we can see that the processing time drops dramatically when the number of participating nodes increases from 1 to 4. With 5 or more nodes, the speed-up is no longer significant. The major reason is that the processing time depends largely on the number of iterations required to compute all the cuboids. Table 4.1 lists the cuboids, or their partitions, processed when different numbers of nodes were used. With 5 or more nodes, the hash table of the largest cuboid, the base cuboid, can be held in memory, so the computation can be completed in two iterations. The number of input tuples processed is the same; the only benefit of more nodes is that the portion of the cuboids to be computed by each node becomes smaller, and this saving is marginal.

Table 4.1: Iterations and cuboids computed

Nodes  Iterations  Cuboids computed per iteration
1      7           {ABCD_0} {ABCD_1} {ABCD_2} {ABCD_3} {ABCD_4} {ABC, BCD} {ABD, ...}
2      4           {ABCD_0} {ABCD_1} {ABCD_2} {ABC, ...}
3      3           {ABCD_0} {ABCD_1} {ABC, ...}
4      3           {ABCD_0} {ABCD_1} {ABC, ...}
5      2           {ABCD} {ABC, ...}
6      2           {ABCD} {ABC, ...}
7      2           {ABCD} {ABC, ...}
8      2           {ABCD} {ABC, ...}

(ABCD_i denotes the i-th partition of the base cuboid; trailing ellipses stand for the remaining cuboids evaluated in that iteration.)

4.2 Experiment Two

In the first experiment, a cuboid was evaluated using as many nodes as possible; that is, the even-distribution allocation strategy was used. In the second experiment, when more than one cuboid was evaluated in the same iteration, each cuboid was assigned to the minimum number of nodes based on the size of the cuboid and the size of the aggregate memory of those nodes. The results are shown in Figure 4.2.

Figure 4.2: Time for computing the sample cube with minimum-node allocation (processing time in seconds vs. number of nodes; CPU, message, and output time).

Comparing Figure 4.2 with Figure 4.1, we can see that the CPU time using the second strategy is about 70-75% of that of the first one when the number of nodes increases beyond four. This can be explained as follows. If a node is responsible for computing n cuboids, n hash operations are required for each input tuple. With 5 to 8 nodes, the second iteration evaluates 15 cuboids and no node needs to evaluate more than 2 cuboids, whereas with the first strategy each node may be required to evaluate more than 10 cuboids.

4.3 Memory size and processing time

The size of the available memory determines the number of iterations, which has a dramatic effect on the total processing time. In the third experiment, we therefore fixed the number of nodes participating in the cube computation at two and varied the available memory size at each node. The results are shown in Figure 4.3. The curves in Figure 4.3 show the same trend as Figures 4.1 and 4.2. However, the speed-up as the memory size increases is not as large as in the previous cases.

Figure 4.3: Processing time vs. memory size (processing time in seconds vs. aggregate memory in units of 11,000 tuples; CPU, message, and output time).

For example, the CPU time for 2 nodes with a memory of 44,000 tuples each is about twice that of 8 nodes with a memory of 11,000 tuples each. In other words, in addition to the effect of a larger aggregate memory, parallel processing itself brings further benefit to cube computation.

5. Discussions

To study the feasibility and benefit of using massively parallel processors to compute data cubes, a brute force algorithm was implemented. This preliminary study indicates that, even without much optimization, massively parallel processors can indeed speed up the computation of data cubes, partly because of the large aggregate memory. To develop high-performance parallel data cube processing algorithms, a number of issues need to be considered carefully, among them data partitioning methods and cuboid allocation strategies. Designing and implementing an algorithm that incorporates various optimization techniques is our immediate task. In our experiments, data were transferred using the broadcast method, and an execution coordinator was responsible for reading and transmitting the data. Comparing the different data transfer methods provided by the system is another task.

References

[AAD+96] S. Agrawal et al., On the computation of multidimensional aggregates. In Proc. of the 22nd VLDB Conf., Mumbai, India, 1996.
[BCL93] K.P. Brown, M.J. Carey, and M. Livny, Managing memory to meet multiclass workload response time goals. In Proc. of the 19th VLDB Conf., Brighton, England, September 1993.
[DANR96] P.M. Deshpande et al., Computation of multidimensional aggregates. Technical Report 1314, Computer Sciences Department, University of Wisconsin-Madison.
[DeGr92] D.J. DeWitt and J. Gray, Parallel database systems: the future of high performance database systems. CACM, June 1992.
[GCB+97] J. Gray et al., Data Cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, Vol. 1, No. 1, 1997.
[GRAE93] G. Graefe, Query evaluation techniques for large databases. ACM Computing Surveys, Vol. 25, No. 2, 1993.
[LOT94] H. Lu, B.C. Ooi, and K.-L. Tan, Parallel query processing in relational database systems. IEEE Computer Society Press, 1994.
[ShNa94] A. Shatdal and J.F. Naughton, Processing aggregates in parallel database systems. Technical Report 1233, Computer Sciences Department, University of Wisconsin-Madison, 1994.
