Fighting Redundancy in SQL


Antonio Badia and Dev Anand
Computer Engineering and Computer Science department
University of Louisville, Louisville KY

This research was sponsored by NSF under grant IIS

Abstract. Many SQL queries with aggregated subqueries exhibit redundancy (overlap in the FROM and WHERE clauses). We propose a method, called the for-loop, to optimize such queries by ensuring that redundant computations are done only once. We specify a procedure to build a query plan implementing our method, give an example of its use, and argue that it offers performance advantages over traditional approaches.

1 Introduction

In this paper, we study a class of Decision-Support SQL queries, characterize them, and show how to process them in an improved manner. In particular, we analyze queries containing subqueries, where the subquery is aggregated (type-A and type-JA in [8]). In many of these queries, SQL exhibits redundancy in that the FROM and WHERE clauses of query and subquery show a great deal of overlap. We argue that these patterns are currently not well supported by relational query processors. The following example gives some intuition about the problem; the query used is Query 2 from the TPC-H benchmark ([18]), which we will refer to as query TPCH2:

    select s_acctbal, s_name, n_name, p_partkey, p_mfgr,
           s_address, s_phone, s_comment
    from part, supplier, partsupp, nation, region
    where p_partkey = ps_partkey and s_suppkey = ps_suppkey
      and p_size = 15 and p_type like '%BRASS'
      and r_name = 'EUROPE' and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and ps_supplycost =
          (select min(ps_supplycost)
           from partsupp, supplier, nation, region
           where p_partkey = ps_partkey and s_suppkey = ps_suppkey
             and s_nationkey = n_nationkey
             and n_regionkey = r_regionkey
             and r_name = 'EUROPE')
    order by s_acctbal desc, n_name, s_name, p_partkey;

This query is executed in most systems by using unnesting techniques. However, the commonality between query and subquery will not be detected, and
all operations (including common joins and selections) will be repeated (see an in-depth discussion of this example in subsection 2.3). Our goal is to avoid duplication of effort.

For lack of space, we will not discuss related research in query optimization ([3, 11, 6, 8, 15]); we point out that detecting and dealing with redundancy is not attempted in this body of work.

Our method applies only to aggregated subqueries that contain WHERE clauses overlapping with the main query's WHERE clause. This may seem a very narrow class of queries until one realizes that all types of SQL subqueries can be rewritten as aggregated subqueries (EXISTS, for instance, can be rewritten as a subquery with COUNT; all other types of subqueries can be rewritten similarly ([2])). Therefore, the approach is potentially applicable to any SQL query with subqueries. It is also important to point out that the redundancy is present because of the structure of SQL, which necessitates a subquery in order to declaratively state the aggregation to be computed. Thus, we argue that such redundancy is not infrequent ([10]).

We describe an optimization method geared towards detecting and optimizing this redundancy. Our method not only computes the redundant part only once, but also proposes a new special operator to compute the rest of the query very effectively. In section 2 we describe our approach and the new operator in more detail. We formally describe the operator (subsection 2.1), show how query trees with the operator can be generated for a given SQL query (subsection 2.2), and describe an experiment run in the context of the TPC-H benchmark ([18]) (subsection 2.3). Finally, in section 3 we propose some further research.

2 Optimization of Redundancy

In this section we define patterns which detect redundancy in SQL queries. We then show how to use the matching of patterns and SQL queries to produce a query plan which avoids repeating computations.
We represent SQL queries in a schematic form or pattern. With the keywords SELECT ... FROM ... WHERE we will use $L, L_1, L_2, \ldots$ as variables over lists of attributes; $T, T_1, T_2, \ldots$ as variables over lists of relations; $F, F_1, F_2, \ldots$ as variables over aggregate functions; and $\Delta, \Delta_1, \Delta_2, \ldots$ as variables over (complex) conditions. Attributes will be represented by $attr, attr_1, attr_2, \ldots$. If there is a condition in the WHERE clause of the subquery which introduces correlation, it will be shown explicitly; this is called the correlation condition. The table to which the correlated attribute belongs is called the correlation table, and is said to introduce the correlation; the attribute compared to the correlated attribute is called the correlating attribute. The condition that connects query and subquery (called a linking condition) is also shown explicitly. The operator in the linking condition is called the linking operator, the attributes involved the linking attributes, and the aggregate function on the subquery side the linking aggregate.

We will say that a pattern matches an SQL query when there is a correspondence g between the variables in the pattern and the elements of the query. As an example, the pattern
    SELECT L FROM T WHERE $\Delta_1$ AND $attr_1\ \theta$ (SELECT $F(attr_2)$ FROM T WHERE $\Delta_2$)

would match query TPCH2 by setting

    $g(\Delta_1)$ = {p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey},
    $g(\Delta_2)$ = {p_partkey = ps_partkey and s_suppkey = ps_suppkey and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey},
    $g(T)$ = {part, supplier, partsupp, nation, region},
    $g(F)$ = min, and $g(attr_1) = g(attr_2)$ = ps_supplycost.

Note that the T symbol appears twice, so the pattern forces the query to have the same FROM clauses in the main query and in the subquery (see footnote 1). The correlation condition is p_partkey = ps_partkey; the correlation table is part, and ps_partkey is the correlating attribute. The linking condition here is ps_supplycost = min(ps_supplycost); thus ps_supplycost is the linking attribute, = the linking operator, and min the linking aggregate.

The basic idea of our approach is to divide the work to be done in three parts: one that is common to query and subquery, one that belongs only to the subquery, and one that belongs only to the main query (see footnote 2). The part that is common to both query and subquery can be done only once; however, as we argue in subsection 2.3, in most systems today it would be done twice. We calculate the three parts above as follows: the common part is $g(\Delta_1) \cap g(\Delta_2)$; the part proper to the main query is $g(\Delta_1) - g(\Delta_2)$; and the part proper to the subquery is $g(\Delta_2) - g(\Delta_1)$. For query TPCH2, this yields {p_partkey = ps_partkey and s_suppkey = ps_suppkey and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey}, {p_size = 15 and p_type like '%BRASS'} and $\emptyset$, respectively. We use this matching in constructing a program to compute this query. The process is explained in the next subsection.

Footnote 1: For correlated subqueries, the correlation table is counted as present in the FROM clause of the subquery.
Footnote 2: We are assuming that all relations mentioned in a query are connected, i.e. that there are no Cartesian products present, only joins. Therefore, when there is overlap between the FROM clauses of query and subquery, we are very likely to find common conditions in both WHERE clauses (at least the joins).

2.1 The For-Loop Operator

We start out with the common part, called the base relation, in order to ensure that it is not done twice. The base relation can be expressed as an SPJ query. Our strategy is to compute the rest of the query starting from this base relation. This strategy faces two difficulties. First, if we simply divide the query based on common parts, we obtain a plan where redundancy is eliminated at the price of fixing the order of some operations. In particular, some selections not in the common part would not be pushed down. Hence, it is unclear whether this strategy will provide significant improvements by itself (this situation is similar
to that of [13]). Second, when starting from the base relation, we face a problem in that this relation has to be used for two different purposes: it must be used to compute an aggregate after finishing up the WHERE clause in the subquery (i.e. after computing $g(\Delta_2) - g(\Delta_1)$); and it must be used to finish up the WHERE clause in the main query (i.e. to compute $g(\Delta_1) - g(\Delta_2)$) and then, using the result of the previous step, compute the final answer to the query. However, it is extremely hard in relational algebra to combine the operators involved. For instance, the computation of an aggregate must be done before the aggregate can be used in a selection condition.

In order to solve this problem, we define a new operator, called the for-loop, which combines several relational operators into a new one (i.e. a macro-operator). The approach is based on the observation that some basic operations frequently appear together and could be implemented more efficiently as a whole. In our particular case, we show in the next subsection that there is an efficient implementation of the for-loop operator which allows it, in some cases, to compute several basic operators with one pass over the data, thus saving considerable disk I/O.

Definition 1. Let R be a relation, $sch(R)$ the schema of R, $L \subseteq sch(R)$, $A \in sch(R)$, F an aggregate function, $\alpha$ a condition on R (i.e. involving only attributes of $sch(R)$) and $\beta$ a condition on $sch(R) \cup \{F(A)\}$ (i.e. involving attributes of $sch(R)$ and possibly $F(A)$). Then the for-loop operator is defined as either one of the following:

1. $FL_{L,F(A),\alpha,\beta}(R)$. The meaning of the operator is defined as follows: let $Temp$ be the relation $GB_{L,F(A)}(\sigma_\alpha(R))$ ($GB$ is used to indicate a group-by operation). Then the for-loop yields the relation $\sigma_\beta(R \bowtie_{R.L = Temp.L} Temp)$, where the condition of the join is understood as the pairwise equality of each attribute in L. This is called a grouped for-loop.
2. $FL_{F(A),\alpha,\beta}(R)$. The meaning of the operator is given by $\sigma_\beta(AGG_{F(A)}(\sigma_\alpha(R)) \times R)$, where $AGG_{F(A)}(R)$ indicates the aggregate F computed over all A values of R. This is called a flat for-loop.

Note that $\beta$ may contain aggregated attributes as part of a condition. In fact, in the typical use in our approach, it does contain an aggregation. The main use of a for-loop is to calculate the linking condition of a query with an aggregated subquery on the fly, possibly with additional selections. Thus, for instance, for query TPCH2, the for-loop would take the grouped form

    $FL_{\text{p\_partkey},\ \min(\text{ps\_supplycost}),\ \emptyset,\ \text{p\_size}=15\ \wedge\ \text{p\_type LIKE '\%BRASS'}\ \wedge\ \text{ps\_supplycost}=\min(\text{ps\_supplycost})}(R)$,

where R is the relation obtained by computing the base relation (see footnote 3).

Footnote 3: Again, note that the base relation contains the correlation as a join.

The for-loop is equivalent to the relational expression

    $\sigma_{\text{p\_size}=15\ \wedge\ \text{p\_type LIKE '\%BRASS'}\ \wedge\ \text{ps\_supplycost}=\min(\text{ps\_supplycost})}(AGG_{\min(\text{ps\_supplycost})}(R) \times R)$.

It can be seen that this expression will compute the original SQL query; the aggregation will compute the aggregate function of the subquery (the conditions
in the WHERE clause of the subquery have already been computed in R, since in this case $\Delta_2 \subseteq \Delta_1$ and hence $\Delta_2 - \Delta_1 = \emptyset$), and the Cartesian product will put a copy of this aggregate on each tuple, allowing the linking condition to be stated as a regular condition over the resulting relation. Note that this expression may not be better, from a cost point of view, than other plans produced by standard optimization. What makes this plan attractive is that the for-loop operator can be implemented in such a way that it computes its output with one pass over the data. In particular, the implementation will not carry out any Cartesian product, which is used only to explain the semantics of the operator. The operator is written as an iterator that loops over the input implementing a simple program (hence the name).

The basic idea is simple: in some cases, computing an aggregation and using the aggregate result in a selection can be done at the same time. This is due to the behavior of some aggregates and the semantics of the conditions involved. Assume, for instance, that we have a comparison of the type attr = min(attr2), where both attr and attr2 are attributes of some table R. In this case, as we go on computing the minimum for a series of values, we can actually decide, as we iterate over R, whether some tuples can ever make the condition true. This is due to the fact that min is monotonically non-increasing, i.e. as we iterate over R and carry a current minimum, this value will always stay the same or decrease, never increase. Since equality imposes a very strict constraint, we can take a decision on the current tuple t based on the values of t.attr and the current minimum, as follows: if t.attr is greater than the current minimum, we can safely get rid of it. If t.attr is equal to the current minimum, we should keep it, at least for now, in a temporary result temp1. If t.attr is less than the current minimum, we should keep it, in case our current minimum changes, in a temporary result temp2. Whenever the current minimum changes, we know that temp1 should be deleted, i.e. tuples there cannot be part of a solution. On the other hand, temp2 should be filtered: some tuples there may be thrown away, some may go into a new temp1, and some may remain in temp2. At the end of the iteration, the set temp1 gives us the correct solution. Of course, as we go over the tuples in R we may keep some tuples that we need to get rid of later on; but the important point is that we never have to go back and recover a tuple that we dismissed, thanks to the monotonic behavior of min.

This behavior does generalize to max, sum, and count, since they are all monotonically non-decreasing (for sum, it is assumed that all values in the domain are positive numbers); however, average is not monotonic (either in an increasing or decreasing manner). For this reason, our approach does not apply to average. For the other aggregates, though, we argue that we can successfully take decisions on the fly without having to recover discarded tuples later on.

2.2 Query Transformation

The general strategy to produce a query plan with for-loops for a given SQL query q is as follows: we classify q into one of two categories, according to q's structure. For each category, a pattern p is given. As before, if q fits into p
there is a mapping g between constants in q and variables in p. Associated with each pattern there is a for-loop program template t. A template is different from a program in that it has variables and options. Using the information on the mapping g (including the particular linking aggregate and linking condition in q), a concrete for-loop program is generated from t. The process to produce a query tree containing a for-loop operator is then simple: our patterns allow us to identify the part common to query and subquery (i.e. the base relation), which is used to start the query tree. Standard relational optimization techniques can be applied to this part. Then a for-loop operator which takes the base relation as input is added to the query tree, and its parameters are determined. We describe each step separately.

We distinguish between two types of queries: type A queries, in which the subquery is not correlated (this corresponds to type A in [8]); and type B queries, where the subquery is correlated (this corresponds to type JA in [8]). Queries of type A are interesting in that usual optimization techniques cannot do anything to improve them (obviously, unnesting does not apply to them). Thus, our approach, whenever applicable, offers a chance to create an improved query plan. In contrast, queries of type B have been dealt with extensively in the literature ([8, 3, 6, 11, 17, 16, 15]). As we will see, our approach is closely related to other unnesting techniques, but it is the only one that considers redundancy between query and subquery and its optimization.

The general pattern a type A query must fit is given below:

    SELECT L FROM T WHERE $\Delta_1$ and $attr_1\ \theta$ (SELECT $F(attr_2)$ FROM T WHERE $\Delta_2$) {GROUP BY L2}

The braces around the GROUP BY clause indicate that this clause is optional (see footnote 4).

Footnote 4: Obviously, SQL syntax requires that $L2 \subseteq L$, where L and L2 are lists of attributes. In the following, we assume that queries are well formed.

We create a query plan for this query in two steps:

1. A base relation is defined by $\sigma_{g(\Delta_1) \cap g(\Delta_2)}(g(T))$. Note that this is an SPJ query, which can be optimized by standard techniques.
2. We apply a for-loop operator defined by $FL(g(F(attr_2)),\ g(\Delta_2) - g(\Delta_1),\ g(\Delta_1) - g(\Delta_2) \wedge g(attr_1\ \theta\ F(attr_2)))$.

It can be seen that this query plan computes the correct result for this query by using the definition of the for-loop operator. Here, the aggregate is $F(attr_2)$, $\alpha$ is $g(\Delta_2 - \Delta_1)$ and $\beta$ is $g(\Delta_1) - g(\Delta_2) \wedge g(attr_1\ \theta\ F(attr_2))$. Thus, this plan will first apply $\Delta_1 \cap \Delta_2$ to T, in order to generate the base relation. Then, the for-loop will compute the aggregate $F(attr_2)$ on the result of selecting $g(\Delta_2 - \Delta_1)$ on the base relation. Note that $(\Delta_2 - \Delta_1) \cup (\Delta_1 \cap \Delta_2) = \Delta_2$, and hence the aggregate is computed over the conditions in the subquery only, as it should be. The result of this aggregate is then appended to every tuple in the base relation by the Cartesian
product (again, note that this description is purely conceptual). After that, the selection on $g(\Delta_1) - g(\Delta_2) \wedge g(attr_1\ \theta\ F(attr_2))$ is applied. Here we have that $(\Delta_1 - \Delta_2) \cup (\Delta_1 \cap \Delta_2) = \Delta_1$, and hence we are applying all the conditions in the main clause. We are also applying the linking condition $attr_1\ \theta\ F(attr_2)$, which can be considered a regular condition now because $F(attr_2)$ is present in every tuple. Thus, the for-loop operator computes the query correctly. This for-loop operator will be implemented by a program that will carry out all needed operators with one scan of the input relation. Clearly, the concrete program is going to depend on the linking operator ($\theta$, assumed to be one of {=, <=, >=, <, >}) and the aggregate function (F, assumed to be one of min, max, sum, count, avg).

The general pattern for type B queries is given next.

    SELECT L FROM $T_1$ WHERE $\Delta_1$ and $attr_1\ \theta$ (SELECT $F_1(attr_2)$ FROM $T_2$ WHERE $\Delta_2$ and $S.attr_3\ \theta\ R.attr_4$) {GROUP BY L2}

where $R \in T_1 - T_2$, $S \in T_2$, and we are assuming that $T_1 - \{R\} = T_2 - \{S\}$ (i.e. the FROM clauses contain the same relations except the one introducing the correlated attribute, called R, and the one introducing the correlating attribute, called S). We call $T = T_1 - \{R\}$. As before, a GROUP BY clause is optional. In our approach, we consider the table containing the correlated attribute as part of the FROM clause of the subquery too (i.e. we effectively decorrelate the subquery). Thus, the outer join is always part of our common part. In our plan, there are two steps:

1. Computation of the base relation, given by $\sigma_{g(\Delta_1 \cap \Delta_2)}(T \cup \{R, S\})$. This includes the outer join of R and S.
2. Computation of a grouped for-loop defined by $FL(attr_6,\ F_1(attr_2),\ \Delta_2 - \Delta_1,\ \Delta_1 - \Delta_2 \wedge attr_1\ \theta\ F_1(attr_2))$, which computes the rest of the query.
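The parameters of both plans come from three set operations over the conjuncts of the two WHERE clauses. The following Python sketch applies them to query TPCH2. It is only an illustration: the names `PlanParts` and `split_conditions` are hypothetical, and treating each conjunct as an opaque string is an assumption (an actual optimizer would have to compare normalized predicates rather than text).

```python
from typing import FrozenSet, Iterable, NamedTuple


class PlanParts(NamedTuple):
    """Hypothetical container for the three parts of the WHERE clauses."""
    common: FrozenSet[str]     # Delta1 intersect Delta2: defines the base relation
    main_only: FrozenSet[str]  # Delta1 minus Delta2: ends up inside beta
    sub_only: FrozenSet[str]   # Delta2 minus Delta1: becomes alpha


def split_conditions(delta1: Iterable[str], delta2: Iterable[str]) -> PlanParts:
    """Split the conjuncts of query (delta1) and subquery (delta2)."""
    d1, d2 = frozenset(delta1), frozenset(delta2)
    return PlanParts(common=d1 & d2, main_only=d1 - d2, sub_only=d2 - d1)


# Conjuncts of TPCH2, written as plain strings (an assumption of this sketch).
DELTA1 = {
    "p_partkey = ps_partkey", "s_suppkey = ps_suppkey",
    "p_size = 15", "p_type like '%BRASS'", "r_name = 'EUROPE'",
    "s_nationkey = n_nationkey", "n_regionkey = r_regionkey",
}
DELTA2 = {
    "p_partkey = ps_partkey", "s_suppkey = ps_suppkey",
    "r_name = 'EUROPE'",
    "s_nationkey = n_nationkey", "n_regionkey = r_regionkey",
}
LINKING = "ps_supplycost = min(ps_supplycost)"

parts = split_conditions(DELTA1, DELTA2)
alpha = parts.sub_only               # conditions applied before aggregating
beta = parts.main_only | {LINKING}   # conditions applied after aggregating

# The two identities used in the correctness argument above.
assert parts.sub_only | parts.common == frozenset(DELTA2)
assert parts.main_only | parts.common == frozenset(DELTA1)

print(sorted(parts.main_only))  # ['p_size = 15', "p_type like '%BRASS'"]
print(sorted(alpha))            # []
```

For TPCH2 the subquery-only part is empty, so the for-loop aggregates directly over the base relation, and only the two selections on part plus the linking condition remain in its final selection.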
Our plan has two main differences with traditional unnesting: the parts common to query and subquery are computed only once, at the beginning of the plan; and computing the aggregate, the linking predicate, and possibly some selections is carried out by the for-loop operator in one step. Thus, we potentially deal with larger temporary results, as some selections (those not in $\Delta_1 \cap \Delta_2$) are not pushed down, but we may be able to effect several computations at once (and do not repeat any computation). Clearly, which plan is better depends on the amount of redundancy between query and subquery, the linking condition (which determines how efficient the for-loop operator is), and traditional optimization parameters, like the size of the input relations and the selectivity of the different conditions.
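The one-pass evaluation described in subsection 2.1 for a linking condition of the form attr = min(attr2) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based row format, and the toy base relation are all assumptions, NULL values are ignored, and only the `= min` case is covered. The second function spells out the two-pass semantics of Definition 1 as a reference for comparison.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]
Cond = Callable[[Row], bool]


def for_loop_eq_min(rows: Iterable[Row], attr: str, agg_attr: str,
                    group_by: Sequence[str] = (),
                    alpha: Cond = lambda r: True,
                    main_cond: Cond = lambda r: True) -> List[Row]:
    """One scan: keep rows with main_cond and attr == min(agg_attr) per group.

    alpha restricts the rows that feed the minimum; main_cond is the part of
    beta without the linking condition. An empty group_by gives a flat for-loop.
    """
    state: Dict[tuple, dict] = {}
    for t in rows:
        key = tuple(t[a] for a in group_by)
        s = state.setdefault(key, {"min": None, "temp1": [], "temp2": []})
        if alpha(t) and (s["min"] is None or t[agg_attr] < s["min"]):
            # The minimum decreased: old temp1 is dropped, temp2 is filtered.
            s["min"] = t[agg_attr]
            s["temp1"] = [u for u in s["temp2"] if u[attr] == s["min"]]
            s["temp2"] = [u for u in s["temp2"] if u[attr] < s["min"]]
        if not main_cond(t):
            continue
        if s["min"] is None or t[attr] < s["min"]:
            s["temp2"].append(t)   # may match a later, smaller minimum
        elif t[attr] == s["min"]:
            s["temp1"].append(t)   # matches the current minimum
        # t[attr] > current minimum: the row can never qualify, discard it
    return [t for s in state.values() for t in s["temp1"]]


def reference_eq_min(rows: Iterable[Row], attr: str, agg_attr: str,
                     group_by: Sequence[str] = (),
                     alpha: Cond = lambda r: True,
                     main_cond: Cond = lambda r: True) -> List[Row]:
    """Two passes, following Definition 1: aggregate first, then select."""
    rows = list(rows)
    mins: Dict[tuple, object] = {}
    for t in rows:
        if alpha(t):
            key = tuple(t[a] for a in group_by)
            mins[key] = min(mins.get(key, t[agg_attr]), t[agg_attr])
    return [t for t in rows
            if main_cond(t)
            and tuple(t[a] for a in group_by) in mins
            and t[attr] == mins[tuple(t[a] for a in group_by)]]


# Toy stand-in for the TPCH2 base relation (already joined, Europe only).
base = [
    {"p_partkey": 1, "p_size": 15, "s_name": "S1", "ps_supplycost": 30},
    {"p_partkey": 1, "p_size": 15, "s_name": "S2", "ps_supplycost": 10},
    {"p_partkey": 1, "p_size": 15, "s_name": "S3", "ps_supplycost": 10},
    {"p_partkey": 2, "p_size": 7, "s_name": "S1", "ps_supplycost": 5},
    {"p_partkey": 3, "p_size": 15, "s_name": "S4", "ps_supplycost": 8},
]
args = dict(attr="ps_supplycost", agg_attr="ps_supplycost",
            group_by=("p_partkey",), main_cond=lambda r: r["p_size"] == 15)
result = for_loop_eq_min(base, **args)
print([r["s_name"] for r in result])  # ['S2', 'S3', 'S4']
```

In this sketch the intermediate lists play the roles of temp1 and temp2 from the text; no discarded row is ever looked at again, which is the property the monotonicity of min provides.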

Fig. 1. Standard query plan
Fig. 2. For-loop query plan, with for-loop operator FL(p_partkey, min(ps_supplycost), (p_size=15 & p_type LIKE '%BRASS' & ps_supplycost=min(ps_supplycost)))

2.3 Example and Analytical Comparison

We apply our approach to query TPCH2; this is a typical type B query. For our experiment we created a TPC-H benchmark database of the smallest size (1 GB) on two leading commercial DBMSs. We created indices on all primary and foreign keys, updated system statistics, and captured the query plan for query 2 on each system. Both query plans were very similar, and they are represented by the query tree in figure 1. Note that the query is unnested based on Kim's approach (i.e. first group and then join). Note also that all selections are pushed all the way down; they were executed by pipelining with the joins. The main differences between the two systems were the choices of implementations for the joins and different join ordering (see footnote 5). For our concern, the main observation about this query plan is that operations in query and subquery are repeated, even though there clearly is a large amount of repetition (see footnote 6).

Footnote 5: To make sure that the particular linking condition was not an issue, the query was changed to use different linking aggregates and linking operators; the query plan remained the same (except that for operators other than equality Dayal's approach was used instead of Kim's). Also, memory size was varied from a minimum of 64 MB to a maximum of 512 MB, to determine if memory size was an issue. Again, the query plan remained the same through all memory sizes.
Footnote 6: We have disregarded the final Sort needed to complete the query, as this would be necessary in any approach, including ours.

We created a query plan for this query,
based on our approach (shown in figure 2). Note that our approach does not dictate how the base relation is optimized; the particular plan shown uses the same tree as the original query tree to facilitate comparisons. It is easy to see that our approach avoids any duplication of work. However, this comes at the cost of fixing the order of some operations (i.e. operations in $\Delta_1 \cap \Delta_2$ must be done before other operations). In particular, some selections get pushed up because they do not belong to the common part, which increases the size of the relation created as input for the for-loop. Here, TPCH2 returns 460 rows, while the intermediate relation that the for-loop takes as input has 158,960 tuples. Thus, the cost of executing the for-loop may be higher than that of other operations because of its larger input. However, grouping and aggregating took both systems about 10% of the total time (see footnote 7). Another observation is that the duplicated operations do not take double the time, because of cache usage. But this can be attributed to the excellent main memory/database size ratio in our setup; with a more realistic setup this effect is likely to be diminished. Nevertheless, our approach avoids duplicated computation and does result in some time improvement (it takes about 70% of the time of the standard approach). In any case, it is clear that a plan using the for-loop is not guaranteed to be superior to traditional plans under all circumstances. Thus, it is very important to note that we assume a cost-based optimizer which will generate a for-loop plan if at least some amount of redundancy is detected, and will compare the for-loop plan to others based on cost.

Footnote 7: This and all other data about time come from measuring performance of appropriate SQL queries executed against the TPC-H database on both systems. Details are left out for lack of space.

3 Conclusions and Further Research

We have argued that Decision-Support SQL queries tend to contain redundancy between query and subquery, and that this redundancy is not detected and optimized by relational processors. We have introduced a new optimization mechanism to deal with this redundancy, the for-loop operator, and an implementation for it, the for-loop program. We developed a transformation process that takes us from SQL queries to for-loop programs. A comparative analysis with standard relational optimization was shown. The for-loop approach promises a more efficient implementation for queries falling in the patterns given. For simplicity and lack of space, the approach is introduced here applied to a very restricted class of queries. However, we have already worked out extensions to widen its scope (mainly, the approach can work with overlapping (not just identical) FROM clauses in query and subquery, and with different classes of linking conditions). We are currently developing a precise cost model, in order to compare the approach with traditional query optimization using different degrees of overlap, different linking conditions, and different data distributions as parameters. We are also working on extending the approach to several levels of nesting, and studying its applicability to OQL.

References

1. Badia, A. and Niehues, M. Optimization of Sequences of Relational Queries in Decision-Support Environments, in Proceedings of DAWAK'99, LNCS n. 1676, Springer-Verlag.
2. Cao, Bin and Badia, A. Subquery Rewriting for Optimization of SQL Queries, submitted for publication.
3. Dayal, U. Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers, in Proceedings of the VLDB Conference.
4. Fegaras, L. and Maier, D. Optimizing Queries Using an Effective Calculus, ACM TODS, vol. 25, n. 4.
5. Freytag, J. and Goodman, N. On the Translation of Relational Queries into Iterative Programs, ACM Transactions on Database Systems, vol. 14, no. 1, March.
6. Ganski, R. and Wong, H. Optimization of Nested SQL Queries Revisited, in Proceedings of the ACM SIGMOD Conference.
7. Goel, P. and Iyer, B. SQL Query Optimization: Reordering for a General Class of Queries, in Proceedings of the 1996 ACM SIGMOD Conference.
8. Kim, W. On Optimizing an SQL-Like Nested Query, ACM Transactions on Database Systems, vol. 7, n. 3, September.
9. Lieuwen, D. and DeWitt, D. A Transformation-Based Approach to Optimizing Loops in Database Programming Languages, in Proceedings of the ACM SIGMOD Conference.
10. Lu, H., Chan, H. C. and Wei, K. K. A Survey on Usage of SQL, SIGMOD Record.
11. Muralikrishna, M. Improving Unnesting Algorithms for Aggregate Queries in SQL, in Proceedings of the VLDB Conference.
12. Park, J. and Segev, A. Using Common Subexpressions to Optimize Multiple Queries, in Proceedings of the 1988 IEEE CS ICDE.
13. Ross, K. and Rao, J. Reusing Invariants: A New Strategy for Correlated Queries, in Proceedings of the ACM SIGMOD Conference.
14. Rao, J., Lindsay, B., Lohman, G., Pirahesh, H. and Simmen, D. Using EELs, a Practical Approach to Outerjoin and Antijoin Reordering, in Proceedings of ICDE.
15. Seshadri, P., Pirahesh, H. and Leung, T. Y. C. Complex Query Decorrelation, in Proceedings of ICDE.
16. Seshadri, P., Hellerstein, J. M., Pirahesh, H., Leung, T. Y. C., Ramakrishnan, R., Srivastava, D., Stuckey, P. J. and Sudarshan, S. Cost-Based Optimization for Magic: Algebra and Implementation, in Proceedings of the SIGMOD Conference.
17. Mumick, I. S. and Pirahesh, H. Implementation of Magic-sets in a Relational Database System, in Proceedings of the SIGMOD Conference.
18. TPC-H Benchmark, TPC Council.


More information

CS122 Lecture 4 Winter Term,

CS122 Lecture 4 Winter Term, CS122 Lecture 4 Winter Term, 2014-2015 2 SQL Query Transla.on Last time, introduced query evaluation pipeline SQL query SQL parser abstract syntax tree SQL translator relational algebra plan query plan

More information

XWeB: the XML Warehouse Benchmark

XWeB: the XML Warehouse Benchmark XWeB: the XML Warehouse Benchmark CEMAGREF Clermont-Ferrand -- Université de Lyon (ERIC Lyon 2) hadj.mahboubi@cemagref.fr -- jerome.darmont@univ-lyon2.fr September 17, 2010 XWeB: CEMAGREF the XML Warehouse

More information

Exploiting Predicate-window Semantics over Data Streams

Exploiting Predicate-window Semantics over Data Streams Exploiting Predicate-window Semantics over Data Streams Thanaa M. Ghanem Walid G. Aref Ahmed K. Elmagarmid Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398 {ghanemtm,aref,ake}@cs.purdue.edu

More information

Whitepaper. Big Data implementation: Role of Memory and SSD in Microsoft SQL Server Environment

Whitepaper. Big Data implementation: Role of Memory and SSD in Microsoft SQL Server Environment Whitepaper Big Data implementation: Role of Memory and SSD in Microsoft SQL Server Environment Scenario Analysis of Decision Support System with Microsoft Windows Server 2012 OS & SQL Server 2012 and Samsung

More information

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

CS 582 Database Management Systems II

CS 582 Database Management Systems II Review of SQL Basics SQL overview Several parts Data-definition language (DDL): insert, delete, modify schemas Data-manipulation language (DML): insert, delete, modify tuples Integrity View definition

More information

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan Plan for today Query Processing/Optimization CPS 216 Advanced Database Systems Overview of query processing Query execution Query plan enumeration Query rewrite heuristics Query rewrite in DB2 2 A query

More information

Introduction to Database Systems CSE 444

Introduction to Database Systems CSE 444 Introduction to Database Systems CSE 444 Lecture 18: Query Processing Overview CSE 444 - Summer 2010 1 Where We Are We are learning how a DBMS executes a query How come a DBMS can execute a query so fast?

More information

Parallelism Strategies In The DB2 Optimizer

Parallelism Strategies In The DB2 Optimizer Session: A05 Parallelism Strategies In The DB2 Optimizer Calisto Zuzarte IBM Toronto Lab May 20, 2008 09:15 a.m. 10:15 a.m. Platform: DB2 on Linux, Unix and Windows The Database Partitioned Feature (DPF)

More information

Query Optimization Overview

Query Optimization Overview Query Optimization Overview parsing, syntax checking semantic checking check existence of referenced relations and attributes disambiguation of overloaded operators check user authorization query rewrites

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 7 - Query execution References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

Optimization of Nested Queries using the NF 2 Algebra

Optimization of Nested Queries using the NF 2 Algebra c J. Hölsch, M. Grossniklaus, and M. H. Scholl, 216. This is the author s version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in Proc.

More information

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15 Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

Exploring Power-Performance Tradeoffs in Database Systems

Exploring Power-Performance Tradeoffs in Database Systems Exploring Power-Performance Tradeoffs in Database Systems Zichen Xu, 1 Yi-Cheng Tu, 1 and Xiaorui Wang 2 1 Department of Computer Science & Engineering, University of South Florida 4202 E. Fowler Ave.,

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

CSE 544, Winter 2009, Final Examination 11 March 2009

CSE 544, Winter 2009, Final Examination 11 March 2009 CSE 544, Winter 2009, Final Examination 11 March 2009 Rules: Open books and open notes. No laptops or other mobile devices. Calculators allowed. Please write clearly. Relax! You are here to learn. Question

More information

Parser: SQL parse tree

Parser: SQL parse tree Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient

More information

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision TPC BENCHMARK TM H (Decision Support) Standard Specification Revision 2.17.3 Transaction Processing Performance Council (TPC) Presidio of San Francisco Building 572B Ruger St. (surface) P.O. Box 29920

More information

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan SQL CS 564- Fall 2015 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan MOTIVATION The most widely used database language Used to query and manipulate data SQL stands for Structured Query Language many SQL standards:

More information

Chapter 14: Query Optimization

Chapter 14: Query Optimization Chapter 14: Query Optimization Database System Concepts 5 th Ed. See www.db-book.com for conditions on re-use Chapter 14: Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage

More information

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision 2.8.0

TPC BENCHMARK TM H (Decision Support) Standard Specification Revision 2.8.0 TPC BENCHMARK TM H (Decision Support) Standard Specification Revision 2.8.0 Transaction Processing Performance Council (TPC) Presidio of San Francisco Building 572B Ruger St. (surface) P.O. Box 29920 (mail)

More information

Data Manipulation (DML) and Data Definition (DDL)

Data Manipulation (DML) and Data Definition (DDL) Data Manipulation (DML) and Data Definition (DDL) 114 SQL-DML Inserting Tuples INSERT INTO REGION VALUES (6,'Antarctica','') INSERT INTO NATION (N_NATIONKEY, N_NAME, N_REGIONKEY) SELECT NATIONKEY, NAME,

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML) Since in the result relation each group is represented by exactly one tuple, in the select clause only aggregate functions can appear, or attributes that are used for grouping, i.e., that are also used

More information

Relational Model: History

Relational Model: History Relational Model: History Objectives of Relational Model: 1. Promote high degree of data independence 2. Eliminate redundancy, consistency, etc. problems 3. Enable proliferation of non-procedural DML s

More information

SQL - Data Query language

SQL - Data Query language SQL - Data Query language Eduardo J Ruiz October 20, 2009 1 Basic Structure The simple structure for a SQL query is the following: select a1...an from t1... tr where C Where t 1... t r is a list of relations

More information

Chapter 11: Query Optimization

Chapter 11: Query Optimization Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER Akhil Kumar and Michael Stonebraker EECS Department University of California Berkeley, Ca., 94720 Abstract A heuristic query optimizer must choose

More information

Optimizing Queries with Aggregate Views. Abstract. Complex queries, with aggregates, views and nested subqueries

Optimizing Queries with Aggregate Views. Abstract. Complex queries, with aggregates, views and nested subqueries Optimizing Queries with Aggregate Views Surajit Chaudhuri 1 and Kyuseok Shim 2 1 Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA 2 IBM Almaden Research Center, 650 Harry Road,

More information

Tuning Relational Systems I

Tuning Relational Systems I Tuning Relational Systems I Schema design Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting Using indexes appropriately,

More information

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide

More information

Relational Algebra. Procedural language Six basic operators

Relational Algebra. Procedural language Six basic operators Relational algebra Relational Algebra Procedural language Six basic operators select: σ project: union: set difference: Cartesian product: x rename: ρ The operators take one or two relations as inputs

More information

Design and Implementation of Bit-Vector filtering for executing of multi-join qureies

Design and Implementation of Bit-Vector filtering for executing of multi-join qureies Undergraduate Research Opportunity Program (UROP) Project Report Design and Implementation of Bit-Vector filtering for executing of multi-join qureies By Cheng Bin Department of Computer Science School

More information

DTD-Directed Publishing with Attribute Translation Grammars

DTD-Directed Publishing with Attribute Translation Grammars DTD-Directed Publishing with Attribute Translation Grammars Michael Benedikt Chee Yong Chan Wenfei Fan Rajeev Rastogi Bell Laboratories, Lucent Technologies benedikt,cychan,wenfei,rastogi @research.bell-labs.com

More information

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management. Lecture #9 (Query Processing Overview) Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm

More information

CSE 344 MAY 7 TH EXAM REVIEW

CSE 344 MAY 7 TH EXAM REVIEW CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice

More information

Schema Tuning. Tuning Schemas : Overview

Schema Tuning. Tuning Schemas : Overview Administração e Optimização de Bases de Dados 2012/2013 Schema Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Tuning Schemas : Overview Trade-offs among normalization / denormalization Overview When

More information

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model Agenda CSE 444: Database Internals Review Relational Model Lecture 2 Review of the Relational Model Review Queries (will skip most slides) Relational Algebra SQL Review translation SQL à RA Needed for

More information

Optimizing relational queries in connection hypergraphs: nested queries, views, and binding propagations

Optimizing relational queries in connection hypergraphs: nested queries, views, and binding propagations The VLDB Journal (1998) 7: 1 11 The VLDB Journal c Springer-Verlag 1998 Optimizing relational queries in connection hypergraphs: nested queries, views, and binding propagations Jia Liang Han Bell Labs,

More information

Query Optimization in Distributed Databases. Dilşat ABDULLAH

Query Optimization in Distributed Databases. Dilşat ABDULLAH Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of

More information

New Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time!

New Requirements. Advanced Query Processing. Top-N/Bottom-N queries Interactive queries. Skyline queries, Fast initial response time! Lecture 13 Advanced Query Processing CS5208 Advanced QP 1 New Requirements Top-N/Bottom-N queries Interactive queries Decision making queries Tolerant of errors approximate answers acceptable Control over

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 6 Lifecycle of a Query Plan 1 Announcements HW1 is due Thursday Projects proposals are due on Wednesday Office hour canceled

More information

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

ATYPICAL RELATIONAL QUERY OPTIMIZER

ATYPICAL RELATIONAL QUERY OPTIMIZER 14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

More information

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Chapter 13: Query Optimization. Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical

More information

2. Make an input file for Query Execution Steps for each Q1 and RQ respectively-- one step per line for simplicity.

2. Make an input file for Query Execution Steps for each Q1 and RQ respectively-- one step per line for simplicity. General Suggestion/Guide on Program (This is only for suggestion. You can change your own design as needed and you can assume your own for simplicity as long as it is reasonable to make it as assumption.)

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

DBMS Query evaluation

DBMS Query evaluation Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/

More information

Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso).

Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso). Orri Erling (Program Manager, OpenLink Virtuoso), Ivan Mikhailov (Lead Developer, OpenLink Virtuoso). Business Intelligence Extensions for SPARQL Orri Erling and Ivan Mikhailov OpenLink Software, 10 Burlington

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Optimizing Queries Using Materialized Views

Optimizing Queries Using Materialized Views Optimizing Queries Using Materialized Views Paul Larson & Jonathan Goldstein Microsoft Research 3/22/2001 Paul Larson, View matching 1 Materialized views Precomputed, stored result defined by a view expression

More information

Technical Report - Distributed Database Victor FERNANDES - Université de Strasbourg /2000 TECHNICAL REPORT

Technical Report - Distributed Database Victor FERNANDES - Université de Strasbourg /2000 TECHNICAL REPORT TECHNICAL REPORT Distributed Databases And Implementation of the TPC-H Benchmark Victor FERNANDES DESS Informatique Promotion : 1999 / 2000 Page 1 / 29 TABLE OF CONTENTS ABSTRACT... 3 INTRODUCTION... 3

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

SQL QUERIES. CS121: Relational Databases Fall 2017 Lecture 5

SQL QUERIES. CS121: Relational Databases Fall 2017 Lecture 5 SQL QUERIES CS121: Relational Databases Fall 2017 Lecture 5 SQL Queries 2 SQL queries use the SELECT statement General form is: SELECT A 1, A 2,... FROM r 1, r 2,... WHERE P; r i are the relations (tables)

More information

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Vidya Bodhe P.G. Student /Department of CE KKWIEER Nasik, University of Pune, India vidya.jambhulkar@gmail.com Abstract

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection

More information

CIS 330: Applied Database Systems

CIS 330: Applied Database Systems 1 CIS 330: Applied Database Systems Lecture 7: SQL Johannes Gehrke johannes@cs.cornell.edu http://www.cs.cornell.edu/johannes Logistics Office hours role call: Mondays, 3-4pm Tuesdays, 4:30-5:30 Wednesdays,

More information

Avoiding Sorting and Grouping In Processing Queries

Avoiding Sorting and Grouping In Processing Queries Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Principles of Data Management. Lecture #12 (Query Optimization I)

Principles of Data Management. Lecture #12 (Query Optimization I) Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v B+ tree

More information

Chapter 4: SQL. Basic Structure

Chapter 4: SQL. Basic Structure Chapter 4: SQL Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Joined Relations Data Definition Language Embedded SQL

More information

Perm Integrating Data Provenance Support in Database Systems

Perm Integrating Data Provenance Support in Database Systems Perm Integrating Data Provenance Support in Database Systems Boris Glavic Database Technology Group Department of Informatics University of Zurich glavic@ifi.uzh.ch Gustavo Alonso Systems Group Department

More information

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag. Physical Design D B M G 1 Phases of database design Application requirements Conceptual design Conceptual schema Logical design ER or UML Relational tables Logical schema Physical design Physical schema

More information

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY [Agrawal, 2(4): April, 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Horizontal Aggregation Approach for Preparation of Data Sets in Data Mining Mayur

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

An Alternative Storage Scheme for the DBNotes Annotation Management System for Relational Databases

An Alternative Storage Scheme for the DBNotes Annotation Management System for Relational Databases ENST Paris - Promotion 2006 Bogdan ALEXE Rapport de stage d ingenieur An Alternative Storage Scheme for the DBNotes Annotation Management System for Relational Databases Non Confidentiel Directeur de stage:

More information

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy CompSci 516 Data Intensive Computing Systems Lecture 3 SQL - 2 Instructor: Sudeepa Roy Announcements HW1 reminder: Due on 09/21 (Thurs), 11:55 pm, no late days Project proposal reminder: Due on 09/20 (Wed),

More information

Optimization of Queries in Distributed Database Management System

Optimization of Queries in Distributed Database Management System Optimization of Queries in Distributed Database Management System Bhagvant Institute of Technology, Muzaffarnagar Abstract The query optimizer is widely considered to be the most important component of

More information

Violating Independence

Violating Independence by David McGoveran (Originally published in the Data Independent, Premier Issue, Jan. 1995: Updated Sept. 2014) Introduction A key aspect of the relational model is the separation of implementation details

More information

Relational Query Optimization. Highlights of System R Optimizer

Relational Query Optimization. Highlights of System R Optimizer Relational Query Optimization Chapter 15 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Highlights of System R Optimizer v Impact: Most widely used currently; works well for < 10 joins.

More information

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) Technology & Information Management Instructor: Michael Kremer, Ph.D. Class 6 Professional Program: Data Administration and Management MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) AGENDA

More information

CSE 344 JANUARY 26 TH DATALOG

CSE 344 JANUARY 26 TH DATALOG CSE 344 JANUARY 26 TH DATALOG ADMINISTRATIVE MINUTIAE HW3 and OQ3 out HW3 due next Friday OQ3 due next Wednesday HW4 out next week: on Datalog Midterm reminder: Feb 9 th RELATIONAL ALGEBRA Set-at-a-time

More information

Column-Stores vs. Row-Stores: How Different Are They Really?

Column-Stores vs. Row-Stores: How Different Are They Really? Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel Madden and Nabil Hachem SIGMOD 2008 Presented by: Souvik Pal Subhro Bhattacharyya Department of Computer Science Indian

More information

Chapter 6: Formal Relational Query Languages

Chapter 6: Formal Relational Query Languages Chapter 6: Formal Relational Query Languages Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 6: Formal Relational Query Languages Relational Algebra Tuple Relational

More information