Computing SQL Queries with Boolean Aggregates


Antonio Badia
Computer Engineering and Computer Science Department
University of Louisville

Abstract. We introduce a new method for optimization of SQL queries with nested subqueries. The method is based on the idea of Boolean aggregates: aggregates that compute the conjunction or disjunction of a set of conditions. When combined with grouping, Boolean aggregates allow us to compute all types of non-aggregated subqueries in a uniform manner. The resulting query trees are simple and amenable to further optimization. Our approach can be combined with other optimization techniques and can be implemented with a minimum of changes in any cost-based optimizer.

1 Introduction

Due to the importance of query optimization, there exists a large body of research on the subject, especially for the case of nested subqueries ([10, 5, 13, 7, 8, 17]). It is now generally assumed that existing approaches can deal with all types of SQL subqueries through unnesting. However, practical implementation lags behind the theory, since some transformations are quite complex to implement. In particular, subqueries where the linking condition (the condition connecting query and subquery) is one of NOT IN, NOT EXISTS or a comparison with ALL seem to present problems to current optimizers. These cases are assumed to be translated into other forms, or are dealt with using antijoins. However, the usual translation does not work in the presence of nulls, and even when fixed it adds some overhead to the original query. Antijoins, on the other hand, introduce yet another operator that cannot be moved freely in the query tree, making the job of the optimizer more difficult. When a query has several levels of nesting, the complexity grows rapidly (an example is given below).

In this paper we introduce a variant of traditional unnesting methods that deals with all types of linking conditions in a simple, uniform manner.
The query tree created is simple, and the approach extends neatly to several levels of nesting and to several subqueries at the same level. The approach is based on the concept of Boolean aggregates, which extend the idea of aggregate functions in SQL ([12]). Intuitively, Boolean aggregates are applied to a set of predicates and combine the truth values that result from evaluating those predicates. We show how two simple Boolean aggregates can take care of any type of SQL subquery in a uniform manner. The resulting query trees are simple and amenable to further optimization. Our approach can be combined with other optimization techniques and can be implemented with a minimum of changes in any cost-based optimizer.

In section 2 we describe related research on query optimization in more detail and motivate our approach with an example. In section 3 we introduce the concept of Boolean aggregates and show its use in query unnesting. We then apply our approach to the example and discuss the differences with standard unnesting. Finally, in section 4 we offer some preliminary conclusions and discuss further research.

Footnote: This research was sponsored by NSF under grant IIS

2 Related Research and Motivation

We study SQL queries that contain correlated subqueries (footnote 1). Such subqueries contain a correlated predicate, a condition in their WHERE clause that introduces the correlation. The attribute in the correlated predicate provided by a relation in an outer block is called the correlation attribute; the other attribute is called the correlated attribute. The condition connecting query and subquery is called the linking condition. There are basically four types of linking condition in SQL: comparisons between an attribute and an aggregation (called the linking aggregate); IN and NOT IN comparisons; EXISTS and NOT EXISTS comparisons; and quantified comparisons between an attribute and a set of attributes through the use of SOME and ALL. We call linking conditions involving an aggregate, IN, EXISTS, and comparisons with SOME positive linking conditions, and the rest (those involving NOT IN, NOT EXISTS, and comparisons with ALL) negative linking conditions.

All nested correlated subqueries are nowadays executed by some variation of unnesting. In the original approach ([10]), the correlation predicate is seen as a join; if the subquery is aggregated, the aggregate is computed in advance and then the join is carried out.
Kim's approach had a number of shortcomings; among them, it assumed that the correlation predicate always used equality and that the linking condition was a positive one. Dayal's ([5]) and Muralikrishna's ([13]) work addressed these shortcomings. Dayal introduced the idea of using an outerjoin instead of a join (so that values with no match would not be lost), and proceeds with the aggregate computation after the outerjoin. Muralikrishna generalizes the approach and points out that negative linking conditions can be dealt with using antijoins or by translating them to other, positive linking conditions.

These approaches also introduce some shortcomings. First, outerjoins and antijoins do not commute with regular joins or selections; therefore, a query tree with all these operators does not offer many degrees of freedom to the optimizer. The work of [6] and [16] has studied conditions under which outerjoins and antijoins can be moved, partially alleviating this problem. Another problem with this approach is that, by carrying out the (outer)join corresponding to the correlation predicate first, other predicates in the WHERE clause of the main query, which may restrict the total computation to be carried out, are postponed. The magic sets approach ([17, 18, 20]) pushes these predicates down past the (outer)join by identifying the minimal set of values that the correlating attributes can take (the magic set), and computing it in advance. This minimizes the size of other computations, but comes at the cost of building the magic set in advance.

Footnote 1: The approach is applicable to non-correlated subqueries as well, but does not provide any substantial gains in that case.

However, all approaches in the literature assume positive linking conditions (and all examples shown in [5, 13, 19, 20, 18] involve positive linking conditions). Negative linking conditions are not given much attention; it is considered that queries can be rewritten to avoid them, or that they can be dealt with directly using antijoins. But both approaches are problematic. About the former, we point out that the standard translation does not work if nulls are present. Assume, for instance, the condition attr > ALL Q, where Q is a subquery, with attr2 the linked attribute. It is usually assumed that a (left) antijoin with condition attr ≤ attr2 is a correct translation of this condition, since for a tuple t to be in the antijoin, it cannot be the case that t.attr ≤ attr2 for any value of attr2 (or any value in a given group, if the subquery is correlated). Unfortunately, this equivalence holds only for 2-valued logic, not for the 3-valued logic that SQL uses to evaluate predicates when null is present. The condition attr ≤ attr2 will fail if attr is not null and no value of attr2 is greater than or equal to attr, which may happen either because every attr2 is genuinely smaller than attr or because attr2 is null. Hence, a tuple t will be in the antijoin in the latter case as well, and t will qualify for the result, even though attr > ALL Q evaluates to unknown and SQL would reject t.
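The discrepancy can be reproduced with a small sketch. It is only an illustration under stated assumptions: SQL null is modelled as Python `None`, unknown is also modelled as `None`, and the helper names (`le3`, `gt3`, `and3`, `gt_all`, `survives_antijoin`) are hypothetical, not taken from any system.

```python
def le3(a, b):
    """SQL-style a <= b: unknown (None) if either operand is null (None)."""
    return None if a is None or b is None else a <= b


def gt3(a, b):
    """SQL-style a > b: unknown (None) if either operand is null (None)."""
    return None if a is None or b is None else a > b


def and3(x, y):
    """Three-valued conjunction: false dominates, then unknown, then true."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def gt_all(attr, subquery_values):
    """Truth value of `attr > ALL (subquery_values)` under 3-valued logic."""
    result = True
    for attr2 in subquery_values:
        result = and3(result, gt3(attr, attr2))
    return result


def survives_antijoin(attr, subquery_values):
    """A tuple survives the antijoin iff no attr2 makes attr <= attr2 true."""
    return not any(le3(attr, attr2) is True for attr2 in subquery_values)


# One subquery value is null: the two formulations disagree.
print(gt_all(10, [5, None]))             # None (unknown): a WHERE clause rejects the tuple
print(survives_antijoin(10, [5, None]))  # True: the antijoin keeps the tuple
```

Without nulls the two functions agree, which is why the translation looks correct under 2-valued logic.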
Even though one could argue that this can be solved by changing the condition in the antijoin (and indeed, a correct rewrite is possible, but more complex than usually considered ([1])), a larger problem with this approach is that it produces plans with outerjoins and antijoins, which are very difficult to move around in the query tree. Even though recent research has shown that outerjoins ([6]) and antijoins ([16]) can be moved under limited circumstances, this still constrains the alternatives that can be generated for a given query plan, and it is up to the optimizer to check that the necessary conditions are met. Hence, proliferation of these operations makes the task of the query optimizer difficult.

As an example of the problems of the traditional approach, assume tables R(A,B,C,D), S(E,F,G,H,I), U(J,K,L), and consider the query

    Select *
    From R
    Where R.A > 10 and R.B NOT IN
        (Select S.E
         From S
         Where S.F = 5 and R.D = S.G and S.H > ALL
             (Select U.J
              From U
              Where U.K = R.C and U.L != S.I))

Unnesting this query with the traditional approach has the problem of introducing several outerjoins and antijoins that cannot be moved, as well as extra operations. To see why, note that we must outerjoin U with S and R, and then group by the keys of R and S, to determine which tuples of U must be tested for the ALL linking condition. However, should the set of tuples of U in a group fail the test, we cannot throw the whole group away: that would mean that some tuples in S fail to qualify for an answer, making the NOT IN linking condition true and hence qualifying the R tuple. Thus, tuples in S and U should be antijoined separately to determine which tuples in S pass or fail the ALL test. Then the result should be separately antijoined with R to determine which tuples in R pass or fail the NOT IN test. In other words, the selection on the condition relating S.H and U.J is no longer a local one, but a global one, as it depends on the next linking condition up the query tree (note that if the linking condition were IN instead of NOT IN, tuples could be discarded). The result is shown in figure 1, with LOJ denoting a left outer join and AJ denoting an antijoin (note that the tree is actually a graph!).

[Figure 1: query plan. Operators, from root to leaves: Project(R.*); Select(A>10 & F=5); AJ(B = E); AJ(H =< J); Project(R.*,S.*); Project(S.*,T.*); LOJ(K = C and L != I); T; LOJ(D = G); R; S.]
Fig. 1. Standard unnesting approach applied to the example

Even though Muralikrishna ([13]) proposes to extract (left) antijoins from (left) outerjoins, we note that in general such reuse may not be possible: here, the outerjoin is introduced to deal with the correlation and the antijoin to deal with the linking, and therefore they have distinct, independent conditions attached to them (and such approaches transform the query tree into a query graph, making it harder for the optimizer to consider alternatives). Also, magic sets would be able to improve on the above plan by pushing selections down to the relations; however, this does not improve the overall situation, since outerjoins and antijoins are still present.
Clearly, what is called for is an approach that deals uniformly with all types of linking conditions without introducing undue complexity.

3 Boolean Aggregates

We seek a uniform method that will work for all linking conditions. In order to achieve this, we define Boolean aggregates AND and OR, which take as input a comparison and a set of values (or tuples), and return a Boolean (true or false) as output. Let attr be an attribute, θ a comparison operator and S a set of values. Then

\[ \mathrm{AND}(S, \mathit{attr}, \theta) = \bigwedge_{\mathit{attr2} \in S} (\mathit{attr} \;\theta\; \mathit{attr2}) \]

We define AND(∅, attr, θ) to be true for any attr, θ. Also,

\[ \mathrm{OR}(S, \mathit{attr}, \theta) = \bigvee_{\mathit{attr2} \in S} (\mathit{attr} \;\theta\; \mathit{attr2}) \]

We define OR(∅, attr, θ) to be false for any attr, θ. It is important to point out that each individual comparison is subject to the semantics of SQL's WHERE clause; in particular, comparisons with null values return unknown. The usual behavior of unknown with respect to conjunction and disjunction is followed ([12]).

Note also that the set S will be implicit in normal use. When the Boolean aggregates are used alone, S will be the input relation to the aggregate; when used in conjunction with a GROUP-BY operator, each group will provide the input set. Thus, we will write GB_{A, AND(B, θ)}(R), where A is a subset of the attributes of the schema of R, B is an attribute from the schema of R, and θ is a comparison operator; and similarly for OR. The intended meaning is that, as with other aggregates, AND is applied to each group created by the grouping.

We use Boolean aggregates to compute any linking condition that does not use a (regular) aggregate, as follows: after a join or outerjoin connecting query and subquery is introduced by the unnesting, a group by is executed. The grouping attributes are any key of the relation from the outer block; the Boolean aggregate used depends on the linking condition. For attr θ SOME Q, where Q is a correlated subquery, the aggregate used is OR(attr, θ). For attr IN Q, the linking condition is treated as attr = SOME Q. For EXISTS Q, the aggregate used is OR(1, 1, =) (footnote 2). For attr θ ALL Q, where Q is a correlated subquery, the aggregate used is AND(attr, θ). For attr NOT IN Q, the linking condition is treated as attr ≠ ALL Q. Finally, for NOT EXISTS Q, the aggregate used is AND(1, 1, ≠).
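The definitions above can be sketched in a few lines. This is a minimal illustration, not an implementation from any DBMS: null and unknown are both modelled as `None`, the set S is passed explicitly as a list, and the helper names (`cmp3`, `is_in`, `not_in`, `exists`, `not_exists`) are assumptions made for the example. The loops return early when the result is already decided, which does not change the value computed.

```python
import operator


def cmp3(op, a, b):
    """Apply comparison `op` with SQL semantics: unknown (None) on null."""
    return None if a is None or b is None else op(a, b)


def AND(values, attr, op):
    """Conjunction of (attr op v) over all v; true on the empty set."""
    acc = True
    for v in values:
        outcome = cmp3(op, attr, v)
        if outcome is False:
            return False          # false dominates a conjunction
        if outcome is None:
            acc = None            # unknown, unless a later false overrides it
    return acc


def OR(values, attr, op):
    """Disjunction of (attr op v) over all v; false on the empty set."""
    acc = False
    for v in values:
        outcome = cmp3(op, attr, v)
        if outcome is True:
            return True           # true dominates a disjunction
        if outcome is None:
            acc = None
    return acc


# Linking conditions expressed through the two aggregates.
def is_in(attr, q):
    return OR(q, attr, operator.eq)          # attr = SOME Q


def not_in(attr, q):
    return AND(q, attr, operator.ne)         # attr != ALL Q


def exists(q):
    return OR([1 for _ in q], 1, operator.eq)    # OR(1, 1, =)


def not_exists(q):
    return AND([1 for _ in q], 1, operator.ne)   # AND(1, 1, !=)


print(not_in(3, [1, 2]))     # True
print(not_in(3, [1, None]))  # None (unknown)
print(not_in(1, [1, None]))  # False
print(exists([]))            # False
```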
After the grouping and aggregation, the Boolean aggregates leave a truth value in each group of the grouped relation. A selection must then be used to pick up those tuples where the Boolean is set to true. However, this approach has the same problem as the standard one: we cannot discard a tuple simply because the Boolean test corresponding to the linking condition has failed. Instead, our approach implements a new operator, called mark, which takes as input a condition, an attribute or list of attributes, and a constant. For those tuples where the condition holds, the constant is put in the attribute or attributes denoted, overwriting the old value. Formally, let R be a relation, ϕ a condition, X ⊆ sch(R) (note that X may be a singleton, in which case we simply use the attribute name), and c a constant. Then

\[ \mathrm{mark}_{\varphi, X, c}(R) = \{\, t \mid \exists t' \in R : (\varphi(t') \wedge t[\mathrm{sch}(R) - X] = t'[\mathrm{sch}(R) - X] \wedge t[X] = c) \vee (\neg\varphi(t') \wedge t = t') \,\} \]

Footnote 2: Note that technically this formulation is not correct, since we are using a constant instead of attr, but the meaning is clear.

In our case, a constant called an emptymarker will be used (see below); the condition will always be Bool = false, i.e. those cases where the final result of the Boolean aggregate is false. This mark will affect the way the next Boolean aggregate is computed (see below), leading to the correct result at the end. Note that most of this work can be optimized in implementation, an issue that we discuss in the next subsection.

Clearly, implementing a Boolean aggregate is very similar to implementing a regular aggregate. The usual way to compute the traditional SQL aggregates (min, max, sum, count, avg) is to use an accumulator variable in which to store temporary results, and to update it as more values come. For min and max, for instance, any new value is compared to the value in the accumulator, and replaces it if it is smaller (larger). Sum and count initialize the accumulator to 0, and increase the accumulator with each new value (using the value, for sum; using 1, for count). Likewise, a Boolean accumulator is used for Boolean aggregates. For ALL, the accumulator is started as true; for SOME, as false. As new values arrive, a comparison is carried out, and the result is ANDed (for AND) or ORed (for OR) with the accumulator.

There is, however, a problem with this straightforward approach. When an outerjoin is used to deal with the correlation, tuples in the outer block that have no match appear in the result exactly once, padded with nulls on the attributes of the inner block. Thus, when a group by is done, these tuples become their own group. Hence, tuples with no match actually have one (null) match in the outer join. The Boolean aggregate will then iterate over this single tuple and, finding a null value in it, will deposit a value of unknown in the accumulator. But when a tuple has no matches, the ALL test should be considered successful.
The problem is that the outer join marks the absence of a match with a null; while this null is meant to signify "no value occurs", SQL is incapable of distinguishing this interpretation from others, like "value unknown" (for which the 3-valued semantics makes sense). Note also that the value of attr2 may genuinely be a null, if such a null existed in the original data. Thus, what is needed is a way to distinguish values that have been added as padding by the outer join from nulls present in the original data. We stipulate that outer joins will pad tuples without a match not with nulls, but with a different marker, called an emptymarker, which is different from any possible value and from the null marker itself. Then a program like the following can be used to implement the AND aggregate:

    acc = true;
    while (not empty(S)) {
        t = first(S);
        if (t.attr2 != emptymark)
            acc = acc AND (attr comp t.attr2);
        S = rest(S);
    }
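The pseudocode above can be rendered as a runnable sketch. The sentinel object `EMPTYMARK`, the use of `None` for both null and unknown, and the function names are assumptions made for illustration only.

```python
import operator

# Hypothetical sentinel for the emptymarker: distinct from every data value
# and from None, which stands for the SQL null here.
EMPTYMARK = object()


def and3(x, y):
    """Three-valued conjunction with None as unknown."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def and_aggregate(group, attr, comp):
    """AND aggregate over one group, skipping outer-join padding."""
    acc = True
    for attr2 in group:
        if attr2 is EMPTYMARK:
            continue                      # padding: not a real match
        if attr is None or attr2 is None:
            outcome = None                # genuine null: comparison is unknown
        else:
            outcome = comp(attr, attr2)
        acc = and3(acc, outcome)
    return acc


# A tuple with no match forms a group holding only the emptymarker.
print(and_aggregate([EMPTYMARK], 5, operator.gt))  # True
# A genuine null in the data still yields unknown.
print(and_aggregate([None], 5, operator.gt))       # None
```

Padding with a null instead of the emptymarker would make the first call behave like the second, which is exactly the problem described in the text.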

Note that this program implements the semantics given for the operator, since a single tuple with the emptymarker represents the empty set in the relational framework (footnote 3). Note how the use of the mark operator allows us to mark certain tuples as not having passed the test of the linking condition; hence, their values are not used in subsequent Boolean aggregates. However, the tuples are still there, since they can still qualify.

3.1 Query Unnesting

We unnest using an approach that we call quasi-magic. First, at every query level the WHERE clause, with the exception of any linking condition(s), is transformed into a query tree. This allows us to push selections before any unnesting, as in the magic approach, but we do not compute the magic set, just the complementary set ([17, 18, 20]). This way, we avoid the overhead associated with the magic method. Then, correlated queries are treated as in Dayal's approach, by adding a join (or outerjoin, if necessary), followed by a group by on key attributes of the outer relation. At this point, we apply Boolean aggregates according to the linking condition, as outlined above.

In our previous example, a tree (call it T1) will be formed to deal with the outer block: σ_{A>10}(R). A second tree (call it T2) is formed for the nested query block at the first level: σ_{F=5}(S). Finally, a third tree (call it T3) is formed for the innermost block: U (note that this is a trivial tree because, at every level, we are excluding linking conditions, and there is nothing but linking conditions in the WHERE clause of the innermost block of our example). Using these trees as building blocks, a tree for the whole query is built as follows:

1. First, construct a graph where each tree formed so far is a node and there is a directed link from node Ti to node Tj if there is a correlation in the Tj block with the value of the correlation coming from a relation in the Ti block; the link is annotated with the correlation predicate.
Then, we start our tree by left outerjoining any two nodes that have a link between them (the left input corresponding to the block in the outer query), using the condition in the annotation of the link, starting with graph sources (because of SQL semantics, these will correspond to outermost blocks that are not correlated) and finishing with sinks (because of SQL semantics, these will correspond to innermost blocks that are correlated). Thus, we outerjoin from the outside in. An exception is made for links between Ti and Tj if there is also a path in the graph between Ti and Tj of length greater than 1. In the example above, our graph will have three nodes, T1, T2 and T3, with links from T1 to T2, T1 to T3 and T2 to T3. We will create a left outerjoin between T2 and T3 first, and then another left outerjoin of T1 with the previous result. In a situation like this, the link from T1 to T3 becomes just another condition when we outerjoin T1 to the result of the previous outerjoin.

2. On top of the tree obtained in the previous step, we add GROUP BY nodes, with the grouping attributes corresponding to keys of relations in the left argument of the left outerjoins. On each GROUP BY, the appropriate (Boolean) aggregate is used, followed by a MARK looking for tuples with false (for Boolean aggregates), putting an emptymarker on the attributes to be considered for the next Boolean aggregate (note that the last one puts an emptymarker on whatever appears in the result, so that it is ignored). Note that these nodes are applied from the inside out, i.e. the first (bottom) one corresponds to the innermost linking condition, and so on.

3. A projection, if needed, is placed on top of the tree.

The following optimization is applied automatically: every outerjoin is examined to see if it can be transformed into a join. This is not possible for negative linking conditions (NOT IN, NOT EXISTS, ALL), but it is possible for positive linking conditions and all aggregates except COUNT(*) (footnote 4).

[Figure 2: query plan. Operators, from root to leaves: PROJECT(R.*); SELECT(Bool=False,R.*,emptymarker); GB(Rkey,AND(R.B != S.E)); Mark(Bool=False,S.E,emptymarker); GB(Rkey,Skey,AND(S.H > T.J)); LOJ(K = C and L != I); T; LOJ(D = G); SELECT(A>10) over R; Select(F=5) over S.]
Fig. 2. Our approach applied to the example

Footnote 3: The change of padding in the outer join should be of no consequence to the rest of query processing. Right after the application of the Boolean aggregate, a selection will pick up only those tuples with a value of true in the accumulator. This includes tuples with the marker; however, no other operator up the query tree operates on the values with the marker. In the standard setting, they would contain nulls, and hence no useful operation could be carried out on these values.

Footnote 4: This rule coincides with some of Galindo-Legaria's rules ([6]), in that we know that in positive linking conditions and aggregates we are going to have selections that are null-intolerant and, therefore, the outerjoin is equivalent to a join.
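To make the plan concrete, the following sketch evaluates the example query twice: once by direct nested-loop evaluation of the SQL text, and once by following the operators of figure 2 (two left outerjoins padded with an emptymarker, then GROUP BY with AND, mark, and a second GROUP BY with AND). It is a simplified illustration under several assumptions that are not in the paper: the sample data is invented, it contains no nulls (so 2-valued logic suffices), whole tuples act as keys, and the function names are hypothetical.

```python
EMPTY = object()  # hypothetical emptymarker used as outer-join padding

R = [(20, 1, 7, 100), (30, 2, 7, 100), (40, 3, 8, 200), (5, 9, 7, 100)]  # (A, B, C, D)
S = [(1, 5, 100, 50, 0), (2, 5, 100, 10, 0), (3, 4, 200, 99, 0)]        # (E, F, G, H, I)
U = [(20, 7, 1), (60, 7, 0)]                                            # (J, K, L)


def naive(R, S, U):
    """Direct evaluation of the nested query, block by block."""
    out = []
    for r in R:
        a, b, c, d = r
        if not a > 10:
            continue
        sub = [e for (e, f, g, h, i) in S
               if f == 5 and g == d
               and all(h > j for (j, k, l) in U if k == c and l != i)]
        if b not in sub:
            out.append(r)
    return out


def boolean_aggregate_plan(R, S, U):
    t1 = [r for r in R if r[0] > 10]           # SELECT(A > 10)
    t2 = [s for s in S if s[1] == 5]           # SELECT(F = 5)
    rs = []                                    # LOJ(D = G)
    for r in t1:
        matches = [s for s in t2 if s[2] == r[3]]
        rs.extend((r, s) for s in (matches or [EMPTY]))
    rsu = []                                   # LOJ(K = C and L != I)
    for r, s in rs:
        matches = [] if s is EMPTY else [
            u for u in U if u[1] == r[2] and u[2] != s[4]]
        rsu.extend((r, s, u) for u in (matches or [EMPTY]))
    passed = {}                                # GB(Rkey, Skey, AND(S.H > U.J))
    for r, s, u in rsu:
        acc = passed.get((r, s), True)
        if u is not EMPTY:
            acc = acc and s[3] > u[0]
        passed[(r, s)] = acc
    # Mark(Bool = false, S.E, emptymarker): failed groups lose their S.E value
    marked = [(r, EMPTY if (s is EMPTY or not acc) else s[0])
              for (r, s), acc in passed.items()]
    keep = {}                                  # GB(Rkey, AND(R.B != S.E))
    for r, e in marked:
        acc = keep.get(r, True)
        if e is not EMPTY:
            acc = acc and r[1] != e
        keep[r] = acc
    return [r for r, acc in keep.items() if acc]   # keep groups whose AND is true


print(boolean_aggregate_plan(R, S, U))  # [(30, 2, 7, 100), (40, 3, 8, 200)]
print(naive(R, S, U))                   # [(30, 2, 7, 100), (40, 3, 8, 200)]
```

In this data the S tuple with E = 2 fails the ALL test. Because mark replaces its S.E with the emptymarker rather than discarding the group, the R tuple with B = 2 is still produced; treating the failed tuple as a member of the subquery result would wrongly exclude it.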

After this process, the tree is passed on to the query optimizer to see if further optimization is possible. Note that inside each subtree Ti there may be some optimization work to do; note also that, since all operators in the tree are joins and outerjoins, the optimizer may be able to move some operators around. Also, some GROUP BY nodes may be pulled up or pushed down ([2, 3, 8, 9]). We show the final result applied to our example above in figure 2. Note that in our example the outerjoins cannot be transformed into joins; however, the group bys may be pushed down depending on the keys of the relations (which we did not specify). Also, even if groupings cannot be pushed down, note that the first one groups the temporary relation by the keys of R and S, while the second one groups by the keys of R alone. Clearly, this second grouping is trivial; the whole operation (grouping and aggregate) can be done in one scan of the input.

Compare this tree with the one that is achieved by standard unnesting (shown in figure 1), and it is clear that our approach is more uniform and simple, while using to its advantage the ideas behind standard unnesting. Again, magic sets could be applied to Dayal's approach, to push down the selections on R and S as we did. However, in this case additional steps would be needed (for the creation of the complementary and magic sets), and the need for outerjoins and antijoins does not disappear. In our approach, the complementary set is always produced by our decision to process operations at the same level first, collapsing each query block (with the exception of linking conditions) to one relation (this is the reason we call our approach a quasi-magic strategy). As more levels and more subqueries with more correlations are added, the simplicity and clarity of our approach become more evident.

3.2 Optimizations

Besides algebraic optimizations, there are some particular optimizations that can be applied to Boolean aggregates.
Obviously, AND evaluation can stop as soon as some predicate evaluates to false (with final result false), and OR evaluation can stop as soon as some predicate evaluates to true (with final result true). The subsequent marking based on Boolean values can be done on the fly: since we know that the selection condition is going to be looking for groups with a value of false, such groups can be marked right after the Boolean aggregate has been computed, in essence pipelining the marking into the GROUP-BY. Note also that by pipelining the marking, we eliminate the need for a Boolean attribute!

In our example, once both left outer joins have been carried out, the first GROUP-BY is executed by using either sorting or hashing on the keys of R and S. On each group, the Boolean aggregate AND is computed as tuples come. As soon as a comparison returns false, computation of the Boolean aggregate is stopped, and the group is set aside so that any further tuples belonging to the group are ignored; the output for that group is marked. Groups that do not fail the test are simply added to the output. Once this temporary result is created, it is read again and scanned looking only at values of the keys of R to create the groups; the second Boolean aggregate is computed as before. Also as before, as soon as a comparison returns false, the group is flagged for dismissal. Output is composed of groups that were not flagged when input was exhausted. Therefore, the cost of our plan, considering only operations above the second left outer join, is that of grouping the temporary relation by the keys of R and S, writing the output to disk and reading this output into memory again. In traditional unnesting, the cost after the second left outer join is that of executing two antijoins, which is on the order of executing two joins.

4 Conclusion and Further Work

We have proposed an approach to unnesting SQL subqueries which builds on top of existing approaches. Therefore, our proposal is very easy to implement in existing query optimization and query execution engines, as it requires very little in the way of new operations, cost calculations, or implementation in the back-end. The approach allows us to treat all SQL subqueries in a uniform and simplified manner, and meshes well with existing approaches, letting the optimizer move operators around and apply advanced optimization techniques (like outerjoin reduction and push down/pull up of GROUP BY nodes). Further, because it extends easily to several levels, it simplifies the resulting query trees. Optimizers are becoming quite sophisticated and complex; a simple and uniform treatment of all queries is certainly worth examining.

We have argued that our approach yields better performance than traditional approaches when negative linking conditions are present. We plan to analyze the performance of our approach by implementing Boolean aggregates on a DBMS and/or developing a detailed cost model, to offer further support for the conclusions reached in this paper.

References

1. Cao, Bin and Badia, A. Subquery Rewriting for Optimization of SQL Queries, submitted for publication.
2. Chaudhuri, S. and Shim, K. Including Group-By in Query Optimization, in Proceedings of the 2th VLDB Conference.
3. Chaudhuri, S. and Shim, K. An Overview of Cost-Based Optimization of Queries with Aggregates, Data Engineering Bulletin, 18(3).
4. Cohen, S., Nutt, W. and Serebrenik, A. Algorithms for Rewriting Aggregate Queries using Views, Proceedings of the Design and Management of Data Warehouses Conference.
5. Dayal, U. Of Nests and Trees: A Unified Approach to Processing Queries That Contain Nested Subqueries, Aggregates, and Quantifiers, in Proceedings of the VLDB Conference.
6. Galindo-Legaria, C. and Rosenthal, A. Outerjoin Simplification and Reordering for Query Optimization, ACM TODS, vol. 22, n. 1.
7. Ganski, R. and Wong, H. Optimization of Nested SQL Queries Revisited, in Proceedings of the ACM SIGMOD Conference.
8. Goel, P. and Iyer, B. SQL Query Optimization: Reordering for a General Class of Queries, in Proceedings of the 1996 ACM SIGMOD Conference.
9. Gupta, A., Harinarayan, V. and Quass, D. Aggregate-Query Processing in Data Warehousing Environments, in Proceedings of the VLDB Conference.
10. Kim, W. On Optimizing an SQL-Like Nested Query, ACM Transactions On Database Systems, vol. 7, n. 3, September.
11. Materialized Views: Techniques, Implementations and Applications, A. Gupta and I. S. Mumick, eds., MIT Press.
12. Melton, J. Advanced SQL: 1999, Understanding Object-Relational and Other Advanced Features, Morgan Kaufmann.
13. Muralikrishna, M. Improving Unnesting Algorithms for Join Aggregate Queries in SQL, in Proceedings of the VLDB Conference.
14. Ross, K. and Rao, J. Reusing Invariants: A New Strategy for Correlated Queries, in Proceedings of the ACM SIGMOD Conference.
15. Ross, K. and Chatziantoniou, D. Groupwise Processing of Relational Queries, in Proceedings of the 23rd VLDB Conference.
16. Jun Rao, Bruce Lindsay, Guy Lohman, Hamid Pirahesh and David Simmen. Using EELs, a Practical Approach to Outerjoin and Antijoin Reordering, in Proceedings of ICDE.
17. Praveen Seshadri, Hamid Pirahesh and T. Y. Cliff Leung. Complex Query Decorrelation, in Proceedings of ICDE 1996.
18. Praveen Seshadri, Joseph M. Hellerstein, Hamid Pirahesh, T. Y. Cliff Leung, Raghu Ramakrishnan, Divesh Srivastava, Peter J. Stuckey and S. Sudarshan. Cost-Based Optimization for Magic: Algebra and Implementation, in Proceedings of the SIGMOD Conference, 1996.
19. Inderpal Singh Mumick and Hamid Pirahesh. Implementation of Magic-sets in a Relational Database System, in Proceedings of the SIGMOD Conference 1994.
20. Inderpal Singh Mumick, Sheldon J. Finkelstein, Hamid Pirahesh and Raghu Ramakrishnan. Magic is Relevant, in Proceedings of the SIGMOD Conference, 1990.

Fighting Redundancy in SQL

Fighting Redundancy in SQL Fighting Redundancy in SQL Antonio Badia and Dev Anand Computer Engineering and Computer Science department University of Louisville, Louisville KY 40292 Abstract. Many SQL queries with aggregated subqueries

More information

An Overview of Cost-based Optimization of Queries with Aggregates

An Overview of Cost-based Optimization of Queries with Aggregates An Overview of Cost-based Optimization of Queries with Aggregates Surajit Chaudhuri Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94304 chaudhuri@hpl.hp.com Kyuseok Shim IBM Almaden Research

More information

Optimized Query Plan Algorithm for the Nested Query

Optimized Query Plan Algorithm for the Nested Query Optimized Query Plan Algorithm for the Nested Query Chittaranjan Pradhan School of Computer Engineering, KIIT University, Bhubaneswar, India Sushree Sangita Jena School of Computer Engineering, KIIT University,

More information

A Nested Relational Approach to Processing SQL Subqueries

A Nested Relational Approach to Processing SQL Subqueries A Nested Relational Approach to Processing SQL Subqueries Bin Cao bin.cao@louisville.edu Antonio Badia abadia@louisville.edu Computer Engineering and Computer Science Department University of Louisville

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

XQuery Optimization Based on Rewriting

XQuery Optimization Based on Rewriting XQuery Optimization Based on Rewriting Maxim Grinev Moscow State University Vorob evy Gory, Moscow 119992, Russia maxim@grinev.net Abstract This paper briefly describes major results of the author s dissertation

More information

Fighting Redundancy in SQL: the For-Loop Approach

Fighting Redundancy in SQL: the For-Loop Approach Fighting Redundancy in SQL: the For-Loop Approach Antonio Badia and Dev Anand Computer Engineering and Computer Science department University of Louisville, Louisville KY 40292 July 8, 2004 1 Introduction

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Query Processing. Debapriyo Majumdar Indian Statistical Institute Kolkata DBMS PGDBA 2016

Query Processing Debapriyo Majumdar Indian Statistical Institute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6th Ed. By Silberschatz,

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

Chapter 12: Query Processing

Database System Concepts, 6th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

Horizontal Aggregations for Mining Relational Databases

Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering, Tadepalligudem, Andhrapradesh,

Database System Concepts

Chapter 13: Query Processing Departamento de Engenharia Informática Instituto Superior Técnico 1st Semester 2008/2009 Slides (heavily) based on the official slides of the book © Silberschatz, Korth

Advanced Database Systems

Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

Chapter 13: Query Processing

Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

Relational Databases

Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

Parser: SQL parse tree

Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient

A relational algebra expression may have many equivalent. Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions 1. Parsing and

Chapter 13: Query Processing Basic Steps in Query Processing

Overview Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions 1. Parsing and

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing Database System Concepts, 6th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

Chapter 14: Query Optimization

Database System Concepts 5th Ed. See www.db-book.com for conditions on re-use Chapter 14: Query Optimization Introduction Transformation of Relational Expressions Catalog

yqgm_std_rules documentation (Version 1)

Feng Shao Warren Wong Tony Novak Computer Science Department Cornell University Copyright (C) 2003-2005 Cornell University. All Rights Reserved. 1. Introduction

Relational Query Optimization. Highlights of System R Optimizer

Relational Query Optimization Chapter 15 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Highlights of System R Optimizer Impact: Most widely used currently; works well for < 10 joins.

Detecting Logical Errors in SQL Queries

Stefan Brass Christian Goldberg Martin-Luther-Universität Halle-Wittenberg, Institut für Informatik, Von-Seckendorff-Platz 1, D-06099 Halle (Saale), Germany (brass

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Chapter 13: Query Optimization Database System Concepts, 6th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical

CS122 Lecture 4 Winter Term,

CS122 Lecture 4 Winter Term, 2014-2015 2 SQL Query Translation Last time, introduced query evaluation pipeline SQL query SQL parser abstract syntax tree SQL translator relational algebra plan query plan

Optimization of Nested Queries in a Complex Object Model

Based on the papers: From Nested loops to Join Queries in OODB and Optimisation of Nested Queries in a Complex Object Model by Department of Computer

CSIT5300: Advanced Database Systems

L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science

Database System Concepts

Chapter 14: Optimization Departamento de Engenharia Informática Instituto Superior Técnico 1st Semester 2007/2008 Slides (heavily) based on the official slides of the book © Silberschatz, Korth and Sudarshan.

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

Principles of Data Management. Lecture #12 (Query Optimization I)

Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today's Notable News B+ tree

Optimizing Queries with Aggregate Views. Abstract. Complex queries, with aggregates, views and nested subqueries

Optimizing Queries with Aggregate Views Surajit Chaudhuri 1 and Kyuseok Shim 2 1 Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94304, USA 2 IBM Almaden Research Center, 650 Harry Road,

Relational Algebra. Procedural language Six basic operators

Relational algebra Relational Algebra Procedural language Six basic operators select: σ project: π union: ∪ set difference: − Cartesian product: × rename: ρ The operators take one or two relations as inputs

Evaluation of relational operations

Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3rd ed., 2007. Slides: From Cow Book

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

Introduction Alternative ways of evaluating a given query using

Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction

Chapter 11: Query Optimization

Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

Chapter 19 Query Optimization

It is an activity conducted by the query optimizer to select the best available strategy for executing the query. 1. Query Trees and Heuristics for Query Optimization - Apply

Relational Query Optimization

Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

Chapter 14 Query Optimization

Chapter 14: Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming

Chapter 14 Query Optimization

Chapter 14: Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming

Chapter 14 Query Optimization

Chapter 14: Query Optimization Chapter 14 Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming

. The problem: Dynamic Data Warehouse Design DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered

Dynamic Data Warehouse Design? Dimitri Theodoratos Timos Sellis Department of Electrical and Computer Engineering Computer Science Division National Technical University of Athens Zographou 57 73, Athens, Greece

Plan for today. Query Processing/Optimization. Parsing. A query's trip through the DBMS. Validation. Logical plan

Plan for today Query Processing/Optimization CPS 216 Advanced Database Systems Overview of query processing Query execution Query plan enumeration Query rewrite heuristics Query rewrite in DB2 2 A query

Query Optimization in Distributed Databases. Dilşat ABDULLAH

Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of

Evaluation of Ad Hoc OLAP : In-Place Computation

Damianos Chatziantoniou Department of Computer Science, Stevens Institute of Technology damianos@cs.stevens-tech.edu Abstract Large scale data analysis

Chapter 4: SQL. Basic Structure

Chapter 4: SQL Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Joined Relations Data Definition Language Embedded SQL

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

A TYPICAL RELATIONAL QUERY OPTIMIZER

14 A TYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you're busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

[Agrawal, 2(4): April, 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Horizontal Aggregation Approach for Preparation of Data Sets in Data Mining Mayur

Overview of Implementing Relational Operators and Query Evaluation

Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

Query processing and optimization

These slides are a modified version of the slides of the book Database System Concepts (Chapter 13 and 14), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan.

Optimization of Queries in Distributed Database Management System

Bhagvant Institute of Technology, Muzaffarnagar Abstract The query optimizer is widely considered to be the most important component of

CSE 190D Spring 2017 Final Exam Answers

Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 7 - Query optimization Announcements HW1 due tonight at 11:45pm HW2 will be due in two weeks You get to implement your own

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization References Access path selection in a relational database management system. Selinger. et.

CompSci 516 Data Intensive Computing Systems

Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework

SQL. Lecture 4 SQL. Basic Structure. The select Clause. The select Clause (Cont.) The select Clause (Cont.) Basic Structure.

SQL Lecture 4 SQL Chapter 4 (Sections 4.1, 4.2, 4.3, 4.4, 4.5, 4., 4.8, 4.9, 4.11) Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Modification of the Database

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

Query Processing QUERY PROCESSING refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :

4. SQL - the Relational Database Language Standard 4.3 Data Manipulation Language (DML)

Since in the result relation each group is represented by exactly one tuple, in the select clause only aggregate functions can appear, or attributes that are used for grouping, i.e., that are also used

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

Query Optimization Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue., Nov. 20) Homework #4 due next in 2½ weeks No class this Thu. (Thanksgiving break) No weekly progress update due

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

CSE 544 Principles of Database Management Systems

Alvin Cheung Fall 2015 Lecture 6 Lifecycle of a Query Plan 1 Announcements HW1 is due Thursday Projects proposals are due on Wednesday Office hour canceled

DBMS Query evaluation

Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Master's degree program in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Advanced Databases Lecture 4 - Query Optimization Masood Niazi Torshiz Islamic Azad university- Mashhad Branch www.mniazi.ir Query Optimization Introduction Transformation of Relational Expressions Catalog

2. Make an input file for Query Execution Steps for each Q1 and RQ respectively-- one step per line for simplicity.

General Suggestion/Guide on Program (This is only for suggestion. You can change your own design as needed and you can assume your own for simplicity as long as it is reasonable to make it as assumption.)

Lecture 3 SQL. Shuigeng Zhou. September 23, 2008 School of Computer Science Fudan University

Lecture 3 SQL Shuigeng Zhou September 23, 2008 School of Computer Science Fudan University Outline Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views

Relational Query Optimization

Chapter 15 Ramakrishnan & Gehrke (Sections 15.1-15.6) CPSC404, Laks V.S. Lakshmanan 1 What you will learn from this lecture Cost-based query optimization (System R) Plan space

Textbook: Chapter 6 CS425 Fall 2013 Boris Glavic Chapter 3: Formal Relational Query. Relational Algebra Select Operation Example Select Operation

Chapter 3: Formal Relational Query Languages CS425 Fall 2013 Boris Glavic Chapter 3: Formal Relational Query Languages Relational Algebra Tuple Relational Calculus Domain Relational Calculus Textbook:

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today's Notable News Midterm

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst March 8 and 13, 2007

Relational Query Optimization Yanlei Diao UMass Amherst March 8 and 13, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12

SQL QUERY EVALUATION CS121: Relational Databases Fall 2017 Lecture 12 Query Evaluation 2 Last time: Began looking at database implementation details How data is stored and accessed by the database Using

Implementation of Relational Operations: Other Operations

Module 4, Lecture 2 Database Management Systems, R. Ramakrishnan 1 Simple Selections SELECT * FROM Reserves R WHERE R.rname < C% Of the form σ

Why SQL? SQL is a very-high-level language. Database management system figures out best way to execute query

Basic SQL Queries 1 Why SQL? SQL is a very-high-level language Say what to do rather than how to do it Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java Database

CS122 Lecture 10 Winter Term,

CS122 Lecture 10 Winter Term, 2014-2015 2 Last Time: Plan Costing Last time, introduced ways of approximating plan costs Number of rows each plan node produces Amount of disk IO the plan must perform Database

More on SQL Nested Queries Aggregate operators and Nulls

Today's Lecture More on SQL Nested Queries Aggregate operators and Nulls Winter 2003 Recommended Readings Chapter 5 Section 5.4-5.6 http://philip.greenspun.com/sql/ Simple queries, more complex

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 7 - Query execution References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

Chapter 6: Formal Relational Query Languages

Database System Concepts, 6th Ed. See www.db-book.com for conditions on re-use Chapter 6: Formal Relational Query Languages Relational Algebra Tuple Relational

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representation of the data Data Retrieval How to ask questions of the database How to answer those questions

More SQL: Complex Queries, Triggers, Views, and Schema Modification

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries

Chapter 3. Algorithms for Query Processing and Optimization

Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms

Query Processing: an Overview. Query Processing in a Nutshell. .. CSC 468 DBMS Implementation Alexander Dekhtyar.. QUERY. Parser.

CSC 468 DBMS Implementation Alexander Dekhtyar Query Processing: an Overview Query Processing in a Nutshell QUERY Parser Preprocessor Logical Query plan generator Logical query plan Query rewriter

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

The Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination

The Extended Algebra Duplicate Elimination 2 δ = eliminate duplicates from bags. τ = sort tuples. γ = grouping and aggregation. Outerjoin : avoids dangling tuples = tuples that do not join with anything.

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide

Chapter 3: SQL. Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use

Chapter 3: SQL Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 3: SQL Data Definition Basic Query Structure Set Operations Aggregate Functions Null Values Nested

Uncertain Data Models

Christoph Koch EPFL Dan Olteanu University of Oxford SYNONYMS data models for incomplete information, probabilistic data models, representation systems DEFINITION An uncertain data

Algebraic XQuery Decorrelation with Order Sensitive Operations

Worcester Polytechnic Institute Digital WPI Computer Science Faculty Publications Department of Computer Science 2-2005 Algebraic XQuery Decorrelation with Order Sensitive Operations Song Wang Worcester

Chapter 3: SQL. Chapter 3: SQL

Chapter 3: SQL Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 3: SQL Data Definition Basic Query Structure Set Operations Aggregate Functions Null Values Nested

Designing Views to Answer Queries under Set, Bag, and BagSet Semantics

Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

Database Languages and their Compilers

Prof. Dr. Torsten Grust Database Systems Research Group U Tübingen Winter 2010 2010 T. Grust Database Languages and their Compilers 4 Query Normalization Finally,

Data Warehousing and Data Mining The Generalized Multi-dimensional Join

1. Definition of the GMD-join 2. Algorithm for the GMD-join 3. Evaluating subqueries in an OLAP context 4. Experimental evaluation

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

Fundamentals of Database Systems

Assignment: 4 September 21, 2015 Instructions 1. This question paper contains 10 questions in 5 pages. Q1: Calculate branching factor in case for B-tree index structure,