A Nested Relational Approach to Processing SQL Subqueries


Bin Cao, Antonio Badia
Computer Engineering and Computer Science Department
University of Louisville
Louisville, KY 40292

ABSTRACT

One of the most powerful features of SQL is the use of nested queries. Most research work on the optimization of nested queries focuses on aggregate subqueries. However, the solutions proposed for non-aggregate subqueries are still limited, especially for queries having multiple subqueries and null values. In this paper, we show that existing approaches to queries containing non-aggregate subqueries proposed in the literature (including rewrites) are not adequate. We then propose a new efficient approach, the nested relational approach, based on the nested relational algebra. Our approach directly unnests non-aggregate subqueries using hash joins, and treats all subqueries in a uniform manner, being able to deal with nested queries of any type and any level. We report on experimental work that confirms that existing approaches have difficulties dealing with non-aggregate subqueries, and that our approach offers better performance. We also discuss some possibilities for algebraic optimization and the issue of integrating our approach in a relational database system.

1. INTRODUCTION

SQL is the standard language for data retrieval and manipulation in relational database systems. One of the most powerful features of SQL is nested queries (queries having subqueries). Theoretically, a query can have an arbitrary number of subqueries nested within it. A subquery can be either aggregate or non-aggregate. An aggregate subquery has an aggregate function in its SELECT clause; it always returns a single value as the result. A non-aggregate subquery is linked to the outer query by one of the following operators: EXISTS, NOT EXISTS, IN, NOT IN, θ SOME/ANY, and θ ALL, where $\theta \in \{<, \le, >, \ge, =, \ne\}$; the result is either a set of values or empty.
Since it is usually inefficient to directly execute nested queries in their original form [1], query unnesting, i.e., rewriting nested queries into flat forms, has been proposed as a better solution [1, 8, 5, 13, 14, 18]. Unfortunately, most proposed approaches concentrate on aggregate subqueries; optimization of non-aggregate subqueries has some limitations. Some proposed approaches are derived from those for aggregate subqueries [8, 6, 1]. The only solutions proposed for non-aggregate subqueries are limited [5, 3, 2], especially for queries with multiple subqueries and null values. The common problems of these proposed approaches are twofold: first, queries cannot be unnested directly and transformations are required; second, each operator is evaluated in a different manner. In this paper, we focus on non-aggregate subqueries. We propose a new, efficient approach, the nested relational approach, for evaluating nested queries containing non-aggregate subqueries in a uniform manner. To directly unnest non-aggregate subqueries, we use the nested relational algebra instead of the standard relational algebra. The motivation for using the nested relational algebra is the observation that the subquery result is either a set of values or empty, which can be considered as a set-valued attribute in the nested relational model.

This research was sponsored by NSF under grant IIS

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD 2005, June 14-16, 2005, Baltimore, Maryland, USA. Copyright 2005 ACM.
Conceptually, our nested relational approach unnests a nested query top-down, and then uses our extended nested relational algebra to compute the predicates associated with the subquery bottom-up. We will show that our approach not only allows unnesting non-aggregate subqueries directly without transformation, but also allows each operator to be evaluated in a uniform manner. Furthermore, our approach does not require indexes; only hash joins are necessary. Finally, being algebraic, our approach has clear semantics and can be further optimized.

The rest of this paper is organized as follows: Section 2 summarizes related work and gives the motivation. Section 3 defines the nested relational model and our extended nested relational algebra. Section 4 describes the original algorithm for evaluating queries having non-aggregate subqueries and some special cases for optimization. Section 5 presents our experiments. Section 6 concludes the paper.

2. RELATED WORK AND MOTIVATION

Significant research efforts have been devoted to the optimization of nested queries since the 1980s. Kim [1] was motivated by the observation that executing correlated nested queries using the traditional nested iteration method can be

very inefficient. As a solution, Kim developed query transformation algorithms to rewrite nested queries into equivalent, flat queries which can be processed more efficiently. Several problems of these algorithms were later pointed out and solved in [8]. Dayal [5] refined and extended all of the previous optimization work to a unified approach for processing queries that contain nested subqueries, aggregates and quantifiers, which enables unnesting queries with more than one nesting level. Muralikrishna [13] extended Dayal's approach to enable processing queries that have an arbitrary number of blocks nested within any given block. Finally, work on magic decorrelation optimization within the logic programming community was brought to SQL optimization [18, 17]. Unfortunately, most proposed strategies focus on aggregate subqueries. Unnesting non-aggregate subqueries, especially those with certain operators, still poses problems.

Before presenting the problems, we first describe the terminology used in this paper. We introduce the term linking predicate to refer to the predicate that connects a subquery and an outer query. In a linking predicate, the attribute of the outer query is called the linking attribute, the attribute of the subquery is called the linked attribute, and the operator is called the linking operator. We call EXISTS, SOME/ANY and IN positive linking operators, and NOT EXISTS, ALL and NOT IN negative linking operators. If a query has both positive and negative linking operators, we say it has mixed linking operators. If a subquery contains a predicate which references the relation in the outer query, we say the subquery is correlated to the outer query, and the predicate is called the correlated predicate. The attribute of the outer query in a correlated predicate is called the correlating attribute, and the attribute of the subquery is called the correlated attribute. If a query has no nested subqueries, we call it a flat query.
If a query has nested subqueries, but they are all flat, we call it a one-level nested query; if a query has nested subqueries, but they are all one-level, we call it a two-level nested query, and so on. Since SQL is a block-structured language, the terms inner query block and outer query block are used interchangeably with subquery and outer query respectively in this paper.

As an example, assume relations R(A, B, C, D), S(E, F, G, H, I), T(J, K, L), and consider the following query:

Query Q:
    select R.B, R.C, R.D
    from R
    where R.A > 1
      and R.B not in
          (select S.E
           from S
           where S.F = 5
             and R.D = S.G
             and S.H > all
                 (select T.J
                  from T
                  where T.K = R.C
                    and T.L <> S.I))

Query Q is a two-level nested query. From the top down, the second query block is correlated to the first query block by the predicate R.D = S.G, and the third query block is correlated to the other two query blocks by the predicates T.K = R.C and T.L <> S.I. It has two negative linking operators, NOT IN and ALL.

Unnesting Query Q using existing techniques presents several problems. First, it cannot be unnested directly; instead, rewriting the predicates NOT IN and ALL is required. However, rewriting such predicates may not preserve semantics when null values are present. Because of null values, R.A > ALL (select S.B ...) is not equivalent to an antijoin of R and S on the condition R.A <= S.B. Furthermore, R.A > ALL (select S.B ...) is not equivalent to R.A > (select max(S.B) ...) or to 0 = (select count(S.B) ...) with the condition R.A <= S.B added into the subquery. Readers can convince themselves by assuming that R.A is 5 and S.B is {2, 3, 4, null}: the original predicate evaluates to unknown, so the tuple is rejected, while each of the rewrites accepts it. Second, even when rewriting is possible, the resulting query tree may have several outer joins and antijoins that cannot be moved (except under certain circumstances; see [7, 16]), as well as extra operations.
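The null-value counterexample above (R.A = 5, S.B = {2, 3, 4, null}) can be checked mechanically. The following is a minimal Python sketch, assuming SQL three-valued logic is modeled with `None` standing for both null and UNKNOWN; the helper names (`cmp3`, `and3`, `gt_all`, and so on) are illustrative and do not come from the paper.

```python
from operator import gt, le

def cmp3(op, a, b):
    """SQL-style comparison: None (UNKNOWN) when either operand is null."""
    return None if a is None or b is None else op(a, b)

def and3(values):
    """Three-valued conjunction: FALSE dominates, then UNKNOWN, then TRUE."""
    verdict = True
    for v in values:
        if v is False:
            return False
        if v is None:
            verdict = None
    return verdict

def gt_all(a, subquery):
    """a > ALL (subquery) under three-valued logic."""
    return and3(cmp3(gt, a, b) for b in subquery)

def survives_antijoin(a, subquery):
    """Antijoin on a <= b: the outer tuple survives if no b makes a <= b TRUE."""
    return not any(cmp3(le, a, b) is True for b in subquery)

def gt_max(a, subquery):
    """a > (select max(b) ...): max skips nulls and is null on empty input."""
    non_null = [b for b in subquery if b is not None]
    return cmp3(gt, a, max(non_null) if non_null else None)

def count_is_zero(a, subquery):
    """0 = (select count(b) ... where a <= b): only TRUE rows are counted."""
    return sum(1 for b in subquery if cmp3(le, a, b) is True) == 0

a, s_b = 5, [2, 3, 4, None]
original = gt_all(a, s_b)  # UNKNOWN, so a WHERE clause rejects the tuple
rewrites = (survives_antijoin(a, s_b), gt_max(a, s_b), count_is_zero(a, s_b))
print(original, rewrites)  # None (True, True, True)
```

The original predicate is UNKNOWN while all three rewrites accept the tuple, which is the semantic mismatch the text describes.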
Even though Muralikrishna [14] proposed to extract (left) antijoins from (left) outer joins, we note that in general such reuse may not be possible: here, the outer join is introduced to deal with the correlation, and the antijoin with the linking; therefore, they have distinct, independent conditions attached to them (and such approaches transform the query tree into a query graph, making it harder for the optimizer to consider alternatives). Also, magic decorrelation [18, 17] would be able to improve the above plan by pushing selections down to the relations; however, this approach does not improve the overall situation, with outer joins and antijoins still present. Third, (outer) joins are introduced to deal with correlations, which means that all correlated subqueries become one query block. However, when dealing with negative linking predicates, this creates a problem. To see why, note that in Query Q we have to outer join R with S and T to determine which tuples of T must be tested for the ALL linking predicate. However, if the set of tuples of T related to a tuple in R and S fails the test, we cannot throw the whole set away. The reason is that some tuples in S fail to qualify for an answer, making the NOT IN linking predicate true, and hence qualifying the R tuple. Thus, tuples in S and T should be antijoined separately to determine which tuples in S pass or fail the ALL test. Then the result should be separately antijoined with R to determine which tuples in R pass or fail the NOT IN test.

Different approaches are developed in [1, 6, 2, 3]. They all involve extending the standard relational algebra. In [1], a special operator called the multidimensional join (MD-join) is used to join two relations, group the result by one of them and compute aggregates on different partitions of the resulting join. Queries having non-aggregate subqueries are rewritten as counts.
However, this approach also suffers from the problem above, and it requires a double join of two relations (although if implemented with care, the approach is efficient). Finally, the MD-join only commutes with other joins and selections in a selective manner. Similar to [1], non-aggregate subqueries are rewritten as aggregate subqueries with counts in [6], then transformed queries are evaluated by a second-order APPLY operator. While such an operator is very powerful, it may not yield the best possible plan for each case. In [2], queries having non-aggregate subqueries are computed by using a Boolean aggregate, which applies a condition to a set of tuples by applying the condition to each tuple and computing the conjunction or disjunction of the resulting truth values. In [2], tuples that fail the test are not discarded but marked and kept for further processing. In [3], queries having non-aggregate subqueries are evaluated by transforming nested queries into flat queries first, and then flat queries are incrementally computed. Their transformation leads to a Cartesian product followed by difference operations, which is unlikely to be efficient.

In conclusion, existing approaches either have difficulties in dealing with mixed and negative linking operators, or call for special operations. What is needed is an approach which uniformly deals with all types of linking predicates without introducing undue complexity. We propose to use the nested relational algebra because it explicitly represents the intuition that for a given tuple, a non-aggregate subquery provides a set of values (perhaps empty). As a consequence, linking predicates become set predicates which can be represented in a straightforward manner.

3. DEFINITION OF EXTENDED NESTED RELATIONAL ALGEBRA

Several well-known, basically equivalent definitions of the nested relational algebra have been introduced [19, 15, 11]. For the purpose of the nested relational approach, the definitions need to be extended and slightly modified.

Definition 1. Let $U = \{A_1, \ldots, A_n\}$ be a finite set of attributes. A schema over $U$ and the depth of the schema are defined recursively as follows:
1. If $A_1, \ldots, A_n$ are atomic attributes from $U$, then $R = (A_1, \ldots, A_n)$ is a (flat) schema over $U$ with the name $R$. The depth of the schema $R$ is 0, denoted by $depth(R) = 0$.
2. If $A_1, \ldots, A_n$ are atomic attributes from $U$, and $R_1, \ldots, R_m$ are distinct names of schemas with sets of attributes (denoted by $attr(R_1), \ldots, attr(R_m)$) such that $\{A_1, \ldots, A_n\}$ and $\{attr(R_1), \ldots, attr(R_m)\}$ are pairwise disjoint, then $R = (A_1, \ldots, A_n, R_1, \ldots, R_m)$ is a (nested) schema with the name $R$. $R_1, \ldots, R_m$ are called subschemas. The depth of the schema $R$ is defined as $depth(R) = 1 + \max_{i=1}^{m} depth(R_i)$.

Definition 2. Let $R$ denote a schema over a finite set $U$ of attributes. The domain of $R$, denoted by $DOM(R)$, is defined recursively as follows:
1.
If $R = (A_1, \ldots, A_n)$, where $A_i$ ($1 \le i \le n$) are atomic attributes, then $DOM(R) = DOM(A_1) \times \cdots \times DOM(A_n)$, where $\times$ denotes Cartesian product.
2. If $R = (A_1, \ldots, A_n, R_1, \ldots, R_m)$, where $A_i$ ($1 \le i \le n$) are atomic attributes and $R_j$ ($1 \le j \le m$) are subschemas nested within $R$, then $DOM(R) = DOM(A_1) \times \cdots \times DOM(A_n) \times 2^{DOM(R_1)} \times \cdots \times 2^{DOM(R_m)}$, where $\times$ denotes Cartesian product and $2^{DOM(R_j)}$ denotes the power set of the set $DOM(R_j)$ ($1 \le j \le m$).

A nested tuple over $R$ is an element of $DOM(R)$. A nested relation $r$ over $R$ is a finite set of nested tuples over $R$, which is denoted by $sch(r) = R$.

The nested relational algebra has the standard operations of the relational algebra: selection ($\sigma$), projection ($\pi$), Cartesian product ($\times$), join ($\bowtie$), union ($\cup$), intersection ($\cap$), difference ($-$), plus the nest and unnest operators. Here we modify this algebra slightly to suit our purpose, redefining nest and modifying selection.

Definition 3. Let $R = (A_1, \ldots, A_n)$ be a flat relational schema, where $A_i$ ($1 \le i \le n$) are atomic attributes. Let $attr(R)$ denote the names of all attributes, that is, $attr(R) = \{A_1, \ldots, A_n\}$. Let $r$ be a flat relation over $R$, that is, $sch(r) = R$. Let $N_1$ and $N_2$ be two disjoint subsets of $attr(R)$. Then the nest of $r$ by $N_1$ keeping $N_2$, $\upsilon_{N_1,N_2}(r)$, is defined as:

$$\upsilon_{N_1,N_2}(r) := \{\, t' \mid \exists t \in r:\; t'[N_1] = t[N_1] \,\wedge\, t'[N_2] = \{\, t''[N_2] \mid t'' \in r \wedge t''[N_1] = t[N_1] \,\} \,\}$$

$N_1$ is called the set of nesting attributes, $N_2$ the set of nested attributes. Note that in the traditional definition, only $N_2$ is specified, and $N_1$ is understood as $attr(R) - N_2$. The definition presented here has an implicit projection on $N_1 \cup N_2$ and will be more convenient for our approach; it also highlights the connection between nesting and grouping. Note also that, for simplicity (and since this will be our most frequent use), we have defined nesting over flat ($depth(R) = 0$) relations only; however, the definition can be extended to the general case without problems. The unnest operator can be defined as usual to be the inverse of nest.

Definition 4.
Let $R(A_1, \ldots, A_n, R_1, \ldots, R_m)$ be a nested relational schema, where $A_i$ ($1 \le i \le n$) are atomic attributes and $R_j$ ($1 \le j \le m$) are subschemas. Let $r$ be a nested relation over $R$, that is, $sch(r) = R$. Let $attr(R_j)$ denote the names of attributes in $R_j$ ($1 \le j \le m$). Then a linking predicate over $r$ is defined as one of:
- $A\ \theta\ L\ \{B\}$, where $A$ is one of the $A_i$ ($1 \le i \le n$), $B \in attr(R_j)$ ($1 \le j \le m$), $\theta \in \{<, \le, >, \ge, =, \ne\}$, and $L \in \{\text{SOME/ANY}, \text{ALL}\}$.
- $\{B\}\ \theta\ \emptyset$, where $\theta \in \{=, \ne\}$ and $B$ is as above.

The semantics of each predicate are obvious. Note that (again for simplicity) we only define the linking predicate over one-level ($depth(R) = 1$) nested relations. For a multi-level ($depth(R) \ge 2$) nested relation, $A$ and $B$ might belong to subschemas with depth $d$ and $d + 1$ respectively. Thus, the above definition can still be used.

Definition 5. Let $r$ be a relation over schema $R$, that is, $sch(r) = R$. The selection of $r$ with respect to $C$, $\sigma_C(r)$, where $C$ is a usual predicate or a linking predicate, is defined as usual:

$$\sigma_C(r) := \{\, t \mid t \in r \wedge C(t) \text{ is true} \,\}$$

Let $attr(R)$ denote the names of all attributes in $R$, $A$ a subset of $attr(R)$, and $C$ a usual predicate or a linking predicate. The pseudo-selection of $r$ with respect to $C$ keeping $A$, written here as $\bar{\sigma}_{C,A}(r)$, is defined as:

$$\bar{\sigma}_{C,A}(r) := \{\, t' \mid \exists t \in r:\; (C(t) \text{ is true} \wedge t' = t) \,\vee\, (C(t) \text{ is false} \wedge t'[A] = \{null\} \wedge t'[attr(R) - A] = t[attr(R) - A]) \,\}$$

Thus, a pseudo-selection keeps all tuples that pass the condition (as the usual selection does); for the tuples that fail, it keeps the tuple, but it pads the attributes in $A$ with null values. In this paper, either $\sigma$ or $\bar{\sigma}$ is called a linking selection if $C$ is a linking predicate. The linking selection with $\sigma$ follows the usual definition; the linking selection with $\bar{\sigma}$ applies the pseudo-selection definition. As usual, the definitions of join, semijoin and outer join carry over from the regular (flat) algebra to the nested algebra. To help understand the above definitions, we give an example.

Example 1. Assume R(A, B, C, D), S(E, F, G, H, I), T(J, K, L) are the relations shown in Figures 1(a), 1(b), 1(c), where R.D, S.I and T.L are the primary keys of each relation. The relation Temp1 shown in Figure 1(d) is obtained by the projection of R.B, R.C, R.D, S.E, S.H, S.I, T.J and T.L on the result of a left outer join of R and S on the predicate R.D = S.G, followed by a left outer join with T on the predicates T.K = R.C and T.L <> S.I.

[Figure 1: Base Relations. (a) Relation R, with columns A, B, C, D(#); (b) Relation S, with columns E, F, G, H, I(#); (c) Relation T, with columns J, K, L(#); (d) $Temp1 = \pi_{R.B,R.C,R.D,S.E,S.H,S.I,T.J,T.L}((R ⟕_{R.D=S.G} S) ⟕_{T.K=R.C \wedge T.L<>S.I} T)$, with columns B, C, D(#), E, H, I(#), J, L(#).]

[Figure 2: Example of nest and linking selection. (a) $Temp2 = \upsilon_{\{R.B,R.C,R.D,S.E,S.H,S.I\},\{T.J,T.L\}}(Temp1)$; (b) $Temp3 = \bar{\sigma}_{S.H > ALL\{T.J\} \vee T.L \text{ is null},\ \{S.E,S.H,S.I\}}(Temp2)$; (c) $Temp4 = \sigma_{S.H > ALL\{T.J\} \vee T.L \text{ is null}}(Temp2)$.]

The relation Temp2 shown in Figure 2(a) is a one-level nested relation resulting from nesting by {R.B, R.C, R.D, S.E, S.H, S.I}, keeping all of {T.J, T.L}. The reason why we keep the primary keys of R, S and T is that they will be used to identify whether the corresponding tuple is empty. We assume that each relation has a unique non-null attribute serving as a primary key. In our case, a primary key with the null value must have been padded by a left outer join operation. If a tuple does not match the join condition, the left outer join operation will pad null values on its attributes, including the primary key. Thus, a tuple whose primary key is null can be considered empty.
Another reason we keep the primary keys of R, S and T is that we have to distinguish between an empty tuple with all attributes being null and a tuple in which a certain attribute was originally null. As a result, our extended relational algebra can be used on relations containing null values without any problem.

The relation Temp3 shown in Figure 2(b) is the projection of R.B, R.C, R.D, S.E, S.H and S.I on the result of the linking selection $\bar{\sigma}_{S.H > ALL\{T.J\} \vee T.L \text{ is null},\ \{S.E,S.H,S.I\}}(Temp2)$. Note that it is a pseudo-selection. A negative linking predicate returns true if the subquery result is empty, which is identified by the primary key being null. Thus, we add the condition T.L is null to the linking selection. Under our definition, even though the linking selection over the second tuple returns false, we cannot discard this tuple. We have to keep this tuple by padding null values on S.E, S.H and S.I. The linking selection over all other tuples returns true, thus we keep these tuples in their original forms. One notable point is that for the fourth and the fifth tuples, although the linking selection compares S.H (null) to {T.J} ({null}), the linking selection returns true because the condition T.L is null is true. From this example, we can see that linking selection only compares the linking attribute to the linked attributes whose corresponding primary key is not null. The result of the comparison is based on the standard definition.

The relation Temp4 shown in Figure 2(c) is obtained by the projection of R.B, R.C, R.D, S.E, S.H and S.I on the result of the linking selection $\sigma_{S.H > ALL\{T.J\} \vee T.L \text{ is null}}(Temp2)$. The linking selection over the second tuple returns false, thus we discard this tuple. All other tuples pass the linking selection and become the result. Note that the projection operation in each subfigure is omitted.
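The nest operator and the two forms of linking selection can be illustrated with a small Python sketch. It assumes flat relations are lists of dictionaries, a Python list stands in for the set-valued attribute (stored under the hypothetical key `"set"`), `None` models null, and a predicate that evaluates to unknown is treated like a failing one. The sample rows are made up for illustration and are not the data of Figures 1 and 2.

```python
from operator import gt

def nest(rel, nesting, nested):
    """Nest rel by `nesting`, keeping `nested` as a set-valued attribute."""
    groups = {}
    for t in rel:
        key = tuple(t[a] for a in nesting)
        groups.setdefault(key, []).append({a: t[a] for a in nested})
    return [{**dict(zip(nesting, key)), "set": members}
            for key, members in groups.items()]

def holds_all(t, attr, op, linked, key):
    """attr op ALL {linked}; members with a null key are outer-join padding."""
    verdict = True
    for m in t["set"]:
        if m[key] is None:
            continue                      # empty subquery result for this tuple
        if t[attr] is None or m[linked] is None:
            verdict = None                # unknown, unless a later member fails
        elif not op(t[attr], m[linked]):
            return False
    return verdict

def select(rel, pred):
    """Usual selection: keep only the tuples for which pred is true."""
    return [t for t in rel if pred(t) is True]

def pseudo_select(rel, pred, pad):
    """Pseudo-selection: keep failing tuples, padding `pad` with nulls."""
    return [t if pred(t) is True else {**t, **dict.fromkeys(pad)} for t in rel]

# Hypothetical outer-join result: D is the key of R, I of S, L of T.
temp1 = [
    {"D": 1, "H": 7, "I": 10, "J": 3, "L": 100},
    {"D": 1, "H": 7, "I": 10, "J": 5, "L": 101},
    {"D": 2, "H": 4, "I": 11, "J": 6, "L": 102},
    {"D": 3, "H": None, "I": None, "J": None, "L": None},
]
temp2 = nest(temp1, ["D", "H", "I"], ["J", "L"])
pred = lambda t: holds_all(t, "H", gt, "J", "L")
temp4 = select(temp2, pred)                    # drops the tuple with D = 2
temp3 = pseudo_select(temp2, pred, ["H", "I"]) # keeps it, with H and I nulled
```

The tuple with D = 3 passes in both cases because its only nested member has a null key, which mirrors the role of the extra condition T.L is null in the example.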

4. THE NESTED RELATIONAL APPROACH TO PROCESSING SUBQUERIES

The motivation of the nested relational approach is the observation that the linking predicate is actually a set computation. The basic idea of the nested relational approach is straightforward: a nested query is first unnested top-down, and then the linking predicates are computed bottom-up, which requires: (1) the subquery result to be a set (perhaps empty) and (2) a comparison between a single-valued attribute and a set-valued attribute. Such operations can be achieved by the nest operator and the linking selection operator defined in the previous section. In our approach, non-correlated subqueries are executed once, and the result is used by every tuple (virtual Cartesian product). Correlated subqueries can be executed and then connected to outer queries by join or outer join operations. We first present an original approach and then introduce some optimizations.

4.1 Original approach

For a nested query with $n$ query blocks, in each query block, from the top down, let $R_i$ ($1 \le i \le n$) denote the relations in the FROM clause; $L_i$ ($1 \le i \le n-1$) denote the linking predicate between blocks $i$ and $i+1$; $C_{ij}$ ($2 \le i \le n$ and $1 \le j \le n$) represent the correlated predicate(s) between blocks $i$ and $j$ ($i > j$); and $\Delta_i$ ($1 \le i \le n$) represent the predicates in the WHERE clause except $L_i$ and $C_{ij}$. Our algorithm proceeds in three steps. First, we reduce each query block to one relation by doing all operations in the WHERE clause except the linking predicate and the correlated predicate(s), i.e., at each block $i$, produce $T_i = \sigma_{\Delta_i}(R_i)$ (see footnote 1). Note that this is equivalent to producing the complementary set in the magic decorrelation technique [18, 17]; however, we do not produce a magic set. Second, we create a tree expression for the query as follows: walk through the query in depth-first, left-to-right order; create one node for each query block. We label each node with the corresponding $T_i$.
Between any two adjacent nodes $T_i$ and $T_{i+1}$, we add an edge directed from $T_i$ to $T_{i+1}$ labeled with the linking predicate $L_i$. If $T_{i+1}$ is correlated to $T_i$, we add the correlated predicate $C_{(i+1)i}$ to the edge. If $T_i$ is correlated to a non-adjacent node $T_j$ ($i > j$), we add the correlated predicate $C_{ij}$ to the edge between $T_i$ and $T_{i-1}$ if all edges between $T_j$ and $T_i$ have been labeled with correlated predicates; otherwise, we add an edge directed from $T_j$ to $T_i$ labeled with the correlated predicate $C_{ij}$. The root is labeled by the name of the outermost query block, leaves are labeled by the names of the innermost query blocks, and other nodes are labeled by the names of the middle query blocks. A node is called a subroot if it has more than one child. All nodes under a subroot are called a subtree of the subroot. For a given node $n$, let $name(n)$ be the $T_i$ that serves as the name of the node; let $link_C(n, m)$ be the $C_{ij}$ (if one exists) and $link_L(n, m)$ be the $L_i$ which label the link between $n$ and one of its children $m$.

Third, we compute(root, $T_1$). The algorithm, shown as Algorithm 1, recursively goes down the tree in a depth-first manner, creating a single relation through the use of join or outer join. Note that the structure created in the previous step may be a graph. In this step, we restrict our attention to edges labeled with correlated predicates, in which case we get a maximal spanning query tree for the graph (when all query blocks are correlated). When a leaf is reached, the algorithm goes bottom-up, nesting the relation obtained and applying a corresponding linking selection to reduce the relation. When a subroot is found on the way down, the algorithm chooses a child to continue towards the leaves; on the way up, however, the algorithm will go down again until all paths in the subtree of the subroot have been covered before proceeding up past the subroot.

Footnote 1: We assume all relations are connected, i.e., no Cartesian product is present.
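For a linear query such as Query Q, the three steps described above can be sketched end to end in Python. This is only an illustration under stated assumptions, not the authors' implementation: relations are lists of dictionaries with globally unique attribute names, `None` models null and unknown, joins are naive nested loops rather than hash joins, and the sample data is hypothetical. The function `reference` evaluates Query Q by plain nested iteration so that the two results can be compared.

```python
from operator import eq, gt, ne

R_ATTRS, S_ATTRS, T_ATTRS = list("ABCD"), list("EFGHI"), list("JKL")

def cmp3(op, a, b):
    """SQL-style comparison: None (unknown) if either operand is null."""
    return None if a is None or b is None else op(a, b)

def and3(values):
    """Three-valued conjunction; an empty input yields True."""
    verdict = True
    for v in values:
        if v is False:
            return False
        if v is None:
            verdict = None
    return verdict

def outer_join(left, right, cond, pad):
    """Left outer join; unmatched left tuples get nulls on the `pad` attributes."""
    out = []
    for lt in left:
        hits = [{**lt, **rt} for rt in right if cond({**lt, **rt}) is True]
        out.extend(hits or [{**lt, **dict.fromkeys(pad)}])
    return out

def nest(rel, by, keep):
    """Nest by `by`, collecting the `keep` attributes under the key "set"."""
    groups = {}
    for t in rel:
        groups.setdefault(tuple(t[a] for a in by), []).append({a: t[a] for a in keep})
    return [{**dict(zip(by, k)), "set": ms} for k, ms in groups.items()]

def link_all(t, attr, op, linked, key):
    """attr op ALL {linked}, ignoring members whose primary key is null."""
    return and3(cmp3(op, t[attr], m[linked]) for m in t["set"] if m[key] is not None)

def evaluate(R, S, T):
    # Step 1: local selections. Step 2 (tree R -> S -> T) is implicit here.
    t1 = [r for r in R if cmp3(gt, r["A"], 1) is True]
    t2 = [s for s in S if cmp3(eq, s["F"], 5) is True]
    # Step 3, going down: outer joins on the correlated predicates.
    rel = outer_join(t1, t2, lambda t: cmp3(eq, t["D"], t["G"]), S_ATTRS)
    rel = outer_join(rel, T, lambda t: and3([cmp3(eq, t["K"], t["C"]),
                                             cmp3(ne, t["L"], t["I"])]), T_ATTRS)
    # Going up: nest, then pseudo-selection for S.H > ALL {T.J}.
    rel = nest(rel, R_ATTRS + S_ATTRS, ["J", "L"])
    rel = [t if link_all(t, "H", gt, "J", "L") is True
           else {**t, **dict.fromkeys(S_ATTRS)} for t in rel]
    # Nest again, then the usual selection for R.B <> ALL {S.E} (NOT IN).
    rel = nest(rel, R_ATTRS, ["E", "I"])
    rel = [t for t in rel if link_all(t, "B", ne, "E", "I") is True]
    return [(t["B"], t["C"], t["D"]) for t in rel]

def reference(R, S, T):
    """Query Q by nested iteration, for comparison."""
    out = []
    for r in R:
        if cmp3(gt, r["A"], 1) is not True:
            continue
        inner = []
        for s in S:
            if cmp3(eq, s["F"], 5) is not True or cmp3(eq, r["D"], s["G"]) is not True:
                continue
            js = [t["J"] for t in T if cmp3(eq, t["K"], r["C"]) is True
                  and cmp3(ne, t["L"], s["I"]) is True]
            if and3(cmp3(gt, s["H"], j) for j in js) is True:
                inner.append(s["E"])
        if and3(cmp3(ne, r["B"], e) for e in inner) is True:
            out.append((r["B"], r["C"], r["D"]))
    return out

R = [dict(A=2, B=10, C=1, D=1), dict(A=3, B=20, C=2, D=2),
     dict(A=5, B=None, C=3, D=3), dict(A=0, B=30, C=1, D=4),
     dict(A=4, B=40, C=9, D=5)]
S = [dict(E=10, F=5, G=1, H=8, I=1), dict(E=20, F=5, G=2, H=2, I=2),
     dict(E=None, F=5, G=3, H=None, I=3), dict(E=40, F=4, G=5, H=9, I=4),
     dict(E=21, F=5, G=2, H=9, I=5)]
T = [dict(J=3, K=1, L=1), dict(J=5, K=1, L=2),
     dict(J=7, K=2, L=3), dict(J=None, K=3, L=4)]

result = evaluate(R, S, T)
```

On this sample data the sketch and the nested-iteration reference agree, including for the R tuple whose B value is null and for the S tuples that fail the ALL test and are padded rather than discarded.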
We do not provide a formal proof for the correctness of Algorithm 1 due to lack of space. Basically, we unnest a query in a traditional way, and then nest by each tuple of the outer query, which preserves tuple iteration semantics. Then, the linking selection operator computes linking predicates in a straightforward manner.

Algorithm 1 Compute(node, relational-expression)
Require: a nested query with non-aggregate subqueries
Ensure: the result of a query
    PROCEDURE compute(node, rel) {
        if (node is a leaf) then
            return;
        else
            for each n ∈ children(node) do
                T_i = name(n);
                C_ij = link_C(node, n);
                L_i = link_L(node, n);
                if (C_ij ≠ ∅) then
                    rel = rel ⋈_{C_ij} T_i or rel = rel ⟕_{C_ij} T_i;
                else
                    rel = rel × T_i;
                end if
                compute(n, rel);
                rel = υ_{{T_1.*, ...},{T_i.*}}(rel);
                rel = σ_{L_i}(rel) or σ̄_{L_i}(rel);
            end for
        end if
    }

The algorithm works equally for nested linear queries and nested tree queries (see footnote 2). In the first case, there is only one child for each node; the net effect is that of going down the tree joining or outer joining, or using the Cartesian product when there is no correlation (this Cartesian product is really virtual), and then going up nesting and evaluating the predicates. In the second case, each subroot makes us go down all paths before continuing on the way up. To show how the original nested relational approach processes a nested query, we give an example.

Footnote 2: A nested linear query is a query in which at most one query block is nested within any query block. A nested tree query is a query in which there is at least one query block which has two or more query blocks nested within it at the same level.

Example 2. Consider Query Q in Section 2. The tree expression for this query is shown in Figure 3(a). To process this query, we start from the root node T1: R, performing a left outer join of R and S on the correlated predicate R.D = S.G. Since T2: S is not a leaf, we continue by performing a left outer join with T on the correlated predicates T.K = R.C and T.L <> S.I. Node T3: T is a leaf node, thus we compute the linking predicate L2: S.H > ALL {T.J}, which

is achieved by nesting by {R.B, R.C, R.D, S.E, S.H, S.I}, keeping all of {T.J, T.L}, followed by the projection of R.B, R.C, R.D, S.E, S.H, S.I and the linking selection S.H > ALL {T.J}. Then, it goes back to node T2: S. Since there are no other children under node T2: S, we compute the linking predicate R.B ≠ ALL {S.E} (the NOT IN linking operator is equivalent to ≠ ALL) by nesting by {R.B, R.C, R.D}, keeping all of {S.E, S.I}, followed by the projection of R.B, R.C, R.D and the linking selection R.B ≠ ALL {S.E}, which goes back to the root T1: R. The final result is obtained by the projection of the desired attributes. Note that we use both the usual linking selection $\sigma$ and the pseudo-selection $\bar{\sigma}$ in this example. Generally, $\bar{\sigma}$ is used for computing negative or mixed linking predicates; $\sigma$ is used for computing the last unfinished linking predicate, or when all unfinished linking predicates are positive. We use a query tree to represent the process of a query evaluation, in which π denotes projection; ⟕ left outer join; $\sigma$ or $\bar{\sigma}$ (linking) selection; υ nest. The query tree for processing Query Q is shown in Figure 3(b) (intermediate projections are omitted).

[Figure 3: The Nested Relational Approach Applied to Query Q.
(a) Tree Expression: an edge from T1: R to T2: S labeled L1: R.B ≠ ALL {S.E} and C21: R.D = S.G; an edge from T2: S to T3: T labeled L2: S.H > ALL {T.J}, C32: T.L <> S.I and C31: T.K = R.C.
(b) Query Tree, from the root down: $\pi_{R.B,R.C,R.D}$; $\sigma_{R.B \ne ALL\{S.E\} \vee S.I \text{ is null}}$; $\upsilon_{\{R.B,R.C,R.D\},\{S.E,S.I\}}$; $\bar{\sigma}_{S.H > ALL\{T.J\} \vee T.L \text{ is null},\ \{S.E,S.H,S.I\}}$; $\upsilon_{\{R.B,R.C,R.D,S.E,S.H,S.I\},\{T.J,T.L\}}$; $(\sigma_{R.A>1}(R) ⟕_{R.D=S.G} \sigma_{S.F=5}(S)) ⟕_{T.K=R.C \wedge T.L<>S.I} T$.]

4.2 Optimizations

Algorithm 1 can evaluate nested queries containing non-aggregate subqueries with any type of linking predicates and any level of nesting in a uniform manner. However, there are several alternatives and optimizations possible. We briefly discuss some of the more interesting ones.

4.2.1 Reduce nesting operations

In the original approach, we compute each linking predicate by using one nesting operation followed by one linking selection.
However, examining the parameters of the nest operator, it is clear that higher levels nest by a prefix of the nesting attributes used by lower levels, and use part of the postfix of those nesting attributes as the nested attributes. For instance, see Figure 3(b). To compute the linking predicate S.H > ALL {T.J}, we nest by the nesting attributes {R.B, R.C, R.D, S.E, S.H, S.I}; next, to compute R.B ≠ ALL {S.E}, we nest by the prefix {R.B, R.C, R.D} of the previous nesting attributes, and choose part of the postfix of the previous nesting attributes, {S.E, S.I}, as the nested attributes. This feature gives rise to an optimization of the original approach: first doing all nesting operations in a single step, and then executing the linking selections one by one, instead of interleaving nesting and linking selection. This gives a feasible and efficient implementation because only the deepest (first) nesting involves true (physical) reordering of the tuples in the relation; all others are conceptual. For example, the nest and linking selection operations in Figure 3(b) can be rewritten as two consecutive nests followed by two linking selections. Even though two nest operators remain, the operations can be done in one step. Note that the result of two consecutive nestings is a two-level nested relation. As pointed out in Section 3, computing the linking predicate S.H > ALL {T.J} only involves S and T, and can still be considered a linking selection over a one-level nested relation resulting from the projection of S and T on the two-level nested relation. Similarly, computing the linking predicate R.B ≠ ALL {S.E} can be regarded as a linking selection over the projection of R and S on the two-level nested relation.

4.2.2 Pipelining

Pipelining is possible in the context of our algorithm.
In particular, it seems clear that it should be possible to pipeline the linking selection with the nesting that is immediately adjacent to it; in some cases, the condition may be evaluated at the same time that the nesting is taking place. Thus, the cost of such plans can be further reduced even if no modification to the plan takes place.

4.2.3 Linear correlation

Algorithm 1 could be further optimized for some special queries to gain better performance. One such case is linear correlation. A nested query is linearly correlated if each inner query block is only correlated to its adjacent outer query block. Since the evaluation of the outer query block only depends on its adjacent inner query block, linearly correlated queries can be processed bottom-up instead of top-down. For instance, Query Q becomes a linearly correlated query by removing the correlated predicate T.K = R.C in the innermost query block and changing T.L <> S.I to T.L = S.I. Instead of top-down, this query can be efficiently processed bottom-up by performing nesting on the result of a left outer join of S and T with corresponding selections pushed down, followed by computing the linking predicate S.H > ALL {T.J}; then nesting again on the result of a left outer join of R and the previous resulting tuples, followed by computing the linking predicate R.B ≠ ALL {S.E}. Note that pipelining can be applied for computing the linking predicate and nesting. Clearly, this strategy benefits from small intermediate results, since only qualified tuples participate in further (outer) join operations.

4.2.4 Push down nesting

Another idea is to push nesting operations down past the (outer) join. The original nested relational approach uses the standard approach to unnest the subquery, which may produce a very large intermediate relation for later processing. To avoid this problem, we can push the nesting operation down before the (outer) join. This is not always possible; the conditions under which it can be done are similar to the conditions for pushing down a group-by operator past a join [9]. In particular, one situation in which the push down is possible is when the nesting attribute is also the attribute in the condition of the join, and this condition is an equality. In symbols, if $R$ and $S$ are flat relations and $B, C \in sch(S)$, $A \in sch(R)$, then $\upsilon_{\{B\},\{C\}}(R \bowtie_{A=C} S) = R \bowtie (\upsilon_{\{B\},\{C\}} S)$. This is a common pattern in our approach. For example, consider Query Q with the third query block removed. It can be processed as follows: first, nest the relation S using $\upsilon_{\{S.G\},\{S.E,S.I\}}$ with the selection S.F = 5 pushed down. Note that these two steps can be pipelined. Then R is left outer joined with the resulting one-level nested relation on the predicate R.D = S.G, followed by computing the linking predicate R.B ≠ ALL {S.E}. The final result is obtained by the projection of the desired attributes.

4.2.5 Positive linking operators

Although the nested relational approach is focused on dealing efficiently with mixed and negative linking operators, we would like the approach to be efficient for positive linking operators as well. However, existing approaches have a very efficient way of evaluating positive linking operators. In the case of IN, for instance, the linking predicate is transformed into a semijoin.
However, our approach would create an outer join, a nest and a selection; that is, for A IN (SELECT B FROM S ...), it would generate an expression of the form σ_{A = SOME {B}}(υ_{{A},{B}}(R ⟕_C S)), where A is the nesting attribute with A ∈ sch(R), B is the nested attribute with B ∈ sch(S), and C is the correlated predicate. The trick in these cases is to realize that the expression above can be simplified to R ⋈_{C ∧ A=B} S. In a more general setting, the expression σ_{A θ SOME {B}}(υ_{{A},{B}}(R ⟕_C S)) can be shown to be equivalent to R ⋈_{C ∧ AθB} S. If, furthermore, projection push down shows that only attributes from R are needed, the join can be transformed into a semijoin. Thus, through algebraic rewriting, our approach can be shown to be equivalent to the standard one for positive cases. Positive linking operators are discussed further later in the paper.

5. EXPERIMENTS AND PERFORMANCE ANALYSIS

In this section, we compare the performance of the nested relational approach with the performance of a popular commercial database management system (DBMS), which we call System A and which evaluates nested queries with its native approach in its latest version. Our experiments focus on nested queries containing negative and mixed linking operators, which are not efficiently evaluated by direct unnesting using existing techniques.

5.1 Implementation

As described in the previous sections, our nested relational algebra is an extension of the standard relational algebra, so the nest operator and the linking selection operator are the only operators not supported by current DBMSs. To implement the nested relational approach, we wrote stored procedures in procedural SQL, an extension of SQL that adds programming language-like capabilities to SQL (variable declarations, loops and conditional statements). Our approach was to design the program in two stages: first, an SQL query is used to unnest the query by executing (left outer) joins of the base relations in each query block with the corresponding selections pushed down.
Second, code in the procedure implements the nest operator and the linking selection operator by processing the tuples fetched from the first stage, which we call the intermediate result. In order to simulate nesting effectively, we make the database sort the intermediate result. This is equivalent to implementing nest by sorting, which we believe is a realistic possibility (as with group-by, the two obvious options for implementing nest are sorting and hashing). We implemented two variants of the nested relational approach: the original nested relational approach implements the nest operator and the linking selection operator separately (which requires two passes over the intermediate result), and the optimized nested relational approach pipelines the nest operator and the linking selection operator (which requires only one pass over the intermediate result).

We use stored procedures to implement the nested relational approach for two reasons: (1) they run inside the database, so the communication overhead is reduced significantly compared to external processing; (2) they can be called by other applications, which makes the nested relational approach better suited for practical use. However, there is still communication overhead when the stored procedure fetches data from the SQL engine (as observed in [1], this is one considerable disadvantage that all experimental settings similar to ours must bear). For that reason, one of the main parameters we use in reporting our results is the size of the intermediate result.

In our experiments, we created a TPC-H database [4] at scale factor 1 (total data size 1GB) in System A, hosted on a server with an Intel Pentium 4 2.8GHz processor, two 36GB SCSI disks, and 1GB memory, running Red Hat Enterprise Linux WS release 3. We configured a buffer cache of size 32MB, and installed all data and indexes on a single disk. B+ tree indexes on the primary key of each base table were automatically built by System A.
Additional indexes on selected foreign keys were created manually when needed (more on this below).

5.2 Performance analysis

To verify the efficiency of the nested relational approach, three queries and their variations, derived from the TPC-H benchmark, were tested at four different sizes in our experiments. For each query, we measured the average execution time over multiple runs as the primary performance metric. Before each run, the buffer cache of System A was flushed. The graphs of the results plot the elapsed time on the Y-axis and the size of each query block (outer/inner) on the X-axis. The size of a query block denotes the size of the base table (or join of base tables) in that query block with the corresponding selections pushed down, but before the linking predicates are executed. We chose this size as a parameter because it directly relates to the size of the intermediate result, which in turn relates to the overhead of fetching tuples from the SQL engine. This size is controlled by changing the constants in the selections, thus varying their selectivity factor. Note that the size of the final result is proportional to the size of the intermediate result.

Our first experiment was done on Query 1, which is a one-level nested query with an ALL linking operator.

Query 1:
  select o_orderkey, o_orderpriority
  from orders
  where o_orderdate>=x1 and o_orderdate<x2
    and o_totalprice > all
      (select l_extendedprice
       from lineitem
       where l_orderkey=o_orderkey
         and l_commitdate<l_receiptdate
         and l_shipdate<l_commitdate)

Figure 4: Query 1. X-axis: Size of Query Block (1/2), with values 4K/7K, 8K/7K, 12K/7K, 16K/7K.

The conditions o_orderdate>=x1 and o_orderdate<x2 and l_commitdate<l_receiptdate and l_shipdate<l_commitdate are used to regulate the size of each query block. The size of the outer query block ranges from 4K to 16K tuples, and the inner query block has 7K tuples. The attributes l_orderkey and o_orderkey are automatically indexed. The native approach evaluates Query 1 by nested iteration; that is, for each tuple of orders that satisfies the conditions o_orderdate>=x1 and o_orderdate<x2, the inner query block is computed once, and then the ALL linking predicate is evaluated. Note that each time the inner query block is computed, lineitem is accessed by index rowid, which is more efficient than a full access. For the nested relational approach, the stored procedure fetches the tuples from System A, which performs an outer hash join of orders and lineitem on the correlated predicate l_orderkey=o_orderkey (requiring full accesses of orders and lineitem), and then processes the intermediate result in the nested relational way. The size of the intermediate result is about 4K, 81K, 123K and 165K for the four tests. Accordingly, the processing time of nest and linking selection is .24, .47, .71 and .98 seconds for the original nested relational approach, and .3, .6, .1 and .13 seconds for the optimized nested relational approach.

The experimental results are shown in figure 4. Both the original and the optimized nested relational approaches outperform the native approach, although the native approach benefits from indexes. One notable point about Query 1 is that, with a NOT NULL constraint on the attribute l_extendedprice, System A directly performs an antijoin of orders and lineitem, and the performance is about the same as ours. However, if the NOT NULL constraint is dropped, antijoin is not used, even though there are no null values in l_extendedprice. In general, the ALL or NOT IN linking predicate cannot be evaluated using antijoin when null values may exist.

The second experiment was done on two variations of Query 2, a two-level nested query. The term [any | all] means that either one is chosen.

Query 2:
  select p_partkey, p_name
  from part
  where p_size>=x1 and p_size<=x2
    and p_retailprice < [any | all]
      (select ps_supplycost
       from partsupp
       where ps_partkey=p_partkey and ps_availqty<y
         and not exists
           (select *
            from lineitem
            where ps_partkey=l_partkey
              and ps_suppkey=l_suppkey
              and l_quantity=z))

Figure 5: Query 2a (mixed: ANY/NOT EXISTS). X-axis values: 12K/16K/12K, 24K/16K/12K, 36K/16K/12K, 48K/16K/12K.

The conditions p_size>=x1 and p_size<=x2, ps_availqty<y and l_quantity=z are again used to vary the size of each query block.
The size of the first query block ranges from 12K to 48K tuples; the second and third query blocks have 16K and 12K tuples respectively. The attributes p_partkey and (ps_partkey, ps_suppkey) are automatically indexed. Additional indexes on the foreign keys of lineitem, l_partkey and l_suppkey, are created manually for fast access to data within lineitem. To test the effect of indexes on the native approach, we created a combined index on (l_partkey,l_suppkey) and two single indexes on l_partkey and l_suppkey respectively.
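The two-stage evaluation used by the nested relational approach (an outer join that produces a sorted intermediate result, followed by nest and linking selection) can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions, not the stored procedure used in the experiments: the function names `gt_all` and `nest_and_select`, the row layout, and the sample data are all hypothetical. It nests by sorting and evaluates a `> ALL` linking predicate in the same pass, with SQL nulls represented as `None`.

```python
from itertools import groupby
from operator import itemgetter


def gt_all(value, nested):
    """Three-valued `value > ALL (nested)`: True, False, or None (unknown)."""
    if not nested:
        return True              # ALL over an empty set is true
    if value is None:
        return None              # a null compared with anything is unknown
    unknown = False
    for item in nested:
        if item is None:
            unknown = True       # comparison with a null is unknown
        elif not value > item:
            return False         # one failing comparison decides the result
    return None if unknown else True


def nest_and_select(rows):
    """Nest by the outer key and apply the linking selection in one pass.

    Each row is (o_orderkey, o_totalprice, l_orderkey, l_extendedprice), as a
    left outer join would produce it; l_orderkey is None for padded rows.
    """
    ordered = sorted(rows, key=itemgetter(0))   # stands in for the database sort
    qualified = []
    for orderkey, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        totalprice = group[0][1]
        # Padded rows (no matching inner tuple) contribute nothing to the set.
        nested = [price for _, _, lkey, price in group if lkey is not None]
        if gt_all(totalprice, nested) is True:
            qualified.append(orderkey)
    return qualified


# Hypothetical intermediate result for four orders.
SAMPLE_ROWS = [
    (1, 100.0, 1, 50.0), (1, 100.0, 1, 70.0),   # 100 > all of {50, 70}
    (2, 60.0, 2, 80.0),                          # 60 > 80 fails
    (3, 40.0, None, None),                       # no inner tuples: empty set
    (4, 90.0, 4, None), (4, 90.0, 4, 10.0),      # a null price: unknown
]

print(nest_and_select(SAMPLE_ROWS))  # prints [1, 3]
```

Order 3 qualifies because its nested set is empty, while order 4 is rejected because a null in the nested set makes the linking predicate unknown rather than true.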

Figure 6: Query 2b (negative: ALL/NOT EXISTS). X-axis values: 12K/16K/12K, 24K/16K/12K, 36K/16K/12K, 48K/16K/12K.

Note that Query 2 is linear correlated; this implies that there are two possible ways to evaluate the query: top-down and bottom-up. Our first variation of Query 2 is Query 2a, with the mixed ANY and NOT EXISTS operators. The native approach evaluates Query 2a bottom-up: it first performs an antijoin of partsupp and lineitem to form a view for the NOT EXISTS linking predicate, and then performs a semijoin of part and the resulting view for the ANY linking predicate. Each table is fully accessed once. The execution result is shown in figure 5. If the linking operators are any combination of ANY/SOME, IN, EXISTS and NOT EXISTS, the native approach is the same, that is, a combination of semijoin and/or antijoin. However, if one of the linking operators is ALL or NOT IN, then in the general case the native approach has to resort to nested iteration, which gives rise to our second variation of Query 2: Query 2b, with the negative ALL and NOT EXISTS operators. If there is a NOT NULL constraint on the attribute ps_supplycost, the native approach evaluates Query 2b in a similar manner to Query 2a, with two antijoins instead of one antijoin and one semijoin. However, in the general case, or if the NOT NULL constraint is dropped, the native approach can only unnest the NOT EXISTS linking predicate by antijoin, and must perform nested iteration for the ALL linking predicate. Thus, for each tuple of part that satisfies the conditions p_size>=x1 and p_size<=x2, the native approach performs a nested loop antijoin of partsupp and lineitem using the combined index on (l_partkey,l_suppkey). Figure 6 shows the execution result. One important point is that the additional indexes on the foreign keys of lineitem play an important role in the nested loop antijoin. The native approach performs much worse if these indexes are not available.
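The bottom-up plan described above for Query 2a (an antijoin for NOT EXISTS followed by a semijoin for ANY) can be sketched as follows. This is a simplified illustration, not System A's implementation: the helper names, the dictionary-based rows, and the sample data are assumptions; the selections on p_size, ps_availqty and l_quantity are omitted; and the data is assumed to contain no nulls.

```python
def semijoin(left, right, matches):
    """Keep each left row that has at least one matching right row."""
    return [row for row in left if any(matches(row, other) for other in right)]


def antijoin(left, right, matches):
    """Keep each left row that has no matching right row."""
    return [row for row in left if not any(matches(row, other) for other in right)]


def query_2a(part, partsupp, lineitem):
    """Bottom-up evaluation of a simplified Query 2a (null-free data assumed)."""
    # NOT EXISTS: partsupp rows with no lineitem for the same part and supplier.
    view = antijoin(
        partsupp, lineitem,
        lambda ps, li: ps["partkey"] == li["partkey"]
        and ps["suppkey"] == li["suppkey"],
    )
    # < ANY: parts priced below the supply cost of some surviving partsupp row.
    result = semijoin(
        part, view,
        lambda p, ps: ps["partkey"] == p["partkey"]
        and p["retailprice"] < ps["supplycost"],
    )
    return [p["partkey"] for p in result]


# Hypothetical sample data.
PART = [
    {"partkey": 1, "retailprice": 10},
    {"partkey": 2, "retailprice": 50},
    {"partkey": 3, "retailprice": 30},
]
PARTSUPP = [
    {"partkey": 1, "suppkey": 7, "supplycost": 20},
    {"partkey": 2, "suppkey": 8, "supplycost": 60},
    {"partkey": 3, "suppkey": 9, "supplycost": 25},
]
LINEITEM = [{"partkey": 2, "suppkey": 8}]

print(query_2a(PART, PARTSUPP, LINEITEM))  # prints [1]
```

The helpers use nested loops for brevity; a real system would typically use hash-based or index-based join methods. The point of the sketch is only the plan shape: each table is read once, and the result of the antijoin feeds the semijoin.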
No matter what the linking operator is, for both queries the nested relational approach processes, in a nested relational way, the intermediate result obtained from System A, which executes outer hash joins of part, partsupp and lineitem. The size of the intermediate result is the same for both queries, that is, 14K, 29K, 44K and 58K, and thus the processing time of nest and linking selection is almost the same: .18, .36, .54 and .72 seconds for the original nested relational approach, and .2, .4, .6 and .8 seconds for the optimized nested relational approach. For convenience of comparison, we use the same scale on the Y-axes of figure 5 and figure 6.

Comparing figure 5 with figure 6, we can make the following points about nested linear queries. The nested relational approach has similar performance regardless of the linking operators, whereas the performance of the native approach depends on the presence of the ALL or NOT IN linking operator. The native approach (semijoin and/or antijoin) performs significantly worse than the nested relational approach if the ALL or NOT IN linking operator is used (see figure 6), but slightly better than the nested relational approach when it is not used (see figure 5); the latter is partly because of the processing, but mostly because of the communication overhead required by the nested relational approach.

In the literature, antijoin and semijoin have been considered the most efficient way of processing the NOT EXISTS predicate and the EXISTS or IN predicates respectively. However, as pointed out before, when nulls exist antijoin cannot be used directly without a delicate transformation or a constraint. Furthermore, antijoin and semijoin cannot always be extended to evaluate multi-level queries, because an antijoin or a semijoin keeps the information of only one of the tables that participate in the operation, so information from the other table that is required by further processing may be lost.
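To see why an antijoin is not a drop-in replacement for NOT IN (or ALL) when nulls are present, the following sketch compares SQL's three-valued NOT IN with a plain equality antijoin. The function names and sample values are assumptions made for illustration, and `None` stands for an SQL null.

```python
def not_in_3vl(value, items):
    """SQL `value NOT IN (items)`: True, False, or None for unknown."""
    if not items:
        return True                  # NOT IN over an empty set is true
    if value is None:
        return None                  # a null compared with anything is unknown
    unknown = False
    for item in items:
        if item is None:
            unknown = True           # value = NULL is unknown
        elif item == value:
            return False             # a definite match makes NOT IN false
    return None if unknown else True


def antijoin_keeps(value, items):
    """Plain equality antijoin: keep value if no item equals it.

    A null never satisfies the join condition, so it never removes a tuple.
    """
    return not any(
        item is not None and value is not None and item == value
        for item in items
    )


for value, items in [(3, [1, 2]), (3, [1, 3]), (3, [1, None]), (None, [1])]:
    selected = not_in_3vl(value, items) is True   # WHERE keeps only true rows
    print(value, items, "NOT IN selects:", selected,
          "antijoin keeps:", antijoin_keeps(value, items))
```

The two agree on the null-free inputs, but for `(3, [1, None])` and `(None, [1])` the NOT IN predicate is unknown (so the tuple is rejected) while the antijoin keeps the tuple. This is the discrepancy that a NOT NULL constraint or an additional transformation has to rule out.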
This motivated our third experiment, on three variations of Query 3, which is derived from Query 2 by changing the predicate ps_partkey=l_partkey in the third query block to p_partkey=l_partkey. This modification makes Query 3 a more general two-level nested query: the third query block is correlated to both of the other two query blocks. The modification also significantly affects the native approach. As in Query 2, the terms [all | any], [exists | not exists] and [= | <>] mean that either one is chosen.

Query 3:
  select p_partkey, p_name
  from part
  where p_size>=x1 and p_size<=x2
    and p_retailprice < [all | any]
      (select ps_supplycost
       from partsupp
       where ps_partkey=p_partkey and ps_availqty<y
         and [exists | not exists]
           (select *
            from lineitem
            where p_partkey [= | <>] l_partkey
              and ps_suppkey [= | <>] l_suppkey
              and l_quantity=z))

The size of each query block and the available indexes are the same as for Query 2. The variations of Query 3 are used to test mixed linking operators, positive linking operators, and negative linking operators, with equality and non-equality correlated predicates. The first variation is Query 3a, with the mixed linking operators ALL and EXISTS; the second variation is Query 3b, with the two negative linking operators ALL and NOT EXISTS; and the third variation is Query 3c, with the two positive linking operators ANY and EXISTS. Generally, the optimizer generates a query plan depending not only on the linking operators but also on the correlated predicates. Thus, each variation again has three cases based on the correlated predicates of the third query block: (a) p_partkey=l_partkey and ps_suppkey=l_suppkey, (b) p_partkey<>l_partkey and ps_suppkey=l_suppkey, (c) p_partkey=l_partkey and ps_suppkey<>l_suppkey.

Figure 7: Query 3a (mixed: ALL/EXISTS). Panels: (a) Query 3a(a): p_partkey=l_partkey and ps_suppkey=l_suppkey; (b) Query 3a(b): p_partkey<>l_partkey and ps_suppkey=l_suppkey; (c) Query 3a(c): p_partkey=l_partkey and ps_suppkey<>l_suppkey. X-axis values in each panel: 12K/16K/12K, 24K/16K/12K, 36K/16K/12K, 48K/16K/12K.

Figure 8: Query 3b (negative: ALL/NOT EXISTS). Panels: (a) Query 3b(a): p_partkey=l_partkey and ps_suppkey=l_suppkey; (b) Query 3b(b): p_partkey<>l_partkey and ps_suppkey=l_suppkey; (c) Query 3b(c): p_partkey=l_partkey and ps_suppkey<>l_suppkey. X-axis values in each panel: 12K/16K/12K, 24K/16K/12K, 36K/16K/12K, 48K/16K/12K.

For this discussion, we use the same scale on the Y-axes of figures 7, 8 and 9 for easy comparison. It is important to point out that System A is unable to use antijoin in these queries, even though the NOT NULL constraint is present; this is due to the problems mentioned in section 2. System A has different plans for the three queries. For Query 3a and Query 3c, System A always tries to unnest the third query block for the EXISTS linking predicate, while for Query 3b, System A has to perform nested iteration over the three query blocks. System A also treats different correlated predicates differently. More importantly, System A is greatly affected by indexes. We explain System A's behavior in detail. For Query 3a with mixed linking operators (see figure 7), the ALL linking predicate has to be evaluated using nested iteration, but the EXISTS linking predicate can be unnested by a nested loop join.
For each tuple of part that satisfies p_size>=x1 and p_size<=x2, lineitem is accessed by index rowid; for each index, the nested loop join is performed on partsupp using the index on (ps_partkey,ps_suppkey). For Query 3a(a) (see figure 7(a)) and Query 3a(c) (see figure 7(c)), the combined index on (l_partkey,l_suppkey) is used to access lineitem, while for Query 3a(b) (see figure 7(b)), the single index on l_suppkey is used. Comparing the panels of figure 7, we can see that Query 3a(b) performs much better than Query 3a(a) and Query 3a(c), which have similar performance to each other. The reason is that the single index structure on l_suppkey is much smaller than the combined index structure on (l_partkey,l_suppkey).

For Query 3b with negative linking operators (see figure 8), the ALL and NOT EXISTS linking predicates have to be evaluated by nested iteration. For each tuple of part that satisfies p_size>=x1 and p_size<=x2, partsupp is accessed using the index on (ps_partkey, ps_suppkey), and in turn, lineitem is accessed using the appropriate index: the combined index on (l_partkey,l_suppkey) for Query 3b(a) (see figure 8(a)) and Query 3b(c) (see figure 8(c)), and the single index on l_suppkey for Query 3b(b) (see figure 8(b)). Figure 8 shows that Query 3b(a) and


More information

CS 377 Database Systems

CS 377 Database Systems CS 377 Database Systems Relational Algebra and Calculus Li Xiong Department of Mathematics and Computer Science Emory University 1 ER Diagram of Company Database 2 3 4 5 Relational Algebra and Relational

More information

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan Plan for today Query Processing/Optimization CPS 216 Advanced Database Systems Overview of query processing Query execution Query plan enumeration Query rewrite heuristics Query rewrite in DB2 2 A query

More information

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER

THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER THE EFFECT OF JOIN SELECTIVITIES ON OPTIMAL NESTING ORDER Akhil Kumar and Michael Stonebraker EECS Department University of California Berkeley, Ca., 94720 Abstract A heuristic query optimizer must choose

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14 Query Optimization Chapter 14: Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information
