Optimization of Nested Queries in a Complex Object Model
Based on the papers "From Nested loops to Join Queries in OODB" and "Optimisation of Nested Queries in a Complex Object Model" by the Department of Computer Science, University of Twente
Overview:
- Introduction
- OOSQL and SQL
- Translation of the queries into the ADL
- Optimisation of the Nested Algebra Queries
- Unnesting strategies
- Conclusion
Introduction:
It is advantageous to replace nested SQL queries by flat (join) queries. Flat SQL queries are SFW-blocks that contain no subqueries in the WHERE clause, so the optimizer has more freedom to choose the most appropriate join implementation. We distinguished five types of nesting and saw, for each type, an algorithm for transforming nested queries into join queries. When aggregate functions occur between query blocks (one of the types of nesting), SQL's GROUP BY clause is employed to compute the aggregates needed. However, Kim's algorithm is not correct if the aggregate function COUNT occurs between query blocks (the COUNT bug).
Introduction:
Consider the nested query
  SELECT * FROM R WHERE R.B = (SELECT COUNT(*) FROM S WHERE R.C = S.C)
Following Kim's algorithm, we get the following queries:
1. Grouping of the inner operand and computation of the aggregate precede the join operation:
  T(C, CNT) = SELECT S.C, COUNT(*) FROM S GROUP BY S.C
  SELECT R.A, R.B, R.C FROM R, T WHERE R.B = T.CNT AND R.C = T.C
2. Alternatively, if R does not contain duplicates, the join is executed first:
  SELECT R.A, R.B, R.C FROM R, S WHERE R.C = S.C GROUP BY R.A, R.B, R.C HAVING R.B = COUNT(S.C)
Neither query gives the correct result: a tuple of R with R.B = 0 and no matching tuple in S satisfies the nested query, but is lost in both transformed queries.
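The COUNT bug can be reproduced on a tiny instance. The following is a minimal sketch using Python's built-in sqlite3 module; the table contents and the column types are assumptions chosen for illustration and do not come from the papers.

```python
import sqlite3

# Hypothetical sample data: r2 has B = 0 and no matching tuple in S.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    CREATE TABLE S (C INTEGER);
    INSERT INTO R VALUES ('r1', 2, 10), ('r2', 0, 20), ('r3', 1, 10);
    INSERT INTO S VALUES (10), (10);
""")

# The original nested query.
nested_sql = """
    SELECT R.A, R.B, R.C FROM R
    WHERE R.B = (SELECT COUNT(*) FROM S WHERE R.C = S.C)
    ORDER BY R.A
"""

# Variant 1: group and aggregate the inner operand first, then join.
kim_sql = """
    SELECT R.A, R.B, R.C
    FROM R, (SELECT S.C AS C, COUNT(*) AS CNT FROM S GROUP BY S.C) AS T
    WHERE R.B = T.CNT AND R.C = T.C
    ORDER BY R.A
"""

# Variant 2: join first, then group and filter with HAVING.
join_first_sql = """
    SELECT R.A, R.B, R.C FROM R, S
    WHERE R.C = S.C
    GROUP BY R.A, R.B, R.C
    HAVING R.B = COUNT(S.C)
    ORDER BY R.A
"""

nested_rows = conn.execute(nested_sql).fetchall()
kim_rows = conn.execute(kim_sql).fetchall()
join_first_rows = conn.execute(join_first_sql).fetchall()

print(nested_rows)      # r1 and r2 qualify
print(kim_rows)         # r2 is missing
print(join_first_rows)  # r2 is missing
```

Tuple r2 is lost by both flat queries because no S tuple with C = 20 exists, so no group with count 0 is ever produced for it.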
Introduction:
To solve the COUNT bug, it has been proposed to use:
1. outerjoins instead of joins if the COUNT function occurs. The left outerjoin operator preserves dangling tuples of the left join operand: unmatched left operand tuples are extended with NULL values in the right operand attribute positions.
2. two types of join predicates:
- a regular join predicate, and
- an additional, so-called antijoin predicate, to be applied to the dangling tuples.
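The outerjoin repair can be sketched in the same way. This example again uses Python's sqlite3 module on assumed sample data; it relies on R containing no duplicates and on COUNT(S.C) ignoring the NULL values that the outerjoin pads dangling tuples with.

```python
import sqlite3

# Hypothetical sample data: r2 has B = 0 and no matching tuple in S.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B INTEGER, C INTEGER);
    CREATE TABLE S (C INTEGER);
    INSERT INTO R VALUES ('r1', 2, 10), ('r2', 0, 20), ('r3', 1, 10);
    INSERT INTO S VALUES (10), (10);
""")

nested_sql = """
    SELECT R.A, R.B, R.C FROM R
    WHERE R.B = (SELECT COUNT(*) FROM S WHERE R.C = S.C)
    ORDER BY R.A
"""

# The outerjoin keeps dangling R tuples; COUNT(S.C) skips the NULL padding,
# so a dangling tuple forms a group with count 0.
outerjoin_sql = """
    SELECT R.A, R.B, R.C
    FROM R LEFT OUTER JOIN S ON R.C = S.C
    GROUP BY R.A, R.B, R.C
    HAVING R.B = COUNT(S.C)
    ORDER BY R.A
"""

nested_rows = conn.execute(nested_sql).fetchall()
outerjoin_rows = conn.execute(outerjoin_sql).fetchall()
print(outerjoin_rows == nested_rows)
```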
OOSQL and SQL
An OOSQL query facility is inherently more complex than one for SQL:
- nesting is allowed in all clauses: SELECT, FROM, and WHERE.
- expressions in the FROM-clause may be base tables as well as set-valued attributes.
- predicates used in the WHERE-clause are more complex, because comparisons between set-valued attributes, or between set-valued attributes and base table expressions, are allowed.
As in relational systems supporting SQL, optimization of nested queries is an important issue in OO data models supporting an SQL-like query language (OOSQL). A naive way to handle nested queries is nested-loop processing (tuple-oriented query processing). It is better to transform nested queries into join queries, because join queries can be implemented in many different ways.
Main Approach:
There are two approaches to the logical optimization of a declarative query language:
(1) rewriting expressions in the query language itself,
(2) translation into, and rewriting in, some intermediate language, for example an algebraic language.
The goal in translation and optimization of OOSQL is to move from tuple-oriented to set-oriented query processing. Our approach is to translate nested OOSQL queries into nested algebraic expressions in the algebraic language ADL, and then to try to rewrite these nested algebraic expressions into join expressions.
The Complex Object Algebra ADL:
ADL is a typed algebra for complex objects, allowing for nesting of expressions. Among the constructors supported are the tuple (< >) and set ({ }) type constructors. Roughly, the algebraic operators of the language ADL are:
- the standard set (comparison) operators,
- extended Cartesian product (in which operand tuples are concatenated),
- division,
- map operator,
- selection,
- projection,
- renaming operator,
- nest,
- unnest,
- regular join,
- semijoin,
- and the antijoin,
plus aggregate functions. The semantics is omitted because of the lack of space.
Translation of OOSQL into ADL:
Translation of OOSQL queries into the algebra is done in a simple, almost one-to-one way. In the translation phase, nested OOSQL queries are translated into nested algebraic expressions. Following translation, in the phase of logical optimization, nested expressions are rewritten into set operations. In the translation phase, an SFW-query block is mapped to an algebraic expression consisting of a selection followed by a map:
  select e1 from x in e2 where e3   is translated into   α[x : e1](σ[x : e3](e2))
Here σ computes the selection with predicate e3, and the map α computes the projection e1.
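The mapping of an SFW-block to a selection followed by a map can be illustrated with a minimal Python sketch. The helper names select and amap, the list-of-dicts representation, and the sample PART rows are assumptions made for this example; lists stand in for sets, so duplicate elimination is not modelled.

```python
def select(pred, table):
    """σ[x : pred](table): keep the elements that satisfy pred."""
    return [x for x in table if pred(x)]


def amap(fn, table):
    """α[x : fn](table): apply fn to every element."""
    return [fn(x) for x in table]


# Hypothetical sample data for the PART table.
PART = [
    {"pid": 1, "pname": "bolt", "price": 5, "color": "red"},
    {"pid": 2, "pname": "nut", "price": 3, "color": "blue"},
    {"pid": 3, "pname": "gear", "price": 9, "color": "red"},
]

# select p.pname from p in PART where p.color = "red"
# becomes α[p : p.pname](σ[p : p.color = "red"](PART))
red_names = amap(lambda p: p["pname"],
                 select(lambda p: p["color"] == "red", PART))
print(red_names)
```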
Optimization Of Nested Algebra Queries:
The example queries given below concern a database in which the types of SUPPLIER and PART are defined in ADL as follows:
  SUPPLIER : { < sid : oid, sname : string, parts : { < pid : oid > } > }
  PART : { < pid : oid, pname : string, price : int, color : string > }
We distinguish three ways of optimizing nested ADL queries:
(1) the unnesting of attributes by using the unnest operator,
(2) the unnesting of nested expressions by transforming them into relational join queries,
(3) using new operators that are defined especially to enhance performance.
1. Unnesting Of Attributes
Disadvantages:
- nesting and unnesting are not inverse to each other for all relations
- first unnesting and later nesting again is expensive, due to duplication of attribute values and the overhead caused by restructuring.
Query: Select the identifiers of suppliers supplying non-existing parts.
  π_sid(σ[s : ∃z ∈ s.parts • ¬∃p ∈ PART • z = p[pid]](SUPPLIER))
The set-valued attribute parts is not needed in the result, so the above query may be rewritten into the antijoin query:
  π_sid(µ_parts(SUPPLIER) ▷[s,p : s.pid = p.pid] PART)
Here µ_parts unnests the attribute parts and ▷ denotes the antijoin. Note that because z is existentially quantified, the loss of tuples with an empty set-valued attribute parts causes no problem (existential quantification over the empty set delivers false).
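The equivalence of the nested query and the unnest-plus-antijoin query can be checked on a small instance. This is a sketch under simplifying assumptions: part references are plain integers rather than unary tuples, and the sample rows (including the supplier with an empty parts set) are invented for illustration.

```python
# Hypothetical sample data; supplier 2 references part 99, which is not in PART.
SUPPLIER = [
    {"sid": 1, "sname": "Acme", "parts": {1, 2}},
    {"sid": 2, "sname": "Bolt", "parts": {1, 99}},
    {"sid": 3, "sname": "Empty", "parts": set()},
]
PART = [{"pid": 1}, {"pid": 2}, {"pid": 3}]

# Nested form: for each supplier, iterate over its parts and over PART.
nested = {
    s["sid"]
    for s in SUPPLIER
    if any(not any(z == p["pid"] for p in PART) for z in s["parts"])
}

# Unnest: one flat (sid, pid) tuple per element of the set-valued attribute.
# Supplier 3 disappears here because its parts set is empty.
unnested = [{"sid": s["sid"], "pid": z} for s in SUPPLIER for z in s["parts"]]

# Antijoin with PART on pid, followed by the projection on sid.
flat = {
    t["sid"]
    for t in unnested
    if not any(t["pid"] == p["pid"] for p in PART)
}

print(nested, flat)
```

Losing supplier 3 during unnesting is harmless here, since the existential quantifier over its empty parts set is false anyway.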
2. Transformation into join queries
In some cases two or more consecutive levels of nesting can be replaced by a join, antijoin, or semijoin operator, reducing the number of levels of nesting. In the ideal case all nesting has disappeared.
Query: Select the suppliers supplying red parts.
  σ[s : ∃z ∈ s.parts • ∃p ∈ PART • z = p[pid] ∧ p.color = "red"](SUPPLIER)
This query can be rewritten into the semijoin query:
  SUPPLIER ⋉[s,p : p[pid] ∈ s.parts] σ[p : p.color = "red"](PART)
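The red-parts query can be compared in both forms with a short Python sketch. The data and the integer representation of part references are assumptions for illustration; the semijoin is written as a comprehension that keeps a supplier as soon as one matching red part exists.

```python
# Hypothetical sample data.
SUPPLIER = [
    {"sid": 1, "sname": "Acme", "parts": {1, 2}},
    {"sid": 2, "sname": "Bolt", "parts": {2}},
    {"sid": 3, "sname": "Empty", "parts": set()},
]
PART = [
    {"pid": 1, "color": "red"},
    {"pid": 2, "color": "blue"},
    {"pid": 3, "color": "red"},
]

# Nested form: two levels of existential quantification per supplier.
nested = [
    s["sid"]
    for s in SUPPLIER
    if any(
        any(z == p["pid"] and p["color"] == "red" for p in PART)
        for z in s["parts"]
    )
]

# Semijoin form: select the red parts once, then keep each supplier that
# has at least one red part in its parts set.
red_parts = [p for p in PART if p["color"] == "red"]
semijoined = [
    s["sid"]
    for s in SUPPLIER
    if any(p["pid"] in s["parts"] for p in red_parts)
]

print(nested, semijoined)
```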
3. Using Special Operators
The following query cannot be rewritten into a relational join query. New operators are really necessary to obtain an efficient implementation.
Query: Select supplier names together with the parts supplied.
  α[s : < sname = s.sname, parts_suppl = σ[p : p[pid] ∈ s.parts](PART) >](SUPPLIER)
This query can be rewritten into an efficient set operation, the nestjoin query:
  π_{sname, parts_suppl}(SUPPLIER Δ[s,p : p[pid] ∈ s.parts; parts_suppl] PART)
where Δ denotes the nestjoin. Note that each of the options above can be applied to the top-level expression as well as to subexpressions thereof.
Rewrite strategy
1. Try to rewrite to the various relational join operators (join, antijoin, or semijoin).
2. If the above is not possible, try to flatten set-valued attributes; if the nesting phase can be skipped, this may be a strategy worth considering.
3. If the above is not possible, try to rewrite to one of the newly defined operators, because they were introduced to get better performance than nested-loop processing.
4. If none of the above works, leave the query as it is, which means that it is executed by means of nested loops.
Rewriting into flat relational algebra: nesting in the WHERE-clause in the presence of set-valued attributes.
The general format of a two-block OOSQL query with nesting in the WHERE-clause is:
  SELECT F(x) FROM x ∈ X WHERE P(x, Y') WITH Y' = SELECT G(x, y) FROM y ∈ Y WHERE Q(x, y)
The goal of the transformation process is to transform the predicate P(x, Y'), whose second argument is set valued, into a predicate P'(x, v), where the values v are the members of Y'. The types of P and P' clearly differ: from the second argument of P a set constructor is removed, resulting in predicate P'. The target format is:
  SELECT F(x) FROM x ∈ X, y ∈ Y WHERE P'(x, v) ∧ Q(x, y) WITH v = G(x, y)
Rewriting into flat relational algebra:
The general format of an SFW query is:
  α[x : F(x)](σ[x : P(x, Y')](X)) with Y' = α[y : G(x, y)](σ[y : Q(x, y)](Y))
For simplicity, take F and G to be the identity. Then we have:
  σ[x : P(x, Y')](X) with Y' = σ[y : Q(x, y)](Y)
The query above is a nested query involving nested iteration over a base table: the outer selection predicate contains a subquery, which is a selection on base table Y. We want to transform this nested query into a join query, i.e. a query having no subqueries with base table operands.
Set Comparison Operations
We concentrate on two-block nested expressions with set comparison operations between query blocks. There are two unnesting techniques:
- unnesting by rewriting into quantifier expressions,
- unnesting by grouping.
Both techniques are well known from the relational model; however, to be of good use in complex object models, they have to be adapted. We will do so by defining a new algebraic operator, the nestjoin operator.
Unnesting By Rewriting Into Quantifier Expressions 1
Rewriting Example 1: SET MEMBERSHIP
  σ[x : x.c ∈ σ[y : q](Y)](X)
  ≡ σ[x : ∃y ∈ σ[y : q](Y) • y = x.c](X)
  ≡ σ[x : ∃y ∈ Y • y = x.c ∧ q](X)
  ≡ X ⋉[x,y : y = x.c ∧ q] Y
- The operator ∈ is rewritten into an existential quantification.
- The select operation is removed from the operand (the range expression) of the existential quantifier, providing the possibility to translate the existential subquery into a semijoin operation in the last rewrite step.
Rule 1: UNNESTING QUANTIFIER EXPRESSIONS
Let X and Y be table expressions, and let x not be free in Y, then:
1. σ[x : ∃y ∈ Y • p](X) ≡ X ⋉[x,y : p] Y
2. σ[x : ¬∃y ∈ Y • p](X) ≡ X ▷[x,y : p] Y
A nested query with existential quantification is translated into a semijoin operation; negated existential (i.e. universal) quantification is dealt with by means of the antijoin operator.
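Rule 1 and the set membership example can be exercised with generic semijoin and antijoin helpers. This is a minimal sketch: the function names, the representation of Y as a list of plain values, and the predicate q are assumptions made for the example.

```python
def semijoin(X, Y, p):
    """X ⋉[x,y : p] Y: the x in X for which some y in Y satisfies p."""
    return [x for x in X if any(p(x, y) for y in Y)]


def antijoin(X, Y, p):
    """X ▷[x,y : p] Y: the x in X for which no y in Y satisfies p."""
    return [x for x in X if not any(p(x, y) for y in Y)]


# Hypothetical operands; q is a predicate on y alone.
X = [{"id": 1, "c": 10}, {"id": 2, "c": 20}, {"id": 3, "c": 30}]
Y = [10, 30, 40]


def q(y):
    return y > 10


# Nested form: x.c ∈ σ[y : q](Y), with the subquery evaluated per x.
nested = [x["id"] for x in X if x["c"] in [y for y in Y if q(y)]]

# Flat form: the membership test becomes the semijoin predicate y = x.c ∧ q.
flat = [x["id"] for x in semijoin(X, Y, lambda x, y: y == x["c"] and q(y))]

# The negated quantifier corresponds to the antijoin with the same predicate.
complement = [x["id"] for x in antijoin(X, Y, lambda x, y: y == x["c"] and q(y))]

print(nested, flat, complement)
```

Every tuple of X ends up in exactly one of the semijoin and antijoin results, which mirrors the two cases of Rule 1.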
Unnesting By Rewriting Into Quantifier Expressions 2
Rewriting Example 1: SET INCLUSION
All set comparison operators can be rewritten into quantifier expressions.
Unnesting By Grouping
Another way to deal with set comparison operators is to use grouping, a technique used in transforming nested queries with aggregate functions between query blocks. Consider the following nested query:
Database Query Processing, Universität Konstanz, 2005
Unnesting By Grouping
The nested query is transformed into a flat join query consisting of:
(1) a join to evaluate the inner query block predicate,
(2) a nest operation for grouping,
(3) a selection for evaluating P, the predicate between blocks,
(4) a final projection.
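The four steps above can be sketched in Python on a hypothetical query whose predicate between blocks is set inclusion. The tuple layout, the data, and the choice of P are assumptions for illustration; the example also shows how a dangling tuple is lost, which is the kind of bug that grouping suffers from.

```python
# Hypothetical operands: X tuples are (id, c, s) with s a set-valued attribute,
# Y tuples are (c, d).
X = [
    ("x1", 1, frozenset({"a"})),
    ("x2", 2, frozenset()),
    ("x3", 1, frozenset({"z"})),
]
Y = [(1, "a"), (1, "b")]


def P(x, ys):
    """Predicate between blocks: x.s is a subset of the subquery result."""
    return x[2] <= ys


# Nested form: the subquery {y.d | y ∈ Y, y.c = x.c} is evaluated per x.
nested = [x[0] for x in X if P(x, {d for (c, d) in Y if c == x[1]})]

# (1) join on the inner block predicate
joined = [(x, y) for x in X for y in Y if x[1] == y[0]]
# (2) nest: group the matching d values per x
groups = {}
for x, y in joined:
    groups.setdefault(x, set()).add(y[1])
# (3) selection with P, (4) final projection
grouped = [x[0] for x, ys in groups.items() if P(x, ys)]

print(nested)   # x2 qualifies: the empty set is a subset of the empty set
print(grouped)  # x2 is lost: it has no join partner, so no group is built
```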
Nesting In The Map Operator
This is another example of the strategy of rewriting nested expressions into relational join expressions, now concerning nesting in the map operator (i.e. in the SELECT-clause). An equivalence rule can be used to transform a nested map operation into a join query. Unnesting by grouping is a transformation technique that would be generally applicable, were it not for the occurrence of bugs. In the next section, we show how to avoid these bugs by using the nestjoin operator.
New Algebraic Operators
It is worthwhile to define new logical algebra operators whenever new access algorithms can be found that are an improvement over nested-loop query processing.
- The Nestjoin Operator
- Materializing Set-Valued Attributes
- The PNHL Algorithm
The Nestjoin Operator
The nestjoin operator is to be used for the unnesting of nested queries that cannot be rewritten into flat relational join operations. It can be used for the transformation of two-block select expressions with arbitrary predicates between blocks. The simplified version of the two-block select query is:
  σ[x : P(x, Y')](X) with Y' = σ[y : Q(x, y)](Y)
The Nest Join operator
The nest join is simply a modification of the join operator. Instead of producing the concatenation of every pair of matching tuples, for each left operand tuple a set is created to hold the (possibly modified) right operand tuples that match. The nest join of two tables X and Y on predicate Q with function G (the function applied to the right-hand tuples satisfying the join predicate) is defined as:
  X Δ[x,y : Q; G; a] Y = { x ++ <a = z> | x ∈ X ∧ z = { G(x, y) | y ∈ Y ∧ Q(x, y) } }
In this expression, x ++ <a = z> denotes the concatenation of the tuple x and the unary tuple <a = z>, in which a is an arbitrary label not occurring on the top level of X. An example of the nest join operation is found in Table 1, where flat relations X and Y are equijoined on the second attribute (the join function is the identity function). Note that for dangling tuples, the tuple x ++ <a = ∅> is present in the result.
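The behaviour described above can be sketched as a small Python function. The name nest_join, its parameter order, the dict representation of tuples, and the sample relations are assumptions for this example; a list stands in for the nested set because dicts are not hashable.

```python
def nest_join(X, Y, Q, G, a):
    """Nest join sketch: extend every x in X with attribute a, holding
    G(x, y) for all y in Y that satisfy Q(x, y)."""
    return [{**x, a: [G(x, y) for y in Y if Q(x, y)]} for x in X]


# Hypothetical flat relations X(a, b) and Y(c, d), equijoined on the
# second attribute of each, with the identity as join function.
X = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
Y = [{"c": 1, "d": 10}, {"c": 2, "d": 10}, {"c": 3, "d": 30}]

result = nest_join(
    X, Y,
    Q=lambda x, y: x["b"] == y["d"],
    G=lambda x, y: y,
    a="ys",
)

for row in result:
    print(row)
```

The second X tuple has no join partner, yet it stays in the result with an empty ys attribute, which is the property that lets the nest join avoid the loss of dangling tuples.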
The Nest Join operator
Table 1: example of the nest join operation.
The PNHL Algorithm
PNHL is an algorithm for efficiently processing a nested expression in which a set-valued attribute is joined with a base table. A query of this kind expresses a nested natural join (*) operation, here between the set-valued attribute parts of SUPPLIER and the base table PART. The algorithm builds a hash table for each segment of operand PART that fits into main memory and then probes operand SUPPLIER against each segment's hash table, thus building partial results. Partial results are merged in the second phase of the algorithm. Compared to the unnest-join-nest processing method, the algorithm achieves better performance.
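The two phases described above can be sketched in Python. This is a simplified in-memory illustration, not the published algorithm: the function name pnhl_join, the segment_size parameter standing in for the memory budget, the integer part references, and the sample data are all assumptions, and disk I/O is not modelled.

```python
def pnhl_join(suppliers, parts, segment_size):
    """Sketch of a partitioned hash-based nested join of SUPPLIER.parts
    with PART. segment_size models how many PART tuples fit in memory."""
    partial = {}  # sid -> list of partial results, one per PART segment
    # Phase 1: build a hash table per PART segment, probe SUPPLIER against it.
    for start in range(0, len(parts), segment_size):
        segment = parts[start:start + segment_size]
        hash_table = {p["pid"]: p for p in segment}
        for s in suppliers:
            matches = [hash_table[pid] for pid in s["parts"] if pid in hash_table]
            partial.setdefault(s["sid"], []).append(matches)
    # Phase 2: merge the partial results of each supplier.
    result = []
    for s in suppliers:
        merged = [p for chunk in partial.get(s["sid"], []) for p in chunk]
        merged.sort(key=lambda p: p["pid"])
        result.append({"sid": s["sid"], "sname": s["sname"], "parts": merged})
    return result


# Hypothetical sample data.
SUPPLIER = [
    {"sid": 1, "sname": "Acme", "parts": {1, 3}},
    {"sid": 2, "sname": "Bolt", "parts": {2}},
    {"sid": 3, "sname": "Empty", "parts": set()},
]
PART = [
    {"pid": 1, "pname": "bolt"},
    {"pid": 2, "pname": "nut"},
    {"pid": 3, "pname": "gear"},
]

small_memory = pnhl_join(SUPPLIER, PART, segment_size=2)
large_memory = pnhl_join(SUPPLIER, PART, segment_size=10)
print(small_memory == large_memory)
```

The result does not depend on the segment size; a smaller segment only means more probe passes over SUPPLIER and more partial results to merge.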
Conclusion:
In OOSQL, nesting may occur in the where-, from-, and select-clause. An additional complication in complex object models is the support for iteration over set-valued attributes. The goal is to transform nested OOSQL queries having correlated subqueries with base table expressions as operands into join queries in which base tables occur only at top level. We have shown that transformation of nested OOSQL queries dealing with set-valued attributes into relational join queries is not always possible. To improve matters, we have defined a new operator called the nestjoin operator.
Thank you for your attention!