Optimization of Nested Queries in a Complex Object Model

Based on the papers "From Nested Loops to Join Queries in OODB" and "Optimisation of Nested Queries in a Complex Object Model" (Department of Computer Science, University of Twente).

Overview:
- Introduction
- OOSQL and SQL
- Translation of the queries into ADL
- Optimisation of the nested algebra queries
- Unnesting strategies
- Conclusion

Introduction:
It is advantageous to replace nested SQL queries by flat, or join, queries. Flat SQL queries are SFW blocks that contain no subqueries in the WHERE clause; for such queries the optimizer has more freedom to choose the most appropriate join implementation. We distinguished five types of nesting and saw, for each type, an algorithm (Kim's) that transforms nested queries into join queries. When aggregate functions occur between query blocks (one of the types of nesting), SQL's GROUP BY clause is used to compute the aggregates needed. However, Kim's algorithm is not correct if the aggregate function COUNT occurs between query blocks (the COUNT bug).

Introduction:
Example of the COUNT bug:
  SELECT * FROM R
  WHERE R.B = (SELECT COUNT(*) FROM S WHERE R.C = S.C)
Following Kim's algorithm, we get the following queries:
1. Grouping of the inner operand and computation of the aggregate precede the join operation:
  T(C, CNT) = SELECT S.C, COUNT(*) FROM S GROUP BY S.C
  SELECT R.A, R.B, R.C FROM R, T
  WHERE R.B = T.CNT AND R.C = T.C
2. Alternatively, if R does not contain duplicates, the nested query may be transformed so that the join is executed first:
  SELECT R.A, R.B, R.C FROM R, S
  WHERE R.C = S.C
  GROUP BY R.A, R.B, R.C
  HAVING R.B = COUNT(S.C)
Neither transformation gives the correct result. An R tuple with R.B = 0 and no matching S tuple satisfies the original query (COUNT over the empty set is 0), but it finds no partner in T (or in S), so the join loses it.
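The bug can be reproduced with a small experiment. The sketch below runs the nested query and both Kim-style rewrites through Python's built-in sqlite3 module; the table contents are invented for illustration. Tuple r1 has B = 0 and no partner in S, so the nested query keeps it while both rewrites drop it.

```python
import sqlite3

# Hypothetical sample data: r1 has no matching S tuple and B = 0.
R_ROWS = [("r1", 0, 1), ("r2", 2, 2), ("r3", 5, 2)]
S_ROWS = [(2,), (2,)]


def run_count_bug_demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE R (A TEXT, B INTEGER, C INTEGER)")
    con.execute("CREATE TABLE S (C INTEGER)")
    con.executemany("INSERT INTO R VALUES (?, ?, ?)", R_ROWS)
    con.executemany("INSERT INTO S VALUES (?)", S_ROWS)

    # Original nested query: the correlated subquery is evaluated per R tuple.
    nested = con.execute(
        "SELECT A, B, C FROM R "
        "WHERE R.B = (SELECT COUNT(*) FROM S WHERE R.C = S.C)"
    ).fetchall()

    # Rewrite 1: group the inner operand first, then join with R.
    group_first = con.execute(
        "WITH T(C, CNT) AS (SELECT S.C, COUNT(*) FROM S GROUP BY S.C) "
        "SELECT R.A, R.B, R.C FROM R, T "
        "WHERE R.B = T.CNT AND R.C = T.C"
    ).fetchall()

    # Rewrite 2 (R without duplicates): join first, then group.
    join_first = con.execute(
        "SELECT R.A, R.B, R.C FROM R, S WHERE R.C = S.C "
        "GROUP BY R.A, R.B, R.C HAVING R.B = COUNT(S.C)"
    ).fetchall()
    con.close()
    return sorted(nested), sorted(group_first), sorted(join_first)


if __name__ == "__main__":
    labels = ("nested", "group first", "join first")
    for label, rows in zip(labels, run_count_bug_demo()):
        print(label, rows)
```

Both rewrites return only r2, while the nested query returns r1 and r2.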

Introduction:
To solve the COUNT bug, it has been proposed to use:
1. outerjoins instead of joins if the COUNT function occurs. The outerjoin operator (a left outer join in SQL terms) preserves the dangling tuples of the left join operand: unmatched left operand tuples are extended with NULL values in the attribute positions of the right operand;
2. two types of join predicates:
- a regular join predicate, and
- an additional, so-called antijoin predicate, to be applied to the dangling tuples.
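A minimal sketch of the two-predicate idea for the example query above, again using Python's sqlite3 module with invented data. S is grouped first, R is left-outer-joined with the grouped table so that dangling R tuples survive with NULLs, the regular predicate R.B = T.CNT is applied to matched tuples, and the antijoin predicate R.B = 0 (the value COUNT takes on an empty set) is applied to dangling ones. This illustrates the idea; it is not the exact formulation of the cited proposals.

```python
import sqlite3

# Hypothetical sample data: r1 has no matching S tuple and B = 0.
R_ROWS = [("r1", 0, 1), ("r2", 2, 2), ("r3", 5, 2)]
S_ROWS = [(2,), (2,)]


def outerjoin_rewrite():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE R (A TEXT, B INTEGER, C INTEGER)")
    con.execute("CREATE TABLE S (C INTEGER)")
    con.executemany("INSERT INTO R VALUES (?, ?, ?)", R_ROWS)
    con.executemany("INSERT INTO S VALUES (?)", S_ROWS)
    rows = con.execute(
        "WITH T(C, CNT) AS (SELECT S.C, COUNT(*) FROM S GROUP BY S.C) "
        "SELECT R.A, R.B, R.C FROM R LEFT OUTER JOIN T ON R.C = T.C "
        # Regular join predicate: applies to R tuples that found a group in T.
        "WHERE (T.C IS NOT NULL AND R.B = T.CNT) "
        # Antijoin predicate: applies to dangling R tuples (COUNT over no rows is 0).
        "OR (T.C IS NULL AND R.B = 0)"
    ).fetchall()
    con.close()
    return sorted(rows)


if __name__ == "__main__":
    print(outerjoin_rewrite())
```

With this data the result is r1 and r2, the same as the nested query.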

OOSQL and SQL
An OOSQL query facility is inherently more complex than one for SQL:
- nesting is allowed in all clauses: SELECT, FROM, and WHERE;
- expressions in the FROM clause may be base tables as well as set-valued attributes;
- predicates used in the WHERE clause are more complex, because comparisons between set-valued attributes, or between set-valued attributes and base table expressions, are allowed.
As in relational systems supporting SQL, optimization of nested queries is an important issue in OO data models supporting an SQL-like query language (OOSQL). A naive way to handle nested queries is nested-loop processing (tuple-oriented query processing). It is better, however, to transform nested queries into join queries, because join queries can be implemented in many different ways.

Main Approach:
There are two approaches to the logical optimization of a declarative query language: (1) rewriting expressions in the query language itself, and (2) translation into, and rewriting in, some intermediate language, for example an algebraic language. The goal in translating and optimizing OOSQL is to move from tuple-oriented to set-oriented query processing. Our approach is to translate nested OOSQL queries into nested algebraic expressions in the algebraic language ADL, and then to try to rewrite these nested expressions into join expressions.

The Complex Object Algebra ADL:
ADL is a typed algebra for complex objects that allows nesting of expressions. Among the constructors supported are the tuple (< >) and set ({ }) type constructors. Roughly, the algebraic operators of ADL are:
- the standard set (comparison) operators
- extended Cartesian product (in which operand tuples are concatenated)
- division
- map operator
- selection
- projection
- renaming operator
- aggregate functions
- nest and unnest
- regular join, semijoin, and antijoin
Their semantics is omitted because of lack of space.

Translation of OOSQL into ADL:
OOSQL queries are translated into the algebra in a simple, almost one-to-one way. In the translation phase, nested OOSQL queries are translated into nested algebraic expressions. In the subsequent phase of logical optimization, nested expressions are rewritten into set operations. In the translation phase, an SFW query block is mapped to an algebraic expression consisting of a selection followed by a map:
  select e1 from x in e2 where e3   ≡   α[x : e1](σ[x : e3](e2))
Here σ computes the selection with predicate e3, and the map α projects each selected x onto e1.
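The SFW-to-algebra mapping can be sketched with two higher-order functions, one for σ and one for α. This is an illustration under our own simplifications: Python lists stand in for ADL sets (so duplicates are not eliminated), and the names select, map_ and translate_sfw are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def select(pred: Callable[[T], bool], operand: Iterable[T]) -> List[T]:
    """σ[x : pred](operand): keep the elements that satisfy pred."""
    return [x for x in operand if pred(x)]


def map_(fn: Callable[[T], U], operand: Iterable[T]) -> List[U]:
    """α[x : fn](operand): apply fn to every element."""
    return [fn(x) for x in operand]


def translate_sfw(e1: Callable[[T], U], e2: Iterable[T],
                  e3: Callable[[T], bool]) -> List[U]:
    """select e1 from x in e2 where e3  ==>  α[x : e1](σ[x : e3](e2))."""
    return map_(e1, select(e3, e2))


if __name__ == "__main__":
    # Hypothetical PART rows with the attributes of the PART type.
    part = [
        {"pid": "p1", "pname": "bolt", "price": 5, "color": "red"},
        {"pid": "p2", "pname": "nut", "price": 3, "color": "green"},
        {"pid": "p3", "pname": "gear", "price": 40, "color": "red"},
    ]
    # select p.pname from p in PART where p.color = "red"
    print(translate_sfw(lambda p: p["pname"], part, lambda p: p["color"] == "red"))
    # ['bolt', 'gear']
```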

Optimization of Nested Algebra Queries:
The example queries below concern a database in which SUPPLIER and PART have the following ADL types:
  SUPPLIER : { <sid : oid, sname : string, parts : { <pid : oid> }> }
  PART : { <pid : oid, pname : string, price : int, color : string> }
We distinguish three ways of optimizing nested ADL queries:
(1) unnesting attributes by means of the unnest operator;
(2) unnesting nested expressions by transforming them into relational join queries;
(3) using new operators defined especially to enhance performance.

1. Unnesting of Attributes
Disadvantages:
- nesting and unnesting are not inverses of each other for all relations;
- first unnesting and later nesting again is expensive, due to duplication of attribute values and the overhead caused by restructuring.
Query: Select the identifiers of suppliers supplying non-existing parts.
  π_sid(σ[s : ∃z ∈ s.parts • ¬∃p ∈ PART • z = p[pid]](SUPPLIER))
The set-valued attribute parts is not needed in the result, so this query may be rewritten into the antijoin query:
  π_sid(μ_parts(SUPPLIER) ▷_{s,p : s.pid = p.pid} PART)
where μ_parts unnests the attribute parts and ▷ denotes the antijoin. Because z is existentially quantified, the loss of tuples whose set-valued attribute parts is empty causes no problem (existential quantification over the empty set delivers false).
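The equivalence can be checked on a toy instance. The sketch below represents SUPPLIER and PART as Python lists of dicts (invented data, not from the paper), evaluates the nested selection tuple at a time, and compares it with an unnest followed by a hash-based equi-antijoin. Supplier s3 has an empty parts set: unnesting drops it, and the nested form rejects it anyway, as the note above explains.

```python
# Hypothetical sample data shaped like the SUPPLIER and PART types.
SUPPLIER = [
    {"sid": "s1", "sname": "Smith", "parts": [{"pid": "p1"}, {"pid": "p9"}]},
    {"sid": "s2", "sname": "Jones", "parts": [{"pid": "p1"}]},
    {"sid": "s3", "sname": "Blake", "parts": []},
]
PART = [
    {"pid": "p1", "pname": "bolt", "price": 5, "color": "red"},
    {"pid": "p2", "pname": "nut", "price": 3, "color": "green"},
]


def nested_query(suppliers, parts):
    """π_sid(σ[s : ∃z ∈ s.parts • ¬∃p ∈ PART • z = p[pid]](SUPPLIER)), tuple at a time."""
    return {
        s["sid"]
        for s in suppliers
        if any(not any(z == {"pid": p["pid"]} for p in parts) for z in s["parts"])
    }


def unnest(rows, attr):
    """μ_attr: one flat tuple per member of the set-valued attribute attr.
    Rows whose set is empty produce no output."""
    return [
        {**{k: v for k, v in r.items() if k != attr}, **member}
        for r in rows
        for member in r[attr]
    ]


def equi_antijoin(left, right, key):
    """left ▷ right on left[key] = right[key], using a hash set of right keys."""
    right_keys = {r[key] for r in right}
    return [row for row in left if row[key] not in right_keys]


def antijoin_query(suppliers, parts):
    """π_sid(μ_parts(SUPPLIER) ▷_{s,p : s.pid = p.pid} PART)."""
    return {s["sid"] for s in equi_antijoin(unnest(suppliers, "parts"), parts, "pid")}


if __name__ == "__main__":
    print(nested_query(SUPPLIER, PART), antijoin_query(SUPPLIER, PART))
```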

2. Transformation into Join Queries
In some cases two or more consecutive levels of nesting can be replaced by a join, antijoin, or semijoin operator, reducing the number of levels of nesting. In the ideal case all nesting disappears.
Query: Select the suppliers supplying red parts.
  σ[s : ∃z ∈ s.parts • ∃p ∈ PART • z = p[pid] ∧ p.color = "red"](SUPPLIER)
This query can be rewritten into the semijoin query:
  SUPPLIER ⋉_{s,p : p[pid] ∈ s.parts} σ[p : p.color = "red"](PART)
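A small sketch comparing the nested form with the semijoin form, using invented data as Python lists of dicts. In the semijoin version the selection on PART runs once and its result is hashed on pid; each supplier is then emitted at most once and unchanged, which is what distinguishes a semijoin from a join.

```python
# Hypothetical sample data shaped like the SUPPLIER and PART types.
SUPPLIER = [
    {"sid": "s1", "sname": "Smith", "parts": [{"pid": "p1"}, {"pid": "p2"}]},
    {"sid": "s2", "sname": "Jones", "parts": [{"pid": "p2"}]},
    {"sid": "s3", "sname": "Blake", "parts": []},
]
PART = [
    {"pid": "p1", "pname": "bolt", "price": 5, "color": "red"},
    {"pid": "p2", "pname": "nut", "price": 3, "color": "green"},
]


def nested_query(suppliers, parts):
    # σ[s : ∃z ∈ s.parts • ∃p ∈ PART • z = p[pid] ∧ p.color = "red"](SUPPLIER)
    return [
        s for s in suppliers
        if any(
            any(z == {"pid": p["pid"]} and p["color"] == "red" for p in parts)
            for z in s["parts"]
        )
    ]


def semijoin_query(suppliers, parts):
    # SUPPLIER ⋉_{s,p : p[pid] ∈ s.parts} σ[p : p.color = "red"](PART)
    # The selection on PART is applied once; its result is hashed on pid.
    red_pids = {p["pid"] for p in parts if p["color"] == "red"}
    # Each supplier tuple is emitted at most once and is not modified.
    return [s for s in suppliers if any(z["pid"] in red_pids for z in s["parts"])]


if __name__ == "__main__":
    print([s["sid"] for s in semijoin_query(SUPPLIER, PART)])
```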

3. Using Special Operators
The following query cannot be rewritten into a relational join query; new operators are really necessary to obtain an efficient implementation.
Query: Select the supplier names together with the parts supplied.
  α[s : <sname = s.sname, parts_suppl = σ[p : p[pid] ∈ s.parts](PART)>](SUPPLIER)
This query can be rewritten into an efficient set operation, a nestjoin query:
  π_{sname, parts_suppl}(SUPPLIER Δ_{s,p : p[pid] ∈ s.parts ; parts_suppl} PART)
Note that each of the options above can be applied to the top-level expression as well as to its subexpressions.

Rewrite strategy:
1. Try to rewrite into one of the relational join operators (join, antijoin, or semijoin).
2. If that is not possible, try to flatten set-valued attributes; if the nesting phase can be skipped, this strategy may be worth considering.
3. If that is not possible, try to rewrite into one of the newly defined operators, which were introduced to obtain better performance than nested-loop processing.
4. If none of the above works, leave the query as it is, which means that it is executed by means of nested loops.

Rewriting into flat relational algebra: nesting in the WHERE clause in the presence of set-valued attributes
The general format of a two-block OOSQL query with nesting in the WHERE clause is:
  SELECT F(x)
  FROM x IN X
  WHERE P(x, Y')
  WITH Y' = SELECT G(x, y) FROM y IN Y WHERE Q(x, y)
The goal of the transformation is the following flat query:
  SELECT F(x)
  FROM x IN X, y IN Y
  WHERE P'(x, v) ∧ Q(x, y)
  WITH v = G(x, y)
The transformation process turns the predicate P(x, Y'), whose second argument is set-valued, into a predicate P' whose argument values v are the members of Y'. The types of P and P' clearly differ: a set constructor is removed from the second argument of P, resulting in predicate P'.

Rewriting into flat relational algebra: general format of the SFW query
  α[x : F(x)](σ[x : P(x, Y')](X))   with   Y' = α[y : G(x, y)](σ[y : Q(x, y)](Y))
For simplicity, take F and G to be the identity. Then we have:
  σ[x : P(x, Y')](X)   with   Y' = σ[y : Q(x, y)](Y)
This is a nested query involving nested iteration over a base table: the outer selection predicate contains a subquery, which is a selection on base table Y. We want to transform this nested query into a join query, i.e., a query having no subqueries with base table operands.

Set Comparison Operations
We concentrate on two-block nested expressions with set comparison operations between the query blocks. There are two unnesting techniques:
- unnesting by rewriting into quantifier expressions;
- unnesting by grouping, a technique well known from the relational model, which, however, has to be adapted to be of good use in complex object models. We do so by defining a new algebraic operator, the nestjoin operator.

Unnesting by Rewriting into Quantifier Expressions (1)
Rewriting example 1: set membership
  σ[x : x.c ∈ σ[y : q](Y)](X)
  ≡ σ[x : ∃y ∈ σ[y : q](Y) • y = x.c](X)
  ≡ σ[x : ∃y ∈ Y • y = x.c ∧ q](X)
  ≡ X ⋉_{x,y : y = x.c ∧ q} Y
- The operator ∈ is rewritten into an existential quantification.
- The selection is removed from the operand (the range expression) of the existential quantifier, which makes it possible to translate the existential subquery into a semijoin in the last rewrite step.
Rule 1 (unnesting quantifier expressions). Let X and Y be table expressions, and let x not be free in Y. Then:
  1. σ[x : ∃y ∈ Y • p](X) ≡ X ⋉_{x,y : p} Y
  2. σ[x : ¬∃y ∈ Y • p](X) ≡ X ▷_{x,y : p} Y
A nested query with existential quantification is translated into a semijoin; negated existential (i.e., universal) quantification is dealt with by means of the antijoin.
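The set-membership example and both cases of Rule 1 can be sketched as follows. As assumptions of ours, Y is a table of atomic values (so y is compared directly with x.c, as in the rewrite chain) and q refers only to y, which lets it be applied once on the build side of a hash table; the function names are hypothetical.

```python
def nested_membership(X, Y, q):
    """σ[x : x.c ∈ σ[y : q](Y)](X): the inner block is re-evaluated for every x."""
    return [x for x in X if x["c"] in [y for y in Y if q(y)]]


def semijoin_membership(X, Y, q):
    """X ⋉_{x,y : y = x.c ∧ q} Y as a hash semijoin (Rule 1, case 1)."""
    build = {y for y in Y if q(y)}  # q refers only to y, so it is applied while building
    return [x for x in X if x["c"] in build]


def antijoin_membership(X, Y, q):
    """X ▷_{x,y : y = x.c ∧ q} Y, i.e. σ[x : ¬∃y ∈ Y • y = x.c ∧ q](X) (Rule 1, case 2)."""
    build = {y for y in Y if q(y)}
    return [x for x in X if x["c"] not in build]


if __name__ == "__main__":
    X = [{"c": 1}, {"c": 7}, {"c": 3}]  # hypothetical outer table
    Y = [1, 2, 3, 10]                    # hypothetical inner table of values

    def q(y):
        return y < 5                     # hypothetical inner-block predicate

    print(nested_membership(X, Y, q), semijoin_membership(X, Y, q))
    print(antijoin_membership(X, Y, q))
```

The nested version scans Y once per outer tuple; the semijoin and antijoin versions scan it once in total.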

Unnesting by Rewriting into Quantifier Expressions (2)
Rewriting example 2: set inclusion. All set comparison operators can be rewritten into quantifier expressions.
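As an illustration based on standard first-order equivalences (not on rules quoted from the paper), set inclusion x.a ⊆ Y can be written as ∀z ∈ x.a • ∃y ∈ Y • z = y, or equivalently as ¬∃z ∈ x.a • ¬∃y ∈ Y • z = y. The second form consists only of (negated) existential quantifiers, the shape that Rule 1 maps onto semijoins and antijoins. The sketch checks that the three formulations agree on a few inputs; the function names are hypothetical.

```python
def subset_direct(a, ys):
    """x.a ⊆ Y using Python's built-in set comparison."""
    return set(a) <= set(ys)


def subset_forall_exists(a, ys):
    """x.a ⊆ Y  ≡  ∀z ∈ x.a • ∃y ∈ Y • z = y."""
    return all(any(z == y for y in ys) for z in a)


def subset_not_exists(a, ys):
    """x.a ⊆ Y  ≡  ¬∃z ∈ x.a • ¬∃y ∈ Y • z = y (only negated existentials)."""
    return not any(not any(z == y for y in ys) for z in a)


if __name__ == "__main__":
    ys = [1, 2, 3]
    for a in ([], [1], [1, 2], [4], [1, 4]):
        answers = {subset_direct(a, ys), subset_forall_exists(a, ys), subset_not_exists(a, ys)}
        assert len(answers) == 1, a  # all three formulations agree
        print(a, answers.pop())
```

Note that the empty set is included in every Y, because universal quantification over the empty set delivers true.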

Unnesting by Grouping
Another way to deal with set comparison operators is to use grouping, the technique used in transforming nested queries with aggregate functions between query blocks. Consider the following nested query (source: Database Query Processing, Universität Konstanz, 2005):

Unnesting by Grouping
The nested query is transformed into a flat join query consisting of:
(1) a join to evaluate the inner query block predicate;
(2) a nest operation for grouping;
(3) a selection for evaluating P, the predicate between the blocks;
(4) a final projection.
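The four steps can be sketched in Python for the general form σ[x : P(x, Y')](X) with Y' = σ[y : Q(x, y)](Y). As a hypothetical example we take P to be the set comparison x.a ⊆ {y.v | y ∈ Y'}; the data and names are invented. Outer tuple 1 has an empty a and no join partner: the nested form keeps it (∅ ⊆ ∅), but the join in step (1) never produces a group for it, so the grouping rewrite loses it. This is the same kind of bug as the COUNT bug.

```python
def nested_eval(X, Y, Q, P):
    """σ[x : P(x, Y')](X) with Y' = σ[y : Q(x, y)](Y), evaluated per outer tuple."""
    return [x for x in X if P(x, [y for y in Y if Q(x, y)])]


def grouping_eval(X, Y, Q, P, key):
    """Join, nest, select, project. key(x) gives a hashable identity for x."""
    # (1) join evaluating the inner query block predicate Q
    joined = [(x, y) for x in X for y in Y if Q(x, y)]
    # (2) nest: collect the matching y tuples per outer tuple
    groups = {}
    for x, y in joined:
        groups.setdefault(key(x), (x, []))[1].append(y)
    # (3) select with the between-blocks predicate P, (4) project onto x
    return [x for x, ys in groups.values() if P(x, ys)]


if __name__ == "__main__":
    # Hypothetical data: outer tuple 1 has an empty set a and no join partner.
    X = [{"id": 1, "c": 1, "a": set()},
         {"id": 2, "c": 2, "a": {"p1"}},
         {"id": 3, "c": 2, "a": {"p9"}}]
    Y = [{"c": 2, "v": "p1"}, {"c": 2, "v": "p2"}]

    def Q(x, y):
        return x["c"] == y["c"]

    def P(x, ys):
        return x["a"] <= {y["v"] for y in ys}

    print([x["id"] for x in nested_eval(X, Y, Q, P)])
    print([x["id"] for x in grouping_eval(X, Y, Q, P, key=lambda x: x["id"])])
```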

Nesting in the Map Operator
This is another example of the strategy of rewriting nested expressions into relational join expressions, but now concerning nesting in the map operator (i.e., in the SELECT clause). The following equivalence rule can be used to transform a nested map operation into a join query:
Unnesting by grouping is a generally applicable transformation technique, were it not for the occurrence of bugs such as the COUNT bug. In the next section, we show how to avoid these bugs by using the nestjoin operator.

New Algebraic Operators
It is worthwhile to define new logical algebra operators whenever new access algorithms can be found that improve on nested-loop query processing:
- the nestjoin operator;
- materializing set-valued attributes;
- the PNHL algorithm.

The Nestjoin Operator
The nestjoin operator is to be used for unnesting nested queries that cannot be rewritten into flat relational join operations. It can be used to transform two-block select expressions with arbitrary predicates between the blocks. The simplified version of the two-block select query:

The Nestjoin Operator
The nestjoin is simply a modification of the join operator. Instead of producing the concatenation of every pair of matching tuples, it creates for each left operand tuple a set holding the (possibly modified) right operand tuples that match. The nestjoin of two tables X and Y on predicate Q with function G (the function applied to the right-hand tuples satisfying the join predicate) is defined as:
  X Δ_{x,y : Q ; G ; a} Y = { x ++ <a = z> | x ∈ X, z = { G(x, y) | y ∈ Y, Q(x, y) } }
In this expression, x ++ <a = z> denotes the concatenation of the tuple x and the unary tuple <a = z>, in which a is an arbitrary label not occurring on the top level of X. An example of the nestjoin operation is found in Table 1, where flat relations X and Y are equijoined on the second attribute (the join function is the identity function). Note that for dangling tuples, the tuple x ++ <a = ∅> is present in the result.
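A minimal sketch of the nestjoin semantics in the setting described for Table 1: flat relations equijoined on their second attribute with the identity as join function. The rows are invented and are not the contents of Table 1. Tuples are Python tuples, the new attribute a is simply appended as the last position, and the set is a frozenset. The sketch follows the definition with nested loops; an efficient implementation would use hashing or sorting instead.

```python
from typing import Callable, List

Row = tuple  # a flat tuple; the nestjoin label a becomes the last position


def nestjoin(X: List[Row], Y: List[Row],
             Q: Callable[[Row, Row], bool],
             G: Callable[[Row, Row], Row]) -> List[Row]:
    """X Δ_{x,y : Q ; G ; a} Y: extend each x with the set {G(x, y) | y ∈ Y, Q(x, y)}.

    Dangling x tuples (no matching y) are kept, extended with the empty set.
    """
    return [x + (frozenset(G(x, y) for y in Y if Q(x, y)),) for x in X]


if __name__ == "__main__":
    # Hypothetical flat relations, equijoined on their second attribute.
    X = [(1, "a"), (2, "b"), (3, "c")]
    Y = [(10, "a"), (11, "a"), (12, "b")]
    for row in nestjoin(X, Y, lambda x, y: x[1] == y[1], lambda x, y: y):
        print(row)  # the dangling tuple (3, 'c') ends with frozenset()
```

Unlike a regular join, every left operand tuple appears exactly once in the result, which is why the nestjoin does not suffer from the COUNT bug.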


The PNHL Algorithm
PNHL is an algorithm for efficiently processing a nested expression in which a set-valued attribute is joined with a base table. The following query expresses a nested natural join (*) operation:
The algorithm builds a hash table for those segments of operand PART that fit into main memory and then probes operand SUPPLIER against each segment of the hash table, thus building partial results. The partial results are merged in the second phase of the algorithm. Compared to the unnest-join-nest processing method, the algorithm achieves better performance.
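The two phases described above can be sketched as a simplified, single-process Python function. This is our own illustration, not the published algorithm's code: memory_budget is a hypothetical parameter giving how many PART tuples fit in memory at once, pid is assumed unique in PART, and the result maps each sid to the PART tuples matching its parts set.

```python
from collections import defaultdict
from typing import Dict, List


def pnhl(suppliers: List[dict], parts: List[dict],
         memory_budget: int) -> Dict[str, List[dict]]:
    """Sketch of a segmented hash-and-probe nested join (memory_budget >= 1)."""
    # Phase 1: hash PART one memory-sized segment at a time and probe SUPPLIER.
    partial_results = []
    for start in range(0, len(parts), memory_budget):
        segment = parts[start:start + memory_budget]
        table = {p["pid"]: p for p in segment}  # hash table on the join attribute
        seg_result = defaultdict(list)
        for s in suppliers:
            for z in s["parts"]:                # probe with each member of s.parts
                match = table.get(z["pid"])
                if match is not None:
                    seg_result[s["sid"]].append(match)
        partial_results.append(seg_result)
    # Phase 2: merge the partial results into one nested result per supplier.
    merged = {s["sid"]: [] for s in suppliers}
    for seg_result in partial_results:
        for sid, matched in seg_result.items():
            merged[sid].extend(matched)
    return merged


if __name__ == "__main__":
    suppliers = [{"sid": "s1", "parts": [{"pid": "p1"}, {"pid": "p3"}]},
                 {"sid": "s2", "parts": []}]
    parts = [{"pid": "p1", "color": "red"},
             {"pid": "p2", "color": "green"},
             {"pid": "p3", "color": "blue"}]
    print(pnhl(suppliers, parts, memory_budget=2))
```

In this sketch the set-valued attribute is never unnested, and a supplier with an empty parts set keeps an empty list.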

Conclusion:
In OOSQL, nesting may occur in the WHERE, FROM, and SELECT clauses. An additional complication in complex object models is the support for iteration over set-valued attributes. The goal is to transform nested OOSQL queries having correlated subqueries with base table expressions as operands into join queries in which base tables occur only at the top level. We have shown that transforming nested OOSQL queries dealing with set-valued attributes into relational join queries is not always possible. To improve matters, we have defined a new operator, the nestjoin operator.

Thank you for your attention!