CS122 Lecture 4 Winter Term,

Similar documents
CS122 Lecture 3 Winter Term,

CS122 Lecture 3 Winter Term,

CS122 Lecture 5 Winter Term,

CS122 Lecture 10 Winter Term,

CS122 Lecture 8 Winter Term,

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12

Advanced Database Systems

Database System Concepts

Query Processing & Optimization

Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

DBMS Query evaluation

Database System Concepts

Query Processing: an Overview. Query Processing in a Nutshell. .. CSC 468 DBMS Implementation Alexander Dekhtyar.. QUERY. Parser.

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Chapter 13: Query Processing

Midterm Review. Winter Lecture 13

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Ch 5 : Query Processing & Optimization

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Chapter 14 Query Optimization

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Chapter 14: Query Optimization

Chapter 12: Query Processing

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

CS121 MIDTERM REVIEW. CS121: Relational Databases Fall 2017 Lecture 13

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Introduction to Database Systems CSE 444

Overview of Implementing Relational Operators and Query Evaluation

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Relational Algebra. Procedural language Six basic operators

CMSC424: Database Design. Instructor: Amol Deshpande

Chapter 3. Algorithms for Query Processing and Optimization

Principles of Data Management. Lecture #9 (Query Processing Overview)

Relational Query Optimization

Parser: SQL parse tree

Relational Databases

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Principles of Data Management. Lecture #12 (Query Optimization I)

CSE 544 Principles of Database Management Systems

Fundamentals of Database Systems

Relational Model, Relational Algebra, and SQL

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Relational Algebra and SQL

Ian Kenny. November 28, 2017

Lecture 17: Query execution. Wednesday, May 12, 2010

Query Processing and Query Optimization. Prof Monika Shah

Querying Data with Transact SQL

Relational Query Optimization. Highlights of System R Optimizer

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Database Systems. Project 2

CSIT5300: Advanced Database Systems

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis

Query processing and optimization

Introduction Alternative ways of evaluating a given query using

Index. Bitmap Heap Scan, 156 Bitmap Index Scan, 156. Rahul Batra 2018 R. Batra, SQL Primer,

T-SQL Training: T-SQL for SQL Server for Developers

5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator. 5.3 Parser and Translator

Shuigeng Zhou. May 18, 2016 School of Computer Science Fudan University

RELATIONAL DATA MODEL: Relational Algebra

2. Make an input file for Query Execution Steps for each Q1 and RQ respectively-- one step per line for simplicity.

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

EECS 647: Introduction to Database Systems

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Optimization of Nested Queries in a Complex Object Model

CMP-3440 Database Systems

Database Systems: Design, Implementation, and Management Tenth Edition. Chapter 7 Introduction to Structured Query Language (SQL)

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

Informatik Vertiefung Database Systems

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Oracle Database: Introduction to SQL Ed 2

SQL Data Query Language

DB2 SQL Class Outline

Implementation of Relational Operations

Missing Information. We ve assumed every tuple has a value for every attribute. But sometimes information is missing. Two common scenarios:

CS330. Query Processing

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

SQL QUERIES. CS121: Relational Databases Fall 2017 Lecture 5

Relational Database Systems 2 5. Query Processing

Evaluation of relational operations

SQL: Data Manipulation Language. csc343, Introduction to Databases Diane Horton Winter 2017

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases. Introduction

SQL: csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Sina Meraji. Winter 2018

Relational Model: History

Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization

Transcription:

CS122 Lecture 4 Winter Term, 2014-2015

2 SQL Query Transla.on Last time, introduced query evaluation pipeline SQL query SQL parser abstract syntax tree SQL translator relational algebra plan query plan optimizer annotated execution plan evaluation engine query result To evaluate SQL queries, must solve several problems: 1. Implement relational algebra operations in some way 2. Translate the SQL abstract syntax tree (AST) into a corresponding relational algebra plan 3. Figure out how to evaluate plan and generate results

3 Plan Crea.on and Op.miza.on Some databases use slightly different representations between initial query plan and optimized plan e.g. initial plan uses abstract relational algebra expressions without any implementation details at all Query optimizer adds in these details as annotations Annotated plan nodes are called evaluation primitives They can be directly used to evaluate the query plan SQL translator relational algebra plan query plan optimizer annotated execution plan table statistics

4 Plan Crea.on and Op.miza.on Other databases use the same representation for both All generated plans contain implementation details Initial query plans may be very unoptimized and use the slowest, most general implementations Optimizations can replace slow implementations with faster ones, and/or apply other transformations (NanoDB uses this approach) SQL translator relational algebra plan query plan optimizer annotated execution plan table statistics

5 Evalua.on Primi.ves Implementations of relational algebra operations are called evaluation primitives Don t always correspond directly to relational algebra Example: SELECT * FROM t WHERE a = 15 σ a=15 (t) If t is a heap ]ile: Could create two components, a select node, and another ]ile- scan node that produces all tuples in t σ a=15 t

6 Evalua.on Primi.ves (2) Example: SELECT * FROM t WHERE a = 15 σ a=15 (t) t What if t is ordered or hashed on attribute a? What if t has an (ordered or hashed) index on a? Can t really take advantage of ]ile organization or other access paths if select- predicate is applied separately Can also create a ]ile- scan node with a predicate Evaluation primitives are often more powerful than their corresponding relational algebra operations Allows us to optimize the implementations, then use the optimizations when constructing our plans σ a=15 t a=15

7 Evalua.on Primi.ves (3) Example: SELECT * FROM t AS t1, t AS t2 WHERE t1.a < t2.a Table t is accessed twice, and is σ t1.a < t2.a renamed in query plan Insert extra rename nodes into plan? Sole operation is to rename table in node s output schema (This is NanoDB s approach.) ρ t1 t ρ t2 t Or, give plan nodes ability to handle simple renaming ops? When plan nodes produce their schemas, can easily apply renaming at that point

8 Evalua.on Primi.ves (4) Join operations usually implemented with theta- join More advanced/]lexible than simple translation using Cartesian product, or simple natural- join operator Implementation can also be con]igured to produce inner join, or left/right/full outer join, where supported SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t2.b > 5; Can evaluate in multiple ways: σ t1.a = t2.a t2.b > 5 σ t2.b > 5 θ t1.a = t2.a θ true θ t1.a = t2.a t1 t2 t1 t2 t1 t2 t2.b > 5

9 Evalua.on Primi.ves (5) SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t2.b > 5; σ t1.a = t2.a t2.b > 5 σ t2.b > 5 θ t1.a = t2.a θ true θ t1.a = t2.a t1 t2 t1 t2 t1 Ideally, can implement theta- join to take advantage of join condition when possible Perform equijoins more quickly Take advantage of ordered data, or indexes on inputs t2 t2.b > 5

10 Evalua.on Primi.ves (6) Many join implementations can do several kinds of join Implement inner join, left outer join, full outer join Implement semijoin and antijoin operations as well (will discuss more in a future lecture) Con]igure plan node to do the required operation in plan By combining multiple operations in plan nodes: Can implement wide range of queries without needing large, complicated plans, or many kinds of plan nodes Can take advantage of certain situations to implement the operation in a much faster way

11 Plan Evalua.on Previous example, slightly altered: SELECT c FROM t1, t2 WHERE t1.a = t2.a AND t2.b > 5 One evaluation approach: Each node is evaluated completely, and its results are saved in a temporary table (postorder tree traversal) Evaluate t1! t1 (no- op) Evaluate σ b>5 (t2)! temp1 Evaluate t1.a=t2.a (t1, temp1)! temp2 Evaluate Π t2.c (temp2)! result t1 Π t2.c θ t1.a = t2.a t2 t2.b > 5

12 Plan Evalua.on (2) Called materialized evaluation Each node s results are materialized into a temporary table (possibly onto disk) Issues with this approach? For large tables, causes many additional disk accesses on top of ones already required for plan- node evaluation! (Small temporary results can be held in memory.) Another evaluation approach: pipelined evaluation Evaluate multiple plan nodes simultaneously Results are passed tuple- by- tuple to the next plan node t1 Π t2.c θ t1.a = t2.a t2 t2.b > 5

13 Plan Evalua.on (3) Π t2.c Several ways to implement pipelined evaluation Demand-driven pipeline: Rows are requested (pulled) from top of plan When a plan- node must produce a row, it t1 requests rows from its child nodes until it can produce one Example: Top- level output loop requests a row from Π t2.c node Π t2.c node requests the next row from t1.a=t2.a node t1.a=t2.a node requests rows from its children until it can produce a joined row σ t2.b>5 node scans through t2 until it ]inds a row with b > 5 θ t1.a = t2.a t2 t2.b > 5

14 Plan Evalua.on (4) Π t2.c Producer-driven pipeline: Each plan- node independently generates θ rows and pushes them up the plan t1.a = t2.a Plan nodes communicate via queues t1 t2 Primarily used in parallel databases Planner hands subplans (or individual plan nodes) to different processors to compute Processors independently evaluate plan components and push tuples to the next stage in the plan Sequential databases generally use demand- driven pipelines for query evaluation t2.b > 5

15 Blocking Opera.ons Not all operations can be pipelined An obvious one: sorting SELECT * FROM t WHERE a < 25 ORDER BY b; Sort plan- node must completely consume its input before it can produce any rows Called blocking operations Some databases take blocking operations into account e.g. PostgreSQL s planner computes two estimates for each plan node: the cost to produce all rows the cost to produce the ]irst row SORT b For a top- N query, makes sense to minimize time to ]irst row t a < 25

16 Blocking Opera.ons (2) Some operations can be implemented in blocking or in pipelined ways Grouping/aggregation operation SELECT username, SUM(score) AS total_score FROM game_scores GROUP BY username; username G sum(score) as total_score (game_scores) If incoming tuples are already sorted on username: Can apply aggregate function to runs of tuples with same username value, and produce output rows along the way If incoming tuples are not sorted on username: Must either use a hash- table, or must sort internally Either way, the operation will be blocking

17 SQL Query Transla.on (2) For now, ignore the question of how to implement speci]ic relational algebra operations (Most are straightforward anyway) SQL doesn t map directly to the relational algebra Nested subqueries!!!! Correlated evaluation!!!! Grouping and aggregation is also complicated Basic SQL syntax maps easily to relational algebra Explored this in CS121

18 Mapping Basic SQL Queries SELECT * FROM t1, t2, t1 t2 SELECT * FROM t1, t2, WHERE P σ P (t1 t2 ) SELECT e1 AS a1, e2 AS a2, FROM t1, t2, e1, e2, are expressions using columns in t1, t2, a1, a2, are aliases (alternate names) for e1, e2, Π e1 as a1,e2 as a2, (t1 t2 ) SELECT e1 AS a1, e2 AS a2, FROM t1, t2, WHERE P Π e1 as a1,e2 as a2, (σ P (t1 t2 ))

19 Mapping Basic SQL Queries (2) SELECT e1 AS a1, e2 AS a2, FROM t1, t2, WHERE P Π e1,e2, (σ P (t1 t2 )) This mapping is somewhat confusing, because many databases accept queries that don t work with this Example: SELECT a + c AS v FROM t WHERE v < 25; Following the above mapping: Π a+c as v (σ v<25 (t)) Doesn t make sense v isn t de]ined in select predicate! The SQL standard is very clear (and simple!): P is only allowed to refer to columns in the FROM clause

20 Mapping Basic SQL Queries (3) Can easily support non- standard syntax by recording select- clause aliases in the AST representation Example: SELECT a + c AS v FROM t WHERE v < 25; Traverse SELECT clause; record alias: v = a + c In the WHERE predicate: anytime v is used, replace it with expression a + c Also do this with ON clauses in joins, HAVING clauses, etc. Allows us to follow previous mapping: Π a+c as v (σ a+c<25 (t)) Other techniques as well, but same idea

21 SQL Grouping/Aggrega.on Grouping and aggregation are signi]icantly more dif]icult SELECT g1, g2,, e1, e2, FROM t1, t2, WHERE Pw GROUP BY g1, g2, HAVING Ph g1, g2, are expressions whose values are grouped on e1, e2, are expressions involving aggregate functions e.g. MIN(), MAX(), COUNT(), SUM(), AVG() Approximately maps to: σ Ph ( g1,g2, G e1,e2, (σ Pw (t1 t2 ))) What makes this challenging: g1, g2, are not required to be simple column refs e1, e2, are not required to be single aggregate fns Ph can also contain aggregate function calls not in e i

22 SQL Grouping/Aggrega.on (2) This is an acceptable grouping/aggregate query: SELECT a - b AS g, 3 * MIN(c) + MAX(d * e) FROM t GROUP BY a - b HAVING SUM(f) < 20 Clearly can t use our mapping from last slide: σ Ph ( g1,g2, G e1,e2, (σ Pw (t1 t2 ))) e.g. Ph is SUM(f) < 20, but we don t compute SUM(f) in G step Problem: SQL mixes grouping/aggregation, project and select parts of the query together Need to rewrite query to separate these different parts Makes translation into relational algebra straightforward

23 SQL Grouping/Aggrega.on (3) Our initial query: SELECT a - b AS g, 3 * MIN(c) + MAX(d * e) FROM t GROUP BY a - b HAVING SUM(f) < 20 Step 1: Identify and extract all aggregate functions Replace with auto- generated column references (Use names that users can t enter, e.g. starting with # ) Rewrite the query: SELECT a - b AS g, 3 * "#A1" + "#A2" FROM t GROUP BY a - b HAVING "#A3" < 20 #A1 = MIN(c) #A2 = MAX(d * e) #A3 = SUM(f) Now we know what aggregates we need to compute

24 SQL Grouping/Aggrega.on (4) Rewritten query: SELECT a - b AS g, 3 * "#A1" + #A2" FROM t GROUP BY a - b HAVING "#A3" < 20 #A1 = MIN(c) #A2 = MAX(d * e) #A3 = SUM(f) Now we can translate grouping/aggregation and HAVING clause into relational algebra: σ #A3 < 20 ( a - b G MIN(c) as #A1, MAX(d * e) as #A2, SUM(f) as #A3 (t)) Finally, wrap this with a suitable project, based on SELECT clause contents Π a - b as g, 3 * #A1 + #A2 as 3 * MIN(c) + MAX(d * e) ( ) Note: second expression s name is implementation- speci]ic Can assign a placeholder name, e.g. unnamed1, Or, can generate a name based on expression being computed

25 SQL Grouping/Aggrega.on (5) Unfortunately, we still have a problem Our translation: Π a - b as g, (σ #A3 < 20 ( a - b G (t))) The project operation can t compute expression a - b a - b is already computed in grouping/aggregation phase Before attempting to project, we really also need to substitute in placeholders for grouping expressions SELECT a - b AS g, 3 * "#A1" + #A2" FROM t GROUP BY a - b HAVING "#A3" < 20 #A1 = MIN(c) #A2 = MAX(d * e) #A3 = SUM(f) #G1 = a - b

26 SQL Grouping/Aggrega.on (6) Finally, replace instances of grouping expressions in the SELECT clause with the corresponding names Translated: SELECT "#G1" AS g, 3 * "#A1" + #A2" FROM t GROUP BY a - b [AS "#G1"] HAVING "#A3" < 20 #A1 = MIN(c) #A2 = MAX(d * e) #A3 = SUM(f) #G1 = a - b Now we can carry on with our project, as before Π #G1 as g, (σ #A3<20 ( a- b as #G1 G (t))) Aside: this also allows us to handle crazy SQL like SELECT 3 * (a - b) AS g, GROUP BY a - b

27 SQL Grouping/Aggrega.on (7) Finally, this is an ANSI SQL query: SELECT a - b AS g, 3 * MIN(c) + MAX(d * e) FROM t GROUP BY a - b HAVING SUM(f) < 20 GROUP BY and HAVING clauses cannot use SELECT aliases Some databases allow the nonstandard GROUP BY g instead of requiring the ANSI- standard GROUP BY a - b Similarly, HAVING can refer to renamed aggregate expressions Can use our alias techniques from earlier e.g. traverse SELECT, record alias: g = a - b If query says GROUP BY g, substitute in de]inition of g (Apply similar techniques to HAVING clause)

28 Join Expressions Original SQL form: SELECT FROM t1, t2, WHERE P List of relations in FROM clause Any join conditions speci]ied in WHERE clause Can t specify outer joins SQL- 92 introduced several new forms: SELECT FROM t1 JOIN t2 ON t1.a = t2.a SELECT FROM t1 JOIN t2 USING (a1, a2, ) SELECT FROM t1 NATURAL JOIN t2 Can specify INNER, [LEFT RIGHT FULL] OUTER JOIN Also CROSS JOIN, but cannot specify ON, USING, or NATURAL

29 Join Expressions (2) SQL FROM clauses can be much more complex: SELECT * FROM t1, t2 LEFT JOIN t3 ON (t2.a = t3.a) WHERE t1.b > t2.b; FROM clause is comma- separated list of join expressions JOIN expressions are binary operations Operate on two relations; left- associative Similarly, interpret FROM join_expr, join_expr as a binary operation A Cartesian product between two join expressions Expressions themselves may involve JOIN operations (the, operator is lower precedence than JOIN keyword)

30 Join Expressions (3) FROM clause is parsed into a binary tree of join exprs Can use parentheses to override precedence, where necessary This binary tree is straightforward to translate Translate left subtree into relational algebra plan Translate right subtree into relational algebra plan Create a new plan from these subtrees based on the kind of join being performed Note: This is a naïve translation of the join expression, and probably horribly inef]icient Will discuss solutions for this in the future