Database Languages and their Compilers

Similar documents
Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Parser: SQL parse tree

Probabilistic analysis of algorithms: What s it good for?

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

Query Processing & Optimization

CS 406/534 Compiler Construction Putting It All Together

Compiler Theory. (Semantic Analysis and Run-Time Environments)

COP 3402 Systems Software Syntax Analysis (Parser)

Relational Databases

Compiler Code Generation COMP360

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

LECTURE 3. Compiler Phases

Programming Languages Third Edition

Exponentiated Gradient Algorithms for Large-margin Structured Classification

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Tizen/Artik IoT Lecture Chapter 3. JerryScript Parser & VM

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Chapter 11: Query Optimization

Fuzzy Hamming Distance in a Content-Based Image Retrieval System

Chapter 19 Query Optimization

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

SFU CMPT Lecture: Week 9

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

From Tableaux to Automata for Description Logics

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

R13 SET Discuss how producer-consumer problem and Dining philosopher s problem are solved using concurrency in ADA.

Contents. Chapter 1 SPECIFYING SYNTAX 1

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

An Experimental CLP Platform for Integrity Constraints and Abduction

Efficiency versus Convergence of Boolean Kernels for On-Line Learning Algorithms

From Clarity to Efficiency for Distributed Algorithms

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Compilers and Interpreters

Abstract Syntax Trees & Top-Down Parsing

Compilers and computer architecture From strings to ASTs (2): context free grammars

CSE 344 MAY 7 TH EXAM REVIEW

LOGIC AND DISCRETE MATHEMATICS

Extending xcom. Chapter Overview of xcom

8. Relational Calculus (Part II)

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations

Introduction to Parsing

Query Optimization Overview. COSC 404 Database System Implementation. Query Optimization. Query Processor Components The Parser

Abstract Syntax Trees & Top-Down Parsing

Parsing II Top-down parsing. Comp 412

Abstract Syntax Trees & Top-Down Parsing

Graphs (MTAT , 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405

Query Optimization. Introduction to Databases CompSci 316 Fall 2018

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 3.0.

CS 415 Midterm Exam Fall 2003

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

Principles of Programming Languages [PLP-2015] Detailed Syllabus

Chapter 12: Query Processing

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

The Online Median Problem

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Event List Management In Distributed Simulation

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS 415 Midterm Exam Spring 2002

Syntax-Directed Translation. Lecture 14

Scan Scheduling Specification and Analysis

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Chapter 13: Query Optimization

SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs

INTRODUCTION TO RELATIONAL DATABASE SYSTEMS

Querying Data with Transact SQL

PHANTOM TYPES AND SUBTYPING

Theorem proving. PVS theorem prover. Hoare style verification PVS. More on embeddings. What if. Abhik Roychoudhury CS 6214

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

Join (SQL) - Wikipedia, the free encyclopedia

A Modality for Recursion

CST-402(T): Language Processors

Introduction to Compiler Design

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Exam 2 November 16, 2005

Alternatives for semantic processing

COSC252: Programming Languages: Semantic Specification. Jeremy Bolton, PhD Adjunct Professor

Query Processing: an Overview. Query Processing in a Nutshell. .. CSC 468 DBMS Implementation Alexander Dekhtyar.. QUERY. Parser.

Horizontal Aggregations for Mining Relational Databases

VIVA QUESTIONS WITH ANSWERS

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

5. Semantic Analysis!

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

CS 415 Midterm Exam Spring SOLUTION

CPSC 310: Database Systems / CSPC 603: Database Systems and Applications Final Exam Fall 2005

Compiler Design (40-414)

Control-Flow Graph and. Local Optimizations

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

COSC 404 Database System Implementation. Query Optimization. Dr. Ramon Lawrence University of British Columbia Okanagan

Parsing. Zhenjiang Hu. May 31, June 7, June 14, All Right Reserved. National Institute of Informatics

CMSC 330: Organization of Programming Languages

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CMSC 330: Organization of Programming Languages. Context Free Grammars

Introduction. 1. Introduction. Overview Query Processing Overview Query Optimization Overview Query Execution 3 / 591

Chapter 2 A Quick Tour

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

(Refer Slide Time: 4:00)

Polls on Piazza. Open for 2 days Outline today: Next time: "witnesses" (traditionally students find this topic the most difficult)

Transcription:

Database Languages and their Compilers Prof. Dr. Torsten Grust Database Systems Research Group U Tübingen Winter 2010 2010 T. Grust Database Languages and their Compilers

4 Query Normalization Finally, we have reached a point where our understanding (read: syntactic and semantic analysis) of the input query is complete enough to plan its execution on the underlying DBMS. What we need now is a mapping from MCC to the primitives of the query engine. These primitives (operators) may usually be connected in a tree-like fashion to form a query execution plan: Õ Å Ã 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 115

We will, of course, aim to assemble execution plans which 1 contain the minimal number of necessary operators ; this will reduce the need to allocate and manage intermediate query results, thus making the overall I/O costs cheaper; 2 use those operators whose internal workings are simple and thus are efficiently executable; this will help to reduce CPU as well as I/O costs (consider the need for temporary storage during operator execution, e.g., sorting). 3 adhere to specific tree shapes: left-deep right-deep bushy (but we will postpone a discussion of this right now.) 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 116

For a first approach to find such a mapping from MCC to query plans, we can fall back on the ÓÐ-based MCC semantics (see Chapter 2): We need a single type of -operator implemented in our query engine, namely ÓÐ; equations (1) through (3) of Section 2.3 provide the mapping from MCC to execution plans; 4.1 Comprehension Normalization SQL ÃMCC ÃÓÐ provides a complete query processor framework for SQL. Unfortunately, ÓÐ s internal workings are not simple in the sense of our previous discussion. Each instance of ÓÐ we can get rid of in a query plan is worth to search for. 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 117

½ Ê ÙÑ ½ Ê ÙÑ MCC provides a number of hooks to simplify its expressions while their meaning is preserved: From a bird s eye point of view, the overall structure of a comprehension computing an SQL SUM-aggregate query is (see cases (g) and (h) of mapping Ì): Claim: Irrespective of, this computes the same value as The second comprehension saves a ÓÐ. A comparison of the ÓÐ corresponding programs for both comprehensions justifies this equivalence-preserving simplification step. 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 118

Õ Ú ¼ Õ ¼ Å ¼ Õ ¼¼ Å Õ Õ ¼ Ú ¼ Õ ¼¼ Å remove ½ Ê ÙÑ ÙÑ This simplification step is an instantiation of a far more general rewriting that MCC allows us to do: Comprehension Unnesting (Å Å ¼ in general): The comprehension on the right hand side uses one generator ( ) less. The unnesting rule implements the simplification procedure of the previous slide: ßÞÐ Õ ½ Ê ßÞÐ ¼¼ Õ unnest ½ Ê ÙÑ 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 119

Nested SQL queries of the form shown here may lead to inefficient query execution plans: Let Ô Ô ¼ denote two SQL predicates: SELECT DISTINCT Ü Õ FROM Ü AS WHERE EXISTS Þ IN ¼ Ý SELECT FROM Ý AS Ô Ü Ýµ WHERE ½ Ô ¼ Ü Þµ ßÞ Ð subquery Note that the nested subquery is correlated: Ô Ü Ýµ depends on tuple variable Ü bound in the outer query; consequently, we have to re-evaluate the subquery for every binding of Ü; it looks like we have to read times. 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 120

Ì Õ ¼ Õ ¼ ÓÑ Õ ¼¼ Ø Õ Õ ¼ ¼ Õ ¼¼ Ø Ü Üµ Ü Ô ¼ Ü Þµ Þ Ý Ýµ Ý Ô Ü Ýµ ÓÑ Ø We can improve the situation with the help of a new MCC rewriting rule: Unnesting of Existential Quantification: N.B.: Unnesting of existential quantifiers is valid only inside a Ø comprehension. This is applicable to Õ after we have switched to MCC with the help of Ì: Ì Õµ unnest Ü Üµ Ü Ô ¼ Ü Þµ Ý Ô Ü Ýµ Þ Ý Ýµ ÓÑ Ø remove Ü Üµ Ü Ô ¼ Ü Ý Ýµµ Ý Ô Ü Ýµ ÓÑ Ø ÓÑ unnest Ü Üµ Ü Ý Ô ¼ Ü Ý Ýµµ Ô Ü Ýµ Ø 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 121

If we Ì apply to the rewriting results, we now see that Õ is nothing else but a join ½ between and : SELECT Ü DISTINCT FROM Ü AS AS Ý WHERE Ô ¼ Ü Ý Ýµµ AND Ô Ü Ýµ DBMS query engines can cope with queries of this type much more efficiently (various join algorithms are applicable now). Correlation matters. If we replace Ô Ü Ýµ by Ô Ýµ in the original query Õ, can you take advantage of this modification? How about replacing Ô Ü Ýµ by Ô Üµ? 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 122

In company with two additional laws, the repeated and exhaustive application of the unnesting rules rewrites a comprehension into its normal form:! Comprehension Normalization: Õ Ú ¼ Õ ¼ Å ¼ Õ ¼¼ Å Õ Õ ¼ Ú ¼ Õ ¼¼ (4) Å Õ ¼ Õ ¼ ÓÑ Õ ¼¼ Ø Õ Õ ¼ ¼ Õ ¼¼ (5) Ø Õ Ú ÐØ Å ¼ ¼ µ Õ ¼ Å Õ Ú ¼ Õ ¼ (6) Å Õ Ú ¼ Þ Å Õ ¼ Å Þ (7) Å 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 123

4.2 Further Rewriting Laws The expressions of MCC are subject to a large number of additional equivalence-preserving rewritings and simplifications. If some of these rules appear to be too obvious, recall that we are trying to implement a completely unguided system so that we have to declare even the most simple facts; rewriting opportunities might occur due to the application of (a number of) other rewriting laws. 4.2.1 Constant Folding ÓÒ Ø ½ ÓÒ Ø ÓÒ Ø Ã ÓÒ Ø ÓÒ Ø Ã Ã ½ ¾ ½ ¾ ¾ ½ ¾ 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 124

Note that the rewriting laws are directed ( Ãinstead of ) to avoid infinite rewriting loops. The first two rules aim to construct a normal form of arithmetic expressions in which numeric constants appear left-most, e.g.: ½¼ ¼µ ¼µ ¾µ ܵ ݵ Þ three applications of the last law finally produce ¾ ܵ ݵ Þ. Some more examples: ÓÒ Ø Ð Ã ÓÒ Ø Ã ÓÒ Ø Ã ØÖÙ Ð ØÖÙ Ð ½ ½ ÐÒ Ò These laws, in turn, may then make the following applicable: Õ ØÖÙ Õ ¼ Å Ã Õ Õ ¼ Å Õ Ð Õ ¼ Å ÃÞ Å 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 125

4.2.2 Reordering the Qualifier List For a Õ comprehension, the order of qualifiers in Õ obviously makes a Å difference: moving a predicate towards the beginning of Õ allows to evaluate filters early; moving generators inside Õ may lead to interesting join orders (later). 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 126

Ô Õ Å Ô Õ Å ¼ Õ Ô Å Õ Ô Å 1 Exchanging predicates: To see that two adjacent predicates Ô Õ in a qualifier list may be transposed, note that See Section 2.3 (rule (3)) and use the equivalence if Ô then if Õ then else else ¼ Ô Õ if then else. ¼ Finally: Ô Õ Å Ô Õ Å commutative 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 127

Ô Ú ¼ Å Ú ¼ Å 2 Moving generators: Generators introduce variables, so we have to be careful not to move expressions out of scope and to avoid name capture. (a) Transposing a generator and a (Ô predicate may not reference Ú): Ú ¼ Ô Å (b) Transposing two generators ( ¼ may not reference Ú ¼, ¼¼ may not reference Ú, Å ¾ Ø ): Ú Ú ¼ Ú ¼ ¼¼ Å Ú ¼ ¼¼ Ú Ú ¼ ¼ N.B.: These laws are specified for pairs of qualifiers but they can be easily lifted onto sequences of qualifiers of length ¾. 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 128

Contents 0 Introduction to Query Compilation 1 0.1 Welcome!.................................. 1 0.2 Administrativa............................... 3 0.3 Some Remarks on these Slides...................... 4 0.4 Relational Databases and SQL...................... 6 0.5 A Guided Tour through an SQL Query Processor............ 14 0.5.1 Character Streams and Tokens.................. 15 0.5.2 Identify Valid SQL Syntax..................... 16 0.5.3 Resolve the Meaning of Variables and Identifiers........ 17 0.5.4 Check Types and Schemas..................... 18 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 129

0.5.5 How Could Query Evaluation Look Like?............ 19 0.5.6 One Query, Many Programs.................... 20 0.5.7 Query Operators.......................... 21 0.5.8 Query Operator Trees and Rewriting............... 22 0.5.9 The Cheaper, the Better: Query Cost Models.......... 23 0.5.10 Executing Operator Trees on a Machine............. 24 0.6 Recommended Reading.......................... 27 1 Query Parsing 28 1.1 The Scanner (Lexer)............................ 28 1.1.1 Regular Expressions........................ 33 1.1.2 Regular Expressions for SQL................... 35 1.1.3 lex A Scanner Generator.................... 39 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 130

1.2 Syntax Analysis (Parsing)......................... 42 1.2.1 Context-Free Grammars...................... 44 1.2.2 Derivations and Parse Trees.................... 46 1.2.3 Ambiguous Grammars....................... 50 1.2.4 Predictive Parsing and Recursive Descent............ 54 1.2.5 yacc A Parser Generator.................... 63 1.3 Recommended Reading.......................... 67 2 A Query Calculus for SQL 68 2.1 SQL s Nested Loops Semantics...................... 69 2.2 ÓÐ..................................... 71 2.3 The Monoid Comprehension Calculus.................. 75 2.4 Mapping SQL to MCC........................... 81 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 131

2.4.1 Applying Ì to Translate SQL Queries.............. 91 2.5 Recommended Reading.......................... 95 3 Variable Scopes and Type Inference for SQL 96 3.1 Variables in SQL.............................. 97 3.1.1 Variable Environments....................... 99 3.2 The Type of an SQL Query........................ 102 3.2.1 Type Inference........................... 103 3.3 Semantic Attributes............................ 105 4 Query Normalization 115 4.1 Comprehension Normalization....................... 117 4.2 Further Rewriting Laws.......................... 124 4.2.1 Constant Folding.......................... 124 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 132

4.2.2 Reordering the Qualifier List................... 126 2010 T. Grust Database Languages and their Compilers: 4. Query Normalization 133