Announcements. From SQL to RA. Query Evaluation Steps. An Equivalent Expression

Similar documents
Introduction to Data Management CSE 344. Lectures 9: Relational Algebra (part 2) and Query Evaluation

Introduction to Database Systems CSE 444

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Announcements. Agenda. Database/Relation/Tuple. Discussion. Schema. CSE 444: Database Internals. Room change: Lab 1 part 1 is due on Monday

Announcements. Agenda. Database/Relation/Tuple. Schema. Discussion. CSE 444: Database Internals

CSE 344 APRIL 20 TH RDBMS INTERNALS

Lecture 17: Query execution. Wednesday, May 12, 2010

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

CSE 344 JANUARY 26 TH DATALOG

Introduction to Database Systems CSE 414. Lecture 16: Query Evaluation

CSEP 544: Lecture 04. Query Execution. CSEP544 - Fall

CSE 344 FEBRUARY 14 TH INDEXING

CSE 444: Database Internals. Lecture 3 DBMS Architecture

CSE 444: Database Internals. Lecture 3 DBMS Architecture

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Introduction to Data Management CSE 344. Lecture 12: Cost Estimation Relational Calculus

Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

Database Systems CSE 414

CSE 444: Database Internals. Lectures 5-6 Indexing

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 7 - Query optimization

Class Overview. Two Classes of Database Applications. NoSQL Motivation. RDBMS Review: Client-Server. RDBMS Review: Serverless

Announcements. Two Classes of Database Applications. Class Overview. NoSQL Motivation. RDBMS Review: Serverless

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

5/1/17. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

CSE 344 JULY 9 TH NOSQL

Introduction to Database Systems CSE 344

Database Management Systems CSEP 544. Lecture 3: SQL Relational Algebra, and Datalog

Lecture 19: Query Optimization (1)

CSE 544 Principles of Database Management Systems

CSE 344 APRIL 27 TH COST ESTIMATION

Review: Query Evaluation Steps. Example Query: Logical Plan 1. What We Already Know. Example Query: Logical Plan 2.

What we already know. Late Days. CSE 444: Database Internals. Lecture 3 DBMS Architecture

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

CSE 344 FEBRUARY 21 ST COST ESTIMATION

Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems

Database Systems CSE 414

Review. Administrivia (Preview for Friday) Lecture 21: Query Optimization (1) Where We Are. Relational Algebra. Relational Algebra.

Announcements. Subqueries. Lecture Goals. 1. Subqueries in SELECT. Database Systems CSE 414. HW1 is due today 11pm. WQ1 is due tomorrow 11pm

Lecture 21: Query Optimization (1)

Database Management Systems CSEP 544. Lecture 6: Query Execution and Optimization Parallel Data processing

Introduction to Database Systems CSE 414

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

CSE 344 JANUARY 19 TH SUBQUERIES 2 AND RELATIONAL ALGEBRA

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

CSE344 Midterm Exam Fall 2016

Database Applications (15-415)

CSE 544 Principles of Database Management Systems

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Database Systems CSE 544. Lecture #2 SQL The Complete Story

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

QUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION

CSIT5300: Advanced Database Systems

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Lecture 20: Query Optimization (2)

CS 564 Final Exam Fall 2015 Answers

ECS 165B: Database System Implementa6on Lecture 7

Introduction to Database Systems CSE 414. Lecture 8: SQL Wrap-up

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Relational Query Optimization

Before we talk about Big Data. Lets talk about not-so-big data

EECS 647: Introduction to Database Systems

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

CSE 544: Principles of Database Systems

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

CSE 344 Midterm Nov 1st, 2017, 1:30-2:20

Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis

Query Processing and Query Optimization. Prof Monika Shah

Magda Balazinska - CSE 444, Spring

DBMS Query evaluation

Relational Query Optimization. Highlights of System R Optimizer

CSE 344 MAY 7 TH EXAM REVIEW

Today's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example

Introduction to Data Management CSE 344. Lectures 8: Relational Algebra

Query Processing & Optimization

Announcements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

CSE 344 APRIL 11 TH DATALOG

CompSci 516 Data Intensive Computing Systems

Parser: SQL parse tree

Database Tuning and Physical Design: Basics of Query Execution

COMP9311 Week 10 Lecture. DBMS Architecture. DBMS Architecture and Implementation. Database Application Performance

Advanced Database Systems

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Query Processing & Optimization. CS 377: Database Systems

Chapter 12: Query Processing

Database System Concepts

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12

CSE 344 Midterm Exam

Overview of Implementing Relational Operators and Query Evaluation

Principles of Data Management. Lecture #12 (Query Optimization I)

Introduction to Data Management CSE 344

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Lecture 03: SQL. Friday, April 2 nd, Dan Suciu Spring

Database Applications (15-415)

CSE 544 Principles of Database Management Systems

CSE Midterm - Spring 2017 Solutions

Transcription:

Announcements Introduction to Data Management CSE 344 Webquiz 3 is due tomorrow Lectures 9: Relational Algebra (part 2) and Query Evaluation 1 2 Query Evaluation Steps Translate query string into internal representation Logical plan à physical plan SQL query Parse & Check Query Decide how best to answer query: query optimization Query Execution Return Results Check syntax, access control, table names, etc. Query Evaluation 3 Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) From SQL to RA SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = y.cid and x.price > 100 and z.city = Seattle 4 Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) From SQL to RA SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z δ WHERE x.pid = y.pid and y.cid = y.cid and x.price > 100 and z.city = Seattle Π x.name,z.name An Equivalent Expression Query optimization = finding cheaper, equivalent expressions SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z δ WHERE x.pid = y.pid and y.cid = y.cid and x.price > 100 and z.city = Seattle Π x.name,z.name σ price>100 and city= Seattle cid=cid Product pid=pid Purchase cid=cid Customer 5 pid=pid σ price>100 Product Purchase σ city= Seattle Customer 6 1

Extended RA: Operators on Bags Duplicate elimination δ Grouping γ Sorting τ Logical Query Plan SELECT city, count(*) FROM sales GROUP BY city HAVING sum(price) > 100 Π city, c σ p > 100 T3(city, c) T2(city,p,c) T1(city,p,c) γ city, sum(price) p, count(*) c T1, T2, T3 = temporary tables sales(product, city, price) 7 8 Typical Plan for Block (1/2)... Typical Plan For Block (2/2) having condition π fields γ fields, sum/count/min/max(fields) σ selection condition join condition SELECT-PROJECT-JOIN Query π fields σ selection condition join condition join condition R S 9 10 (sno,sname,scity,sstate) How about Subqueries? (sno,sname,scity,sstate) How about Subqueries? and not exists (SELECT * WHERE P.sno = Q.sno and P.price > 100) and not exists (SELECT * WHERE P.sno = Q.sno and P.price > 100) Correlation! 11 12 2

How about Subqueries? and not exists (SELECT * WHERE P.sno = Q.sno and P.price > 100) (sno,sname,scity,sstate) De-Correlation and Q.sno not in (SELECT P.sno WHERE P.price > 100) 13 How about Subqueries? ( ) EXCEPT (SELECT P.sno WHERE P.price > 100) EXCEPT = set difference Un-nesting (sno,sname,scity,sstate) and Q.sno not in (SELECT P.sno WHERE P.price > 100) 14 (sno,sname,scity,sstate) How about Subqueries? ( ) EXCEPT (SELECT P.sno WHERE P.price > 100) Finally π sno σ sstate= WA π sno σ Price > 100 From Logical Plans to Physical Plans 15 16 Query Evaluation Steps Review SQL query (sid, sname, scity, sstate) Example Query optimization Parse & Rewrite Query Select Logical Plan Select Physical Plan Query Execution Logical plan Physical plan Give a relational algebra expression for this query Disk 17 18 3

(sid, sname, scity, sstate) Relational Algebra (σ scity= Seattle sstate= WA pno=2 ( sid = sid )) (sid, sname, scity, sstate) Relational algebra expression is also called the logical query plan Relational Algebra σ scity= Seattle sstate= WA pno=2 sid = sid 19 20 (sid, sname, scity, sstate) Physical Query Plan 1 σ scity= Seattle sstate= WA pno=2 (Nested loop) sid = sid A physical query plan is a logical query plan annotated with physical implementation details 21 (sid, sname, scity, sstate) Physical Query Plan 2 σ scity= Seattle sstate= WA pno=2 (Hash join) sid = sid Same logical query plan Different physical plan 22 (sid, sname, scity, sstate) (Sort-merge join) (Scan & write to T1) Physical Query Plan 3 sid = sid (a) σ scity= Seattle sstate= WA (d) (c) Different but equivalent logical query plan; different physical plan (b) σ pno=2 (Scan & write to T2) Query Optimization Problem For each SQL query many logical plans For each logical plan many physical plans How do find a fast physical plan? Will discuss in a few lectures 23 24 4

Demonstration with SQL Server Management Studio Query Execution 25 26 Iterator Interface Initializes operator state Sets parameters such as selection condition Operator invokes get_ recursively on its inputs Performs processing and produces an output tuple close(): clean-up state Pipelined Query Execution (Nested loop) σ sscity= Seattle sstate= WA pno=2 s sno = sno 27 Supplies 28 Pipelined Query Execution (Nested loop) σ sscity= Seattle sstate= WA pno=2 s sno = sno nex() Supplies 29 Pipelined Execution Tuples generated by an operator are immediately sent to the parent Benefits: No operator synchronization issues No need to buffer tuples between operators Saves cost of writing intermediate data to disk Saves cost of reading intermediate data from disk This approach is used whenever possible 30 5

Intermediate Tuple Materialization Tuples generated by an operator are written to disk an in intermediate table No direct benefit Necessary: For certain operator implementations When we don t have enough memory Intermediate Tuple Materialization (Sort-merge join) (Scan: write to T1) s sno = sno σ sscity= Seattle sstate= WA 31 σ pno=2 Supplies (Scan: write to T2) 32 Query Execution Bottom Line SQL query transformed into physical plan Access path selection for each relation Scan the relation or use an index (see next lecture) Implementation choice for each operator Nested loop join, hash join, etc. Scheduling decisions for operators Pipelined execution or intermediate materialization Execution of the physical plan is pull-based Physical Data Independence Means that applications are insulated from changes in physical storage details E.g., can add/remove indexes without changing apps Can do other physical tunings for performance SQL and relational algebra facilitate physical data independence because both languages are set-at-a-time : Relations as input and output 33 34 Database Management System Deployments Architectures 1. Serverless 2. Two tier: client/server 3. Three tier: client/app-server/db-server 35 36 6

Desktop DBMS Application (SQLite) Serverless User SQLite: One data file One user One DBMS application But only a limited number of scenarios work with such model Client Applications File Data file Disk 37 38 Client Applications Server Machine Client Applications Connection (JDBC, ODBC) Connection (JDBC, ODBC) 39 One server running the database Many clients, connecting via the ODBC or JDBC (Java Database Connectivity) protocol 40 Server Machine Supports many apps and many users simultaneously Client Applications Connection (JDBC, ODBC) One server that runs the DBMS (or RDBMS): Your own desktop, or Some beefy system, or A cloud service (SQL Azure) One server running the database Many clients, connecting via the ODBC or JDBC (Java Database Connectivity) protocol 41 42 7

One server that runs the DBMS (or RDBMS): Your own desktop, or Some beefy system, or A cloud service (SQL Azure) Many clients run apps and connect to DBMS Microsoft s Management Studio (for SQL Server), or psql (for postgres) Some Java program (HW7) or some C++ program One server that runs the DBMS (or RDBMS): Your own desktop, or Some beefy system, or A cloud service (SQL Azure) Many clients run apps and connect to DBMS Microsoft s Management Studio (for SQL Server), or psql (for postgres) Some Java program (HW5) or some C++ program Clients talk to server using JDBC/ODBC protocol 43 44 3-Tiers DBMS Deployment Browser 3-Tiers DBMS Deployment Browser Connection (e.g., JDBC) 45 46 3-Tiers DBMS Deployment Browser 3-Tiers DBMS Deployment Connection (e.g., JDBC) Connection (e.g., JDBC) Web-based applications 47 48 8

Replicate 3-Tiers App DBMS server Deployment for scaleup DBMS Deployment: Cloud Easy scaleup, scaledown Users Why don t we replicate the DB server too? Connection (e.g., JDBC) Developers 49 Web & App Server 50 Using a DBMS Server 1. Client application establishes connection to server 2. Client must authenticate self 3. Client submits SQL commands to server 4. Server executes commands and returns results 51 9