Evaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing

Similar documents
Database Applications (15-415)

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Evaluation of Relational Operations: Other Techniques

Database Applications (15-415)

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Evaluation of Relational Operations

Introduction to Spatial Database Systems

Implementation of Relational Operations: Other Operations

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Query Processing. Introduction to Databases CompSci 316 Fall 2017

15-415/615 Faloutsos 1

Chapter 12: Query Processing

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Chapter 13: Query Processing

Evaluation of relational operations

Chapter 12: Query Processing. Chapter 12: Query Processing

Outline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection

Implementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Principles of Database Management Systems

CSIT5300: Advanced Database Systems

Principles of Data Management. Lecture #9 (Query Processing Overview)

RELATIONAL OPERATORS #1

Multidimensional Data and Modelling - DBMS

Overview of Query Evaluation. Chapter 12

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

Query Evaluation and Optimization

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Database System Concepts

External Sorting Implementing Relational Operators

CSIT5300: Advanced Database Systems

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

Hash-Based Indexing 165

EECS 647: Introduction to Database Systems

Evaluation of Relational Operations

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution

Overview of Implementing Relational Operators and Query Evaluation

CMSC424: Database Design. Instructor: Amol Deshpande

An Introduction to Spatial Databases

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

Introduction to Data Management CSE 344. Lecture 12: Cost Estimation Relational Calculus

Evaluation of Relational Operations. Relational Operations

NOTE: sorting using B-trees to be assigned for reading after we cover B-trees.

Course No: 4411 Database Management Systems Fall 2008 Midterm exam

Query Evaluation Overview, cont.

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Data Organization and Processing

Sorting a file in RAM. External Sorting. Why Sort? 2-Way Sort of N pages Requires Minimum of 3 Buffers Pass 0: Read a page, sort it, write it.

CSE 544 Principles of Database Management Systems

Announcements. Two typical kinds of queries. Choosing Index is Not Enough. Cost Parameters. Cost of Reading Data From Disk

Chapter 3. Algorithms for Query Processing and Optimization

CMSC424: Database Design. Instructor: Amol Deshpande

Multidimensional Divide and Conquer 2 Spatial Joins

Sorting & Aggregations

Database Applications (15-415)

Today's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example

Implementation of Relational Operations

TotalCost = 3 (1, , 000) = 6, 000

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Advanced Database Systems

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

Query Evaluation Overview, cont.

Dtb Database Systems. Announcement

Evaluation of Relational Operations

Chapters 15 and 16b: Query Optimization

CSE Midterm - Spring 2017 Solutions

Overview of Query Evaluation. Overview of Query Evaluation

Query Processing & Optimization. CS 377: Database Systems

ECS 165B: Database System Implementa6on Lecture 7

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Overview of Query Processing

Chapter 3. Sukhwinder Singh

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Query Evaluation (i)

Query Execution [15]

CSE 344 APRIL 20 TH RDBMS INTERNALS

CS330. Query Processing

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

CS698F Advanced Data Management. Instructor: Medha Atre. Aug 11, 2017 CS698F Adv Data Mgmt 1

Range Reporting. Range reporting problem 1D range reporting Range trees 2D range reporting Range trees Predecessor in nested sets kd trees

Query Processing. Solutions to Practice Exercises Query:

Query Processing & Optimization

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Overview of DB & IR. ICS 624 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

CSE 344 APRIL 27 TH COST ESTIMATION

CS542. Algorithms on Secondary Storage Sorting Chapter 13. Professor E. Rundensteiner. Worcester Polytechnic Institute

Transcription:

Evaluating Queries Query Processing Query Processing: Overview Query Processing: Example select lname from employee where ssn = 123456789 ; query expression tree? π lname σ ssn= 123456789 employee physical query? (have operators: ROWACCESS, FILESCAN, INDEXSCAN) 1

Classical Example: Sorting why? ORDER BY duplicate removal (intersection, union, DISTINCT) sort/merge for join how? in main memory easy: quick sort issue: large relations don t fit in main memory Scheduling scheduling of access is important Example: 3 pages of main memory, least recently used replacement policy pages in main memory: page 1: 1, 5, 9,, 37 page 2: 2, 6, 10,, 38 page 3: 3, 7, 11,., 39 page 4: 4, 8, 12,, 40 how many page accesses to read 1-40? Another Query Plan Example select e.name, s.name from employee as e, employee as s where e.superid = s.id QEP? 2

External Sorting I goal: minimize page transfers assumptions: data stored in n pages m << n pages fit into main memory solution: sort in runs (temporary sorted subfiles that get merged on disk) External Sorting: Algorithm partition input file into blocks of m pages sort internally, write-out into n/m initial runs at each level of the recursion merge m-1 runs into a new run use 1 page in main memory for each run 1 page in main memory to create merged page write output page if full/reload input page when processed External Sorting: Performance analyze set-up phase how many page I/Os at each recursive level? how many recursive levels? overall analysis? internal sort? 3

Rectangle Intersection, Again original solution: sweepline event list: left/right x-coordinates of rectangles active list: rectangles at current x-value needed for spatial join (on overlap) external technique: distribution sweeping Orthogonal Line Segments input: S set of vertical/horizontal line segments output: pairs of intersecting segments sweepline algorithm: events: x-min coordinates active list: horizontal segments at x-value when vertical segment is encountered: range query works in time O(n log n + k) Orthogonal line segments (EM) assumption n pages of data m buffer pages in main memory external sort (x-min): O(n log m n) split into m horizontal strips of n/m horizontal segments each one active list (stored externally) for each slab (can be read in parallel with others one block at a time, since we have m pages in main memory) 4

Process center pieces why can t we process end-pieces the same way? what to do about end-pieces? End-pieces apply method recursively analysis initial set-up each recursive level depth of recursion? overall running time analysis Rectangle Intersection input: B, R sets of rectangles output: all (b,r) in BxR with b intersecting r solution: similar to line segment problem, separate lists for red and blue rectangles but: rectangles don t fit into strips, they can span multiple strips problem: intersection between (b,r) might be reported multiple times so naïve adaptation gives factor of m 5

Rectangle Intersection Solution: separate list for each interval of strips and each color: L b h, k L r h, k L b h, k L r h, k Analysis? Spatial Join join on: topology (overlap, disjoint, contain, ), geometry (distance, direction) consider overlap only filter: overlap of mbbs refinement: overlap of geometries depends on indexes available 6

Spatial Join Algorithms no indexes distribution sweep hash-join algorithm single index INL (indexed nested loop) two R-tree indexes synchronized tree traversal two linear trees single index: INL for each o in non-indexed relation perform range query with o.mbb on indexed relation no index hash-join algorithm assume buckets fit into main memory hash keys of relations R, S into buckets load smaller bucket, compare to corresponding bucket Example: R: 2000 records, S: 500 records hash into 100 buckets page I/O? (read R, S, write buckets, join) 7

no index for spatial data hash-join depends on join condition being equality: overlapping rectangles won t hash to same bucket solution: buckets determined by rectangles may overlap (no redundancy) or be disjoint (redundancy) hash-join (overlapping) 1. partition R all buckets roughly same size buckets should fit into main memory minimal overlapping 2. assign rectangles of S to buckets of R S rectangles might be duplicated 3. join buckets (load smaller bucket into main memory, scan other bucket) joining two linear trees raster trees: traditional join general linear tree (e.g. linear quadtree or z-ordering tree) Property: C z is contained in C z if and only if z is prefix of z can use to test overlap 8

joining two z-ordering trees replace each entry (z, oid) with intervals (z, ssc(c z )), where ssc(c z ) is lower-right corner of C z two squares overlap iff their intervals overlap store each list in a stack B A G C ED F H J I L K R-trees naïve recursion restricted recursion sweep-line R-trees recursively STT (Synchronized Tree Traversal) I/O performance ok CPU cost high 9

R-trees recursively, improved STT(R1, R2) do we need to look at every combination of {R3, R4, R5} and {R6, R7}? R-trees, sweepline why not use red/blue intersection algorithm we saw earlier? Asymptotics vs constants greedy approach: order red/blue sets keep picking leftmost rectangle r keep testing rectangles s of opposite color so that s.xmin < r.xmax remove leftmost rectangle Example p r p b analysis (bad case?) 10

Building Query Execution Plans select intersect(l.shape, c.shape) from county c, land_use l where c.county_name = San Jose and overlaps(l.shape, c.shape); pipelined execution possible: Iterators pipelined execution possible iterators (open, next, close) e.g. rowaccess, retrieve one record at a time issues: refinement not included yet CPU time plays a role Example spatial join between roads and land-use both relations have R-tree Problems: random access to records (high I/O) repeated access to same records so refinement needs to be done carefully 11

Sequencing assume 4 pages fit into main memory; look at schedule (a) Sequencing: Segment Sort k: number of pairs (Lx, RIDy) that fit into m-1 pages load k pairs (LIDx, RIDy) into m-1 pages sort on LID, access land-use replace LID with L records sort on RID, load records from Road using mth page, perform refinement step for each record Why Not sort (LID, RID) after STT? means we can t pipeline: sorting is a blocking operator 12

Multiway Joins Sources Garcia-Molina, Ullman, Widom, Database Systems; the complete book, Pearson, 2009. Vassilakopoulos, Papadopoulos, Spatial databases, IGI, 2005 13