Module 9: Selectivity Estimation
|
|
- Harvey Parsons
- 5 years ago
- Views:
Transcription
1 Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock Manager Plan Executor Operator Evaluator Concurrency Control Applications SQL Commands Files and Index Structures Buffer Manager Disk Space Manager Parser Optimizer SQL Interface You are here! Query Processor Recovery Manager DBMS Index Files Data Files System Catalog Database 268
2 9.1 Query Cost and Selectivity Estimation The DBMS has a number of alternative implementations available for each (logical) algebra operator. Selecting an implementation for each of the operators in a plan is one of the tasks of the query optimizer. The applicability of implementations may depend on various physical properties of the argument files (or streams), such as sort orders, availability of indexes,... Among the applicable plans, the optimizer selects the ones with the least expected cost. The cost of the implementation algorithms is determined by the size of the input arguments. Therefore, it is crucial to know, compute, or estimate the sizes of arguments. This chapter presents some techniques to measure the quantitative characteristics of database files or input/output streams. 269
3 Principal Approaches There are two principal approaches to estimate sizes of relations: 1. Maintain a database profile, i.e., statistical information about numbers and sizes of tuples, distribution of attribute values and the like, as part of the database catalog (meta information) during database updates. Calculate similar parameters for (intermediate) query results based upon a (simple) statistical model during query optimization. Typically, the statistical model is based upon the uniformity and independence assumptions. Both are typically not valid, but they allow for very simple calculations. In order to provide more accuracy, the system can record histograms to more closely approximate value distributions. 2. Use sampling techniques, i.e., gather necessary characteristics of a query plan (input relations and intermediate results) by running the query on a small sample and by extrapolating to the full input size at query execution time. Crucial decision here is to find the right balance between size of the sample and resulting accuracy of estimation. 270
4 9.2 Database profiles Simple profiles Keep statistical information in the database catalogs. For each relation, record R... the number of records for each relation R, R... the number of disk pages allocated by those records, s(r)... record size s(r) and blocksize b as an alternative ( R = R / b s(r) ) V (A, R)... the number of distinct values of attribute A in relation R,... and possibly more. Based on these, develop a statistical model, typically, a very simple one, using primitive assumptions. 271
5 Typical assumptions In order to obtain simple formulae, assume one of the following: Uniformity & independence assumption: all values of an attribute appear with the same probability; values of different attributes are distributed independent of each other. Simple, yet rarely realistic assumption. Worst case assumption: no knowledge available at all, in case of selections assume all records match the condition. Unrealistic assumption, can only be used for computing upper bounds. Perfect knowledge assumption: exact distribution of values is known. Unrealistic assumption, can only be used for computing lower bounds. Typically use uniformity assumption. 272
6 Selectivity estimation under uniformity assumption Using the parameters mentioned above, and assuming uniform and independent value distributions, we can compute the characteristics of intermediate results for the (logical) operators: 1. Selections Q := A=c(R) Selectivity: sel(a = c) = 1/V (A, R) uniformity! Number records: Q = sel(a = c) R Record size: s(q) = s(r) Number attribute values: V (A, Q) = { 1, for A = A, c( R, V (A, R), Q ), otherwise. 273
7 The number c( R, V (A, R), Q ) of distinct values in an attribute A after the selection, is estimated using a well-known formula from statistics. The number c of distinct colors obtained by drawing r balls from a bag of n balls in m different colors is c(n, m, r): r, for r < m/2, c(n, m, r) = (r + m)/3, for m/2 r < 2m, m, for r 2m 274
8 Element tests, e.g., Q := A IN (...) (R): An approximation for the selectivity could be obtained by multiplying the selectivity for an equality selection sel(a = c) with the number of elements in the list of values Other Selection Conditions Equality between attributes, e.g., Q := A=B(R): An approximation for the selectivity could be sel(a = B) = 1/ max(v (A, R), V (B, R)). This assumes that each value in the smaller attribute (i.e., the one with fewer distinct values) has a corresponding match in the other attribute. Range selections, e.g., Q := A>c(R): If the system also keeps track of the miminum and maximum value of each attribute (denoted High(A, R) and Low(A, R) hereinafter), we could approximate the selectivity by sel(a > c) = High(A, R) c High(A, R) Low(A, R).
9 3. Projections Q := L(R) Estimating the number of result tuples is difficult. Typically use: V (A, R), for L = {A}, R, for key(r) L, Q = R, without duplicate elimination, min ( R, A i L V (A i, R) ), otherwise s(q) = s(a i ) A i L V (A i, Q) = V (A i, R) for A i L 276
10 4. Unions Q := R S Q R + S s(q) = s(r) = s(s) same schema! V (A, Q) V (A, R) + V (A, S) 5. Differences Q := R S max (0, R S ) Q R s(q) = s(r) = s(s) V (A, Q) V (A, R) same schema! 277
11 6. Products Q := R S 7. Joins Q := R F S Q = R S s(q) = s(r) + s(s) { V (A, R), for A sch(r) V (A, Q) = V (A, S), for A sch(s) This is the most challenging operator for selectivity estimation! A few simple cases are: No common attributes (sch(r) sch(s) = ), or join predicate F = true : R F S = R S Join attribute, say A, is key in one of the relations, e.g., in R, and assuming the inclusion dependency π A (S) π A (R) : Q = R 278
12 In the more general case, again assuming inclusion dependencies, π A (S) π A (R) or π A (R) π A (S), we can use two estimates: Q = R S V (A, R) or Q = R S V (A, S) typically use the smaller of those two estimates, i.e., R S Q = max (V (A, R), V (A, S)) s(q) = s(r) + s(s) V (A, Q) A sch(r) sch(s) { min (V (A, R), V (A, S)), V (A, X), s(a) for natural join for A sch(r) sch(s) for A sch(x) 279
13 Selectivity estimation for composite predicates For selections with composite predicates, we compute the selectivities of the individual parts of the condition and combine them appropriately. Combining estimates...? Here we need our second (unrealistic but simplifying) assumption: we assume independence of attribute value distributions, for under this assumption, we can easily compute: Conjunctive predicates, e.g., Q := A=c 1 B=c 2 (R): sel(a = c 1 B = c 2 ) = sel(a = c 1 ) sel(b = c 2 ), which gives Q = R V (A, R) V (B, R). Disjunctive predicates, e.g., Q := A=c 1 B=c 2 (R): sel(a = c 1 B = c 2 ) = sel(a = c 1 )+sel(b = c 2 ) sel(a = c 1 ) sel(b = c 2 ), which gives Q = R V (A, R) + V (B, R) V (A, R) V (B, R). 280
14 9.2.2 Histograms Observation: in most cases, attribute values are not uniformly distributed across the domain of an attribute. to keep track of non-uniform value distribution of an attribute A, maintain a histogram to approximate the actual distribution: 1 Divide attribute domain into adjacent intervals by selecting boundaries b i dom(a). 2 Collect statistical parameters for each such interval, e.g., number of tuples with b i 1 < t(a) b i, number of distinct A-values in that interval. Two types of histograms can be used: equi-width histograms : intervals all have the same length, equi-depth histograms : intervals all contain the same number of tuples Histograms allow for more exact estimates of (equality and range) selections. I Example of a product using histograms The commercial DBMS Ingres used to maintain histograms. 281
15 Example of Approximations Using Histograms For skewed, i.e., non-uniform, data distributions, working without histograms gives bad estimates. For example, we would compute a selectivity sel(a > 13) = = 3, which is far from exact, since we see that, in fact, 9 tuples qualify Actual value distribution D Uniform distribution approximating D The error in estimation using the uniformity assumption is especially large for those values, which occur very often in the database. This is bad, since for these, we would actually need the best estimates! Histograms partition the range of (actual) values into smaller pieces ( buckets ) and keep the number of tuples and/or distinct values for each of those intervals. 282
16 Example of Approximations Using Histograms (cont d) With the equi-width histogram given below, we would estimate 5 result tuples, since the selection covers a third of the last bucket of the histogram (assuming uniform distribution within each bucket of the histogram). Using the equi-depth histogram also given below, we would, in this case, estimate the exact result (9 tuples), since the corresponding bucket contains only a single attribute value. 10,00 9,00 Equiwidth 10,00 9,00 Equidepth 8,00 8,00 7,00 7,00 6,00 6,00 5,00 5,00 5,00 5,00 5,00 5,00 5,00 5,00 4,00 4,00 3,00 2,67 2,67 2,67 3,00 2,00 1,00 1,33 1,33 1,33 1,00 1,00 1,00 2,00 1,00 0,00 0, Typically, equi-depth histograms provide better estimates than equi-width histograms. Compressed histograms keep separate counts for the most frequent values (say, 7 and 14 in our example) and maintain an equi-depth (or whatever) histogram of the remaining values. 283
17 9.3 Sampling Idea: if maintaining a database profile is costly and error-prone (i.e., may provide far-off estimates), may be it is better to dispense with this idea at all. Rather, if a complex query is to be optimized, execute the query on a small sample to collect accurate statistics. Problem: How small shall/can the sample be? Small enough to be executed efficiently, but large enough to obtain useful characteristics. How precisely can we extrapolate? Good prediction requires large (enough) samples.... select a value for either one of those parameters and derive the other one from that. 284
18 Three approaches found in the literature 1 adaptive sampling : try to achieve a given precision with minimal sample size. 2 double (two-phase) sampling : first, obtain a coarse picture from a very small sample, just good enough to calculate necessary sample size for a useful precision in second step. 3 sequential sampling : sliding calculation of characteristics, stops when estimated precision is good enough. 285
19 9.4 Statistics maintained by commercial DBMS The research prototype System/R introduced the concept of a database profile recording simple statistical parameters about each stored relation, each defined index, each segment ( table space ). In addition to the parameters mentioned above (tuple and page cardinalities, number of distinct values), the system stores the current minimum and maximum attribute values (to keep track of the active domain and to provide input for the estimation of range queries). For each index (only B + tree-indexes are supported by System/R), store height of the index tree and number of leaf nodes. Many commercial DBMSs have adopted the same or similar catalog information for their optimizers. 286
20 N.B.: per segment information is used for estimating the cost of a segment scan. System/R stores more than one relation in a segment and uses the TID addressing scheme. Therefore, there is no way of doing a relation scan. If no other plan is available, the system will have to scan all pages in a segment! 287 Parameters of a DB profile in the System/R catalogs per segment NP number of used pages per relation Card(R) = R, number of tuples PCard(R) = R, number of blocks per index ICard(I) = V (A, R) number of distinct values in indexed attribute MinKey(I) minimum attribute value in indexed attribute MaxKey(I) maximum attribute value in indexed attribute NIndx(I) number of leaf nodes NLevels(I) height of index tree Clustered? is this a clustered index (yes/no) All values are approximations only! They are not maintained during database updates (to avoid hot spots ), rather they can be updated explicitly via SQL s update statistics command.
21 System/R estimation of selectivities Obviously, System/R can use ICard(I)-values for estimating the selectivity sel(a = c) for simple attribute-equals-constant -selections in those cases, where an index on attribute A is available. But, what can we do for the other cases? If there is no index on A, System/R arbitrarily assumes sel(a = c) = For selection conditions A = B, the system uses sel(a = B) = 1 max (ICard(I 1 ), ICard(I 2 )) if indexes I 1 and I 2 are available for attributes A and B, respectively. This estimation assumes an inclusion dependency, i.e., each value from the smaller index, say I 1, has a matching value in the other index. Then, given a value a for A, assume that each of the ICard(I 2 ) values for B is equally likely. Hence, the number of tuples that have a given A-value a as their B-value is 1 ICard(I. 2) 1 If only one attribute has an index, assume selectivity sel(a = B) = ICard(I) ; if neither attribute has an index, assume the ubiquitous ,
22 These formulae are used whether or not A and B are from the same relation. Notice the correspondence with our estimation of join selectivity above!... System/R estimation of selectivities (cont d) For range selections A > c, exploit the MinKey and MaxKey parameters, if those are present (i.e., if an index is available). sel(a > c) = MaxKey(I) c MaxKey(I) MinKey(I) If A is not an arithmetic type or there is no index, a fraction less than half is arbitrarily chosen. Similar estimates can be derived for other forms of range selections. For selections of the form A IN (List of values), compute the selectivity of A = c and multiply by the number of values in the list. Note that this number can be the result of a complex selectivity estimation in case of SQL s nested subqueries. However, never use a resulting value greater than 1 2, since we believe that each selection eliminates at least half of the input tuples. 289
23 Other systems I Estimating query characteristics in commercial DBMSs DB2, Informix, and Oracle use one-dimensional equal height histograms. Oracle switches to duplicate counts for each value, whenever there are only few distinct values. MS SQL Server uses one-dimensional equal area histograms with some optimization (compression of adjacent ranges with similar distributions). SQL Server creates and maintains histograms automatically, without user interaction. Sampling is typically not used directly in commercial systems. Sometimes, utilities use sampling for estimating statistics or for building histograms. Sometimes sampling is used for load balancing during parallelization. 290
24 Bibliography Ceri, S. and Pelagatti, G. (1984). Distributed Databases, Principles and Systems. McGraw- Hill. Ling, Y. and Sun, W. (1992). A supplement to sampling-based methods for query size estimation in a database system. SIGMOD Record, 21(4): Ling, Y. and Sun, W. (1995). An evaluation of sampling-based size estimation methods for selections in database systems. In Yu, P. S. and Chen, A. L. P., editors, Proc. 11th IEEE Int l Conf. on Data Engineering, pages , Taipei, Taiwan. Mannino, M. V., Chu, P., and Sager, T. (1988). Statistical profile estimation in database systems. ACM Computing Surveys, 20(3): Mullin, J. (1993). Estimating the size of a relational join. Information Systems, 18(3): Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, New York, 3 edition. 291
Chapter 9. Cardinality Estimation. How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11
Chapter 9 How Many Rows Does a Query Yield? Architecture and Implementation of Database Systems Winter 2010/11 Wilhelm-Schickard-Institut für Informatik Universität Tübingen 9.1 Web Forms Applications
More informationCost Models. the query database statistics description of computational resources, e.g.
Cost Models An optimizer estimates costs for plans so that it can choose the least expensive plan from a set of alternatives. Inputs to the cost model include: the query database statistics description
More informationModule 4: Tree-Structured Indexing
Module 4: Tree-Structured Indexing Module Outline 4.1 B + trees 4.2 Structure of B + trees 4.3 Operations on B + trees 4.4 Extensions 4.5 Generalized Access Path 4.6 ORACLE Clusters Web Forms Transaction
More informationModule 9: Query Optimization
Module 9: Query Optimization Module Outline Web Forms Applications SQL Interface 9.1 Outline of Query Optimization 9.2 Motivating Example 9.3 Equivalences in the relational algebra 9.4 Heuristic optimization
More informationATYPICAL RELATIONAL QUERY OPTIMIZER
14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing
More informationAnnouncement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17
Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa
More informationOptimization Overview
Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationEvaluation of relational operations
Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book
More information7. Query Processing and Optimization
7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationAdvances in Data Management Query Processing and Query Optimisation A.Poulovassilis
1 Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis 1 General approach to the implementation of Query Processing and Query Optimisation functionalities in DBMSs 1. Parse
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationRELATIONAL OPERATORS #1
RELATIONAL OPERATORS #1 CS 564- Spring 2018 ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Algorithms for relational operators: select project 2 ARCHITECTURE OF A DBMS query
More informationPrinciples of Data Management. Lecture #13 (Query Optimization II)
Principles of Data Management Lecture #13 (Query Optimization II) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Reminder:
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationPrinciples of Data Management. Lecture #9 (Query Processing Overview)
Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationReview. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System
Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationOverview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages
Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationExternal Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
External Sorting Chapter 13 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? A classic problem in computer science! Data requested in sorted order e.g., find students in increasing
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 12, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections v Cost depends on #qualifying
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationQuery Processing & Optimization. CS 377: Database Systems
Query Processing & Optimization CS 377: Database Systems Recap: File Organization & Indexing Physical level support for data retrieval File organization: ordered or sequential file to find items using
More informationCMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1
CMPUT 391 Database Management Systems Query Processing: The Basics Textbook: Chapter 10 (first edition: Chapter 13) Based on slides by Lewis, Bernstein and Kifer University of Alberta 1 External Sorting
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection
More informationSomething to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:
Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base
More informationSchema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes
Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :
More informationQuery optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.
Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 14, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections Cost depends on #qualifying
More informationQuery Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:
Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,
More informationChapter 3. Algorithms for Query Processing and Optimization
Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationOutline. Query Processing Overview Algorithms for basic operations. Query optimization. Sorting Selection Join Projection
Outline Query Processing Overview Algorithms for basic operations Sorting Selection Join Projection Query optimization Heuristics Cost-based optimization 19 Estimate I/O Cost for Implementations Count
More informationQuery Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:
Query Optimization atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationModule 5: Hash-Based Indexing
Module 5: Hash-Based Indexing Module Outline 5.1 General Remarks on Hashing 5. Static Hashing 5.3 Extendible Hashing 5.4 Linear Hashing Web Forms Transaction Manager Lock Manager Plan Executor Operator
More informationOverview of Query Evaluation. Chapter 12
Overview of Query Evaluation Chapter 12 1 Outline Query Optimization Overview Algorithm for Relational Operations 2 Overview of Query Evaluation DBMS keeps descriptive data in system catalogs. SQL queries
More informationRelational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007
Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VIII Lecture 16, March 19, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Today s Session:
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationOverview of Implementing Relational Operators and Query Evaluation
Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders
More informationImplementation of Relational Operations: Other Operations
Implementation of Relational Operations: Other Operations Module 4, Lecture 2 Database Management Systems, R. Ramakrishnan 1 Simple Selections SELECT * FROM Reserves R WHERE R.rname < C% Of the form σ
More informationECS 165B: Database System Implementa6on Lecture 7
ECS 165B: Database System Implementa6on Lecture 7 UC Davis April 12, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Dynamic aspects of
More informationDBMS Query evaluation
Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/
More informationRelational Query Optimization
Relational Query Optimization Chapter 15 Ramakrishnan & Gehrke (Sections 15.1-15.6) CPSC404, Laks V.S. Lakshmanan 1 What you will learn from this lecture Cost-based query optimization (System R) Plan space
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationQuery Optimization Overview
Query Optimization Overview parsing, syntax checking semantic checking check existence of referenced relations and attributes disambiguation of overloaded operators check user authorization query rewrites
More informationDatabase Applications (15-415)
Database Applications (15-415) DMS Internals- Part X Lecture 21, April 7, 2015 Mohammad Hammoud Last Session: DMS Internals- Part IX Query Optimization Today Today s Session: DMS Internals- Part X Query
More informationImplementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12
Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques [R&G] Chapter 14, Part B CS4320 1 Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 7 - Query execution
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 7 - Query execution References Generalized Search Trees for Database Systems. J. M. Hellerstein, J. F. Naughton
More informationQuery processing and optimization
Query processing and optimization These slides are a modified version of the slides of the book Database System Concepts (Chapter 13 and 14), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan.
More informationHash-Based Indexing 165
Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationR & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:
Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationQUERY OPTIMIZATION E Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 QUERY OPTIMIZATION
E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Database Engines Main Components Query Processing Transaction Processing Access Methods JAN 2014 Slide
More informationImplementing Relational Operators: Selection, Projection, Join. Database Management Systems, R. Ramakrishnan and J. Gehrke 1
Implementing Relational Operators: Selection, Projection, Join Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Readings [RG] Sec. 14.1-14.4 Database Management Systems, R. Ramakrishnan and
More informationReview. Support for data retrieval at the physical level:
Query Processing Review Support for data retrieval at the physical level: Indices: data structures to help with some query evaluation: SELECTION queries (ssn = 123) RANGE queries (100
More informationCS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing
CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset
More informationChapter 13: Query Optimization
Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Transformation of Relational Expressions Catalog
More informationOutline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012
Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten
More informationCompSci 516 Data Intensive Computing Systems
CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework
More informationQuery Optimization in Distributed Databases. Dilşat ABDULLAH
Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of
More informationFaloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection
More informationDatabase Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.
Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator
More informationCMPT 354: Database System I. Lecture 7. Basics of Query Optimization
CMPT 354: Database System I Lecture 7. Basics of Query Optimization 1 Why should you care? https://databricks.com/glossary/catalyst-optimizer https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-sigmod-edgar-f-codd-innovations-award/
More information15-415/615 Faloutsos 1
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415/615 Faloutsos 1 Outline introduction selection
More informationQuery Processing: The Basics. External Sorting
Query Processing: The Basics Chapter 10 1 External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot use traditional
More informationQuery Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!
Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13! q Overview! q Optimization! q Measures of Query Cost! Query Evaluation! q Sorting! q Join Operation! q Other
More informationArchitecture and Implementation of Database Systems Query Processing
Architecture and Implementation of Database Systems Query Processing Ralf Möller Hamburg University of Technology Acknowledgements The course is partially based on http://www.systems.ethz.ch/ education/past-courses/hs08/archdbms
More informationDatabase Management Systems (COP 5725) Homework 3
Database Management Systems (COP 5725) Homework 3 Instructor: Dr. Daisy Zhe Wang TAs: Yang Chen, Kun Li, Yang Peng yang, kli, ypeng@cise.uf l.edu November 26, 2013 Name: UFID: Email Address: Pledge(Must
More informationRelational Databases
Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+
More informationDatenbanksysteme II: Caching and File Structures. Ulf Leser
Datenbanksysteme II: Caching and File Structures Ulf Leser Content of this Lecture Caching Overview Accessing data Cache replacement strategies Prefetching File structure Index Files Ulf Leser: Implementation
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science
More informationExternal Sorting Implementing Relational Operators
External Sorting Implementing Relational Operators 1 Readings [RG] Ch. 13 (sorting) 2 Where we are Working our way up from hardware Disks File abstraction that supports insert/delete/scan Indexing for
More informationEvaluation of Relational Operations. Relational Operations
Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )
More informationPrinciples of Database Management Systems
Principles of Database Management Systems 5: Query Processing Pekka Kilpeläinen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Query Processing
More informationOverview of Query Evaluation. Overview of Query Evaluation
Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator
More informationWhat happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques
376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list
More informationUser Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM
Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationCSE 190D Spring 2017 Final Exam Answers
CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join
More information