CMPSC431W: Database Management Systems
Lecture 36, 12/4/15
Instructor: Yu-San Lin (yusan@psu.edu)
Course Website: http://www.cse.psu.edu/~yul189/cmpsc431w
Slides based on McGraw-Hill & Dr. Wang-Chien Lee

Example of Composite Search Keys

Data records sorted by name:

name  age  sal
bob   12   10
cal   11   80
joe   12   20
sue   13   75

Data entries in each index, sorted by search key using lexicographic order:

<age, sal>   <sal, age>   <age>   <sal>
11,80        10,12        11      10
12,10        20,12        12      20
12,20        75,13        12      75
13,75        80,11        13      80

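The lexicographic ordering in this example can be reproduced with a short Python sketch. Python tuples already compare lexicographically, so sorting (age, sal) pairs gives the <age, sal> entry order. The helper name `index_entries` and the column constants are made up for illustration; they are not part of any DBMS API.

```python
# Records from the example slide: (name, age, sal).
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

AGE, SAL = 1, 2  # column positions within a record (illustrative constants)


def index_entries(rows, key_fields):
    """Return the search-key values of an index on key_fields, in sorted order.

    Tuples are compared field by field, which is exactly lexicographic order.
    """
    return sorted(tuple(row[i] for i in key_fields) for row in rows)


age_sal = index_entries(records, (AGE, SAL))
sal_age = index_entries(records, (SAL, AGE))
print("<age, sal>:", age_sal)
print("<sal, age>:", sal_age)
```

Swapping the field order changes which records end up next to each other in the index, which is why the order of fields in a composite key matters.
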
Composite Search Keys
- To retrieve Emp records with age = 30 AND sal = 40, which is a better index?
  (a) <age, sal>  (b) age  (c) sal
- Choice of index key is orthogonal to clustering, etc.
- If the condition is 20 < age < 30 AND 30 < sal < 50, what is a better index?
- If the condition is age = 30 AND 30 < sal < 50, which is a better index?
  (a) clustered <age, sal> index  (b) clustered <sal, age> index
- Composite indexes are larger and updated more often

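As a rough illustration of the last question, the sketch below compares how many data entries each key order has to read for age = 30 AND 30 < sal < 50. The sample entries are invented, salaries are assumed to be integers (so the open range becomes 31..49), and a sorted Python list with `bisect` stands in for a tree index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Emp data entries as (age, sal) pairs; values are made up.
emps = [(25, 35), (30, 20), (30, 35), (30, 45), (30, 60), (35, 40), (40, 45)]


def scan_range(entries, low, high):
    """Return the entries of a sorted list that satisfy low <= entry <= high."""
    start = bisect_left(entries, low)
    stop = bisect_right(entries, high)
    return entries[start:stop]


# <age, sal> index: entries with age = 30 and sal in 31..49 are adjacent,
# so the scan reads only qualifying entries.
age_sal = sorted(emps)
touched_age_sal = scan_range(age_sal, (30, 31), (30, 49))

# <sal, age> index: only the sal range can bound the scan, so entries with
# other ages are read as well and must be filtered out afterwards.
sal_age = sorted((sal, age) for age, sal in emps)
touched_sal_age = scan_range(sal_age, (31, float("-inf")), (49, float("inf")))
result_sal_age = [(age, sal) for sal, age in touched_sal_age if age == 30]

print(len(touched_age_sal), "entries read with <age, sal>")
print(len(touched_sal_age), "entries read with <sal, age>")
```

Both orders return the same answer; the difference is how much of the index is read to get there.
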
Index-Only Execution Plans
- Some queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available

  SELECT E.dno, COUNT(*)
  FROM Emp E
  GROUP BY E.dno

  SELECT E.dno, MIN(E.sal)
  FROM Emp E
  GROUP BY E.dno

  SELECT AVG(E.sal)
  FROM Emp E
  WHERE E.age = 25 AND E.sal BETWEEN 30 AND 50

Index-Only Execution Plans (cont.)
- Index-only plans are possible if we have a tree index with key <dno, age> or with key <age, dno>
- Which is better for the left query?

  SELECT E.dno, COUNT(*)
  FROM Emp E
  WHERE E.age = 30
  GROUP BY E.dno

  SELECT E.dno, COUNT(*)
  FROM Emp E
  WHERE E.age >= 30
  GROUP BY E.dno

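A minimal sketch of an index-only plan for the age = 30 query, assuming a tree index on <age, dno> whose data entries are modeled as a sorted Python list. The entries and the function name `count_by_dno_age_eq` are hypothetical. The point is that the answer is computed from index entries alone, without touching any Emp record.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Hypothetical data entries of a tree index on <age, dno>, in sorted order.
age_dno_index = sorted(
    [(25, 1), (30, 1), (30, 1), (30, 2), (30, 3), (30, 3), (41, 2)]
)


def count_by_dno_age_eq(index, age):
    """COUNT(*) per dno for one age value, reading only index entries."""
    start = bisect_left(index, (age, float("-inf")))
    stop = bisect_right(index, (age, float("inf")))
    # The matching entries are contiguous and already ordered by dno.
    return Counter(dno for _, dno in index[start:stop])


counts = count_by_dno_age_eq(age_dno_index, 30)
print(dict(counts))
```

With the equality condition, the <age, dno> order puts all qualifying entries in one contiguous run that is already grouped by dno. With a range condition such as age >= 30, that run is no longer grouped by dno, which is the trade-off the slide asks about.
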
Summary
- Many alternative file organizations exist, each appropriate in some situation
- If selection queries are frequent, sorting the file or building an index is important
  - Hash-based indexes are good only for equality search
  - Sorted files and tree-based indexes are best for range search; also good for equality search
- An index is a collection of data entries plus a way to quickly find entries with given key values

Summary (cont.)
- Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs
- Can have several indexes on a given file of data records, each with a different search key
- Indexes can be classified as clustered vs. unclustered and primary vs. secondary. The differences have important consequences for utility/performance

Summary (cont.)
- Indexes must be chosen to speed up important queries
  - Indexes incur maintenance overhead on updates to key fields
  - Choose indexes that can help many queries
  - Build indexes to support index-only strategies
  - Clustering is an important decision; only one index on a given relation can be clustered
  - Order of fields in a composite index key can be important

CHAPTER 12: OVERVIEW OF QUERY EVALUATION

Overview
- How are queries evaluated in a relational DBMS?
- Evaluation plans
  - How are they represented?
- Implementation of relational operators
  - What are the alternatives for retrieving data?
- Query optimization

Query Execution Plan
- An extended form of relational algebra
- Tree of relational algebra operators
- Each operator may have alternative algorithms
- The operators serve as building blocks for query evaluation
- Each operator is typically implemented using a pull interface
- The implementations of the operators are carefully optimized for good performance

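One way to picture a pull interface is with Python generators: each operator asks its child for the next tuple only when its own consumer asks for one. This is a sketch under that analogy, not how any particular DBMS implements its operators; the operator functions and the sample Sailors rows are invented for illustration.

```python
def scan(rows):
    """Leaf operator: hand out the stored tuples one at a time."""
    for row in rows:
        yield row


def select(child, predicate):
    """Pull tuples from the child and pass on those that satisfy predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child, fields):
    """Pull tuples from the child and keep only the listed fields."""
    for row in child:
        yield tuple(row[f] for f in fields)


# Hypothetical Sailors tuples.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 3},
    {"sid": 58, "sname": "rusty", "rating": 10},
]

# Operators compose into a tree; nothing runs until the root is pulled.
plan = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])
first = next(plan)  # pulls just enough input to produce one output tuple
rest = list(plan)   # pulls the remaining tuples
print(first, rest)
```

Because each operator only exposes "give me the next tuple", operators can be swapped or reordered without changing their neighbors.
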
Query Execution Plan: Example
- Given the following SQL:

  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid = S.sid
    AND R.bid = 100
    AND S.rating > 5

- What is the relational algebra?

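One possible answer to the question is: project sname from the selection bid = 100 AND rating > 5 applied to the join of Reserves and Sailors on sid. The sketch below evaluates that expression step by step on tiny invented tables, with lists of dicts standing in for relations.

```python
# Hypothetical sample relations.
reserves = [
    {"sid": 22, "bid": 100},
    {"sid": 31, "bid": 100},
    {"sid": 58, "bid": 103},
]
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 3},
    {"sid": 58, "sname": "rusty", "rating": 10},
]

# Join: Reserves and Sailors on sid.
joined = [{**r, **s} for r in reserves for s in sailors if r["sid"] == s["sid"]]

# Selection: bid = 100 AND rating > 5.
selected = [t for t in joined if t["bid"] == 100 and t["rating"] > 5]

# Projection: keep only sname.
result = [t["sname"] for t in selected]
print(result)
```

Each step corresponds to one node of a relational algebra tree, with the join at the bottom and the projection at the root.
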
Query Execution Plan: Example (cont.)
- Relational algebra tree based on the relational algebra we just wrote:
  [Figure: relational algebra tree]

Query Execution Plan: Example (cont.)
- Query execution plan #1:
  [Figure: query execution plan #1]

Query Execution Plan: Example (cont.)
- Query execution plan #2:
  [Figure: query execution plan #2]

Query Optimization
- Queries can be represented in many combinations of operators and alternative algorithms
- The process of finding a good execution plan is called query optimization
- Basic task is to consider several alternative execution plans for a query

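To make "alternative execution plans" concrete, here is a small sketch of two plans for the Reserves/Sailors example query that return the same answer but do different amounts of work. The tables are invented, and the comparison counter is only a crude stand-in for cost; a real optimizer would estimate I/O instead. These two plans are illustrative and are not claimed to match the plans drawn on the slides.

```python
reserves = [
    {"sid": 22, "bid": 100},
    {"sid": 31, "bid": 100},
    {"sid": 58, "bid": 103},
]
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7},
    {"sid": 31, "sname": "lubber", "rating": 3},
    {"sid": 58, "sname": "rusty", "rating": 10},
]


def plan_join_then_filter(reserves, sailors):
    """Nested loops over the full inputs, checking every condition per pair."""
    comparisons, out = 0, []
    for r in reserves:
        for s in sailors:
            comparisons += 1
            if r["sid"] == s["sid"] and r["bid"] == 100 and s["rating"] > 5:
                out.append(s["sname"])
    return out, comparisons


def plan_filter_then_join(reserves, sailors):
    """Apply the selections first, then join the smaller inputs."""
    r_small = [r for r in reserves if r["bid"] == 100]
    s_small = [s for s in sailors if s["rating"] > 5]
    comparisons, out = 0, []
    for r in r_small:
        for s in s_small:
            comparisons += 1
            if r["sid"] == s["sid"]:
                out.append(s["sname"])
    return out, comparisons


result_a, work_a = plan_join_then_filter(reserves, sailors)
result_b, work_b = plan_filter_then_join(reserves, sailors)
print(result_a, work_a)
print(result_b, work_b)
```

The optimizer's job is to pick among such equivalent plans using estimated cost rather than by running them.
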
Query Optimization (cont.)
- Two main issues in query optimization
  - For a given query, what plans are considered?
    - Algorithm to search plan space for cheapest (estimated) plan
  - How is the cost of a plan estimated?
- Ideally, we want to find the best plan. But practically, we avoid the worst plans.

Algorithms for Relational Operations
- Selection
  - Cost depends on # qualifying tuples
- Projection
  - Expensive part is to remove duplicates
- Sorting
  - Useful for eliminating duplicate copies
- Join
  - Expensive but common operation

Types of Joins (R ⋈ S)
- Nested loop join
  - For each tuple in R, scan the entire S
- Index nested loop join
  - Scan R and, for each tuple, use the index on S to find matching tuples in S
- Sort-merge join
  - Sort both R and S on the join attributes, and scan them to find matches

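The three join strategies can be sketched in a few lines of Python each. This is an in-memory illustration only: relations are lists of dicts, a Python dict stands in for the index on S, and the function names, the `attr` parameter, and the sample tuples are all made up for the example.

```python
def nested_loop_join(R, S, attr):
    """For each tuple in R, scan the entire S."""
    return [(r, s) for r in R for s in S if r[attr] == s[attr]]


def index_nested_loop_join(R, S, attr):
    """Scan R and probe an index on S (here a dict) for matching tuples."""
    index = {}
    for s in S:
        index.setdefault(s[attr], []).append(s)
    return [(r, s) for r in R for s in index.get(r[attr], [])]


def sort_merge_join(R, S, attr):
    """Sort both inputs on the join attribute, then merge them."""
    R = sorted(R, key=lambda t: t[attr])
    S = sorted(S, key=lambda t: t[attr])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][attr] < S[j][attr]:
            i += 1
        elif R[i][attr] > S[j][attr]:
            j += 1
        else:
            value = R[i][attr]
            # Find the run of S tuples that share this join value.
            j_end = j
            while j_end < len(S) and S[j_end][attr] == value:
                j_end += 1
            # Pair every R tuple with this value against that run.
            while i < len(R) and R[i][attr] == value:
                out.extend((R[i], s) for s in S[j:j_end])
                i += 1
            j = j_end
    return out


# Hypothetical inputs.
R = [{"sid": 22, "bid": 100}, {"sid": 58, "bid": 103}, {"sid": 22, "bid": 101}]
S = [
    {"sid": 22, "sname": "dustin"},
    {"sid": 31, "sname": "lubber"},
    {"sid": 58, "sname": "rusty"},
]


def summarize(pairs):
    """Order-independent view of a join result, for comparing algorithms."""
    return sorted((r["bid"], s["sname"]) for r, s in pairs)


print(summarize(sort_merge_join(R, S, "sid")))
```

All three produce the same set of matching pairs; they differ in how many tuples of S are examined for each tuple of R.
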
Cost Estimation
- For each plan considered, must estimate cost
  - Must estimate cost of each operation in plan tree
    - Depends on input cardinalities
    - Also depends on the types of operations (sequential scan, index scan, joins, etc.)
  - Must also estimate size of result for each operation in tree
    - Use information about the input relations

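As a hedged illustration of how such estimates might be computed, the sketch below uses a common textbook-style model that counts page I/Os for a tuple-at-a-time nested loop join and estimates result size as the product of input cardinalities and per-condition reduction factors. The slides do not give these formulas; the function names, page counts, and reduction factors are assumptions chosen only for the example.

```python
def nested_loop_join_cost(pages_r, tuples_per_page_r, pages_s):
    """Page I/Os: read R once, and scan all of S once per tuple of R.

    The cost of writing the result is ignored in this simple model.
    """
    return pages_r + tuples_per_page_r * pages_r * pages_s


def estimated_result_size(cardinalities, reduction_factors):
    """Product of input sizes, scaled by one reduction factor per condition."""
    size = 1.0
    for n in cardinalities:
        size *= n
    for factor in reduction_factors:
        size *= factor
    return size


# Assumed statistics: R has 1000 pages with 100 tuples each, S has 500 pages.
cost = nested_loop_join_cost(pages_r=1000, tuples_per_page_r=100, pages_s=500)

# Assumed sizes and reduction factors for a join condition and two selections.
size = estimated_result_size([100_000, 40_000], [1 / 40_000, 1 / 100, 0.5])
print(cost, round(size))
```

The inputs to both functions are exactly the kind of catalog information about the input relations that the slide refers to.
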
Don't Forget
- Homework #6 due on 12/11
- Project demo #2 this week
  - Expectation: almost done, close to what you will present to the whole class in the final presentation
- Project final presentation: 12/9 & 12/11
- Final exam review session: 12/14
- Final exam: 12/16, 8-9:50 a.m. @ 362 Willard
  - Accumulative