Converting ER to Relational Schema


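1 Converting ER to Relational Schema
[ER diagram: entity Project (P-number, P-name, Due-Date) and entity Employee (SSN, E-Name, Office), linked by two relationships. Assignment is many-to-many (0..* at both ends). Manager is one-to-many: each Project has exactly one manager (1..1 at the Employee end) and an Employee manages any number of projects (0..*).]
CS386/586 Introduction to Database Systems, Lois Delcambre, David Maier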

2 Translate each entity into a table, with keys.
An entity set can be represented as a table in the relational model. It has a key, which becomes a key for the table.
[Diagram: entity Employee with attributes SSN, E-Name, Office]
CREATE TABLE Employee (SSN CHAR(11) NOT NULL, E_Name CHAR(20), Office INTEGER, PRIMARY KEY (SSN))
(E-Name is written E_Name in SQL because a hyphen is not allowed in an unquoted identifier.)

3 Translate each many-to-many relationship set into a table
[ER diagram: Project and Employee, linked by Assignment (many-to-many) and Manager (one-to-many)]
What are the attributes and what is the key for Assignment?
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)

4 Answer:
[ER diagram: Project and Employee, linked by Assignment (many-to-many) and Manager (one-to-many)]
Assignment(P-number, SSN)
  P-number is a foreign key referencing Project
  SSN is a foreign key referencing Employee
  The key for Assignment is the pair (P-number, SSN).
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)
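The translation of Assignment can be tried out with a small sketch using Python's built-in sqlite3 module. This is only an illustration: the sample rows, the underscore column names, and the helper try_insert are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Project  (P_number INTEGER PRIMARY KEY, P_name TEXT, Due_Date TEXT);
CREATE TABLE Employee (SSN CHAR(11) PRIMARY KEY, E_Name CHAR(20), Office INTEGER);
-- Many-to-many relationship: one row per (project, employee) pair.
CREATE TABLE Assignment (
    P_number INTEGER  NOT NULL REFERENCES Project(P_number),
    SSN      CHAR(11) NOT NULL REFERENCES Employee(SSN),
    PRIMARY KEY (P_number, SSN)
);
INSERT INTO Project  VALUES (1, 'Payroll', '2025-01-31');
INSERT INTO Employee VALUES ('111-11-1111', 'Ann', 101);
INSERT INTO Employee VALUES ('222-22-2222', 'Bob', 102);
INSERT INTO Assignment VALUES (1, '111-11-1111');
INSERT INTO Assignment VALUES (1, '222-22-2222');
""")

def try_insert(p_number, ssn):
    """Hypothetical helper: report whether an Assignment row is accepted."""
    try:
        conn.execute("INSERT INTO Assignment VALUES (?, ?)", (p_number, ssn))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

dangling = try_insert(99, "111-11-1111")   # project 99 does not exist
duplicate = try_insert(1, "111-11-1111")   # same pair twice violates the key
row_count = conn.execute("SELECT COUNT(*) FROM Assignment").fetchone()[0]
```

Both extra inserts should be refused: the first by the foreign key to Project, the second by the composite primary key.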

5 What do we do with a one-to-many relationship?
[ER diagram: Project and Employee, linked by Assignment (many-to-many) and Manager (one-to-many)]
For example, what do we do with Manager?
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)

6 Create a foreign key for a 1-to-many relationship.
[ER diagram: Project and Employee, linked by Assignment (many-to-many) and Manager (one-to-many)]
Project(P-number, P-name, Due-Date, Manager)
Employee(SSN, E-Name, Office)
Manager is a foreign key (referencing the Employee relation): the value of Manager must match an SSN in Employee.

7 Or... create a table for a 1-to-many relationship.
[ER diagram: Project and Employee, linked by Assignment (many-to-many) and Manager (one-to-many)]
Project(P-number, P-name, Due-Date, Manager)
Employee(SSN, E-Name, Office)
vs.
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)
Manager(P-number, SSN)
What are the tradeoffs between these two?
Note: P-number is the key for Manager.

8 What if SSN is the key for Manager?
[ER diagram: same entities, but the Manager cardinalities are now 0..1 and 0..*: an employee manages at most one project, and a project may have several managers.]
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office, Managed-project)
vs.
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)
Manager(P-number, SSN)

9 Add attributes to the table for the relationship
[ER diagram: Project and Employee, linked by Assignment (many-to-many, with attributes role, start-date, end-date) and Manager (one-to-many)]
Assignment(P-number, SSN, role, start-date, end-date)
Project(P-number, P-name, Due-Date)
Employee(SSN, E-Name, Office)

10 What if a 1-to-many relationship has an attribute?
[ER diagram: Project and Employee linked by Manager, which has attributes start-date and end-date]
Project(P-number, P-name, Due-Date, Manager)
Employee(SSN, E-Name, Office)

11 Add attributes to the table for the relationship?
[ER diagram: Project and Employee linked by Manager, which has attributes start-date and end-date]
Project(P-number, P-name, Due-Date, Manager, start-date, end-date)
Employee(SSN, E-Name, Office)

12 Add attributes to the table for the relationship?
[ER diagram: Project and Employee linked by Manager, which has attributes start-date and end-date]
No, bad idea. You should use an extra table for the relationship.
Project(P-number, P-name, Due-Date)
Manages(P-number, SSN, start-date, end-date)
Employee(SSN, E-Name, Office)
Notice the key for the Manages table.

13 Minimum cardinality of 1 in SQL
We can require every row of a table to participate in a binary relationship by using a foreign key that is declared NOT NULL (but little else can be enforced without resorting to CHECK constraints).
CREATE TABLE Department (
  d_code INTEGER,
  d_name CHAR(20),
  manager_ssn CHAR(11) NOT NULL,
  since DATE,
  PRIMARY KEY (d_code),
  FOREIGN KEY (manager_ssn) REFERENCES Employee
    ON DELETE NO ACTION)
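A minimal sketch of how the NOT NULL foreign key behaves, again using sqlite3 as a stand-in DBMS. The sample rows and the helper attempt are made up for illustration; other systems may report the violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE Employee (SSN CHAR(11) PRIMARY KEY, E_Name CHAR(20), Office INTEGER);
CREATE TABLE Department (
    d_code      INTEGER PRIMARY KEY,
    d_name      CHAR(20),
    manager_ssn CHAR(11) NOT NULL,
    since       DATE,
    FOREIGN KEY (manager_ssn) REFERENCES Employee(SSN) ON DELETE NO ACTION
);
INSERT INTO Employee   VALUES ('111-11-1111', 'Ann', 101);
INSERT INTO Department VALUES (10, 'Sales', '111-11-1111', '2024-01-01');
""")

def attempt(sql):
    """Hypothetical helper: True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A department without a manager violates NOT NULL (minimum cardinality 1).
no_manager_allowed = attempt("INSERT INTO Department VALUES (20, 'Lab', NULL, NULL)")
# Deleting an employee who still manages a department violates the foreign key.
delete_manager_allowed = attempt("DELETE FROM Employee WHERE SSN = '111-11-1111'")
```

Note that nothing here forces every Employee to manage a Department; only the Department side of the relationship is constrained.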

14 Weak Entity Sets
[ER diagram: strong entity set Employee (SSN, name, office); identifying relationship set Insures; weak entity set Policy (dep-name, cost)]

15 Translating Weak Entity Sets
Weak entity sets and identifying relationship sets are translated into a single table.
The table must include the key of the strong entity set, as a foreign key.
When the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Insurance_Policy (
  dep_name CHAR(20),
  cost REAL,
  ssn CHAR(11) NOT NULL,
  PRIMARY KEY (dep_name, ssn),
  FOREIGN KEY (ssn) REFERENCES Employee
    ON DELETE CASCADE)
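The cascade can be observed with a short sqlite3 sketch. The employees and dependents below are invented sample data; note that two employees can each have a dependent called Joe, because dep_name is only a partial key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE Employee (SSN CHAR(11) PRIMARY KEY, E_Name CHAR(20), Office INTEGER);
CREATE TABLE Insurance_Policy (
    dep_name CHAR(20),
    cost     REAL,
    ssn      CHAR(11) NOT NULL,
    PRIMARY KEY (dep_name, ssn),
    FOREIGN KEY (ssn) REFERENCES Employee(SSN) ON DELETE CASCADE
);
INSERT INTO Employee VALUES ('111-11-1111', 'Ann', 101);
INSERT INTO Employee VALUES ('222-22-2222', 'Bob', 102);
INSERT INTO Insurance_Policy VALUES ('Joe', 50.0, '111-11-1111');
INSERT INTO Insurance_Policy VALUES ('Sue', 40.0, '111-11-1111');
INSERT INTO Insurance_Policy VALUES ('Joe', 55.0, '222-22-2222');
""")

# Deleting the owner removes all of the weak entities it owns.
conn.execute("DELETE FROM Employee WHERE SSN = '111-11-1111'")
remaining = conn.execute(
    "SELECT dep_name, ssn FROM Insurance_Policy ORDER BY dep_name"
).fetchall()
```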

16 Translating entities involved in ISA
[ER diagram: person (ssn, name, age) with ISA subclasses employee (position, salary) and dependent (gender); relationship "listed on insurance" between employee and dependent]
You have three choices:
1. One table: person with all attributes
2. Two tables: employee and dependent with all attributes from person copied down
3. Three tables: one for person, employee, and dependent

17 Option 1: Use one table (for person)
[ISA diagram: person with subclasses employee and dependent, and the relationship "listed on insurance"]
Advantages:
1. no join needed
2. no redundant attributes
Disadvantages:
1. may have lots of nulls
2. slightly more difficult to find just employees or just dependents
person(ssn, name, age, position, salary, gender)
listed-on-insurance(emp-ssn, dep-ssn), where emp-ssn references person.ssn and dep-ssn references person.ssn

18 Option 2: Use two tables (employee & dependent)
[ISA diagram: person with subclasses employee and dependent, and the relationship "listed on insurance"]
Advantages:
1. fewer nulls
2. easy to get just employees or just dependents
Disadvantages:
1. not possible to have a person who is not an employee or dependent
2. name and age are stored redundantly
employee(ssn, name, age, position, salary)
dependent(ssn, name, age, gender, insurer), where insurer references employee.ssn
This works if each dependent is on just one insurance.

19 Option 3: Use three tables (person, employee, dependent)
[ISA diagram: person with subclasses employee and dependent, and the relationship "listed on insurance"]
Advantages:
1. there can be persons who are not employees or dependents
2. no redundancy
3. no need to use nulls
4. matches the conceptual model
Disadvantages:
1. joins are required to get all of the attributes
person(ssn, name, age)
employee(ssn, position, salary), where ssn references person.ssn
dependent(ssn, gender, insurer), where ssn references person.ssn and insurer references employee.ssn
This works if each dependent is on just one insurance.

20 Translation Steps: ER to Tables
Create a table and choose a key for each entity; include single-valued attributes.
Create a table for each weak entity; include single-valued attributes. Include the key of the owner as a foreign key in the weak entity. Set the key to be the foreign key of the owner plus the local, partial key.
For each 1:1 relationship, add a foreign key to one of the entity sets involved in the relationship (a foreign key to the other entity in the relationship)*.
For each 1:N relationship, add a foreign key to the entity set on the N-side of the relationship (to reference the entity set on the 1-side of the relationship)*.

21 Translation Steps: ER to Tables (cont.)
For each M:N relationship set, create a new table. Include a foreign key for each participant entity set in the relationship set. The key for the new table is the set of all such foreign keys.
For each multi-valued attribute, construct a separate table. Repeat the key for the entity in this new table. It serves as part of the key for this table (together with the attribute value) and as a foreign key to the original table for the entity.
This algorithm is from Elmasri & Navathe, Fundamentals of Database Systems.

22 Files, Indexes, Query Optimization

23 Query Optimization Overview
Which query plan is the fastest?
How many query plans are there?
How can we estimate the cost of a plan?
But wait, how are queries (query operators) implemented?
But wait, how are the files stored?
CS386/586 Lois Delcambre, Some slides adapted from R. Ramakrishnan, et al, and J.

24 We'll just introduce these ideas, and we'll start from the bottom.
[Layer diagram, top to bottom, with annotations:]
Query Optimization: relational algebra query tree; search for a cheap plan
Relational Operator Algs.: join algorithms, ...
Files and Access Methods: heap, index, ...
Buffer Management and Disk Space Management: operating-system-level issues (may be handled by the DBMS or by the O/S)
How a disk works
DB

25 Disk: 10^5 to 10^6 times slower than memory (the example here works out to 10^5)
Disk access time (all three costs together) is about 2 to 7 milliseconds.
Memory access time: 50 to 70 nanoseconds.
7 milliseconds vs. 70 nanoseconds: therefore disk access is 100,000 times slower than memory access.
Contrast 1 second (pick up a piece of paper) vs. 100,000 seconds (drive to SF and back, about 28 hours).
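The ratio on this slide is plain arithmetic and can be checked directly. The variable names below are illustrative; the figures are the worst-case numbers quoted on the slide.

```python
# Worst-case figures from the slide, both expressed in nanoseconds.
disk_access_ns = 7_000_000    # 7 milliseconds
memory_access_ns = 70         # 70 nanoseconds

slowdown = disk_access_ns // memory_access_ns   # how many times slower disk is

# The human-scale analogy: if a memory access took 1 second,
# a disk access would take `slowdown` seconds.
analogy_hours = slowdown / 3600
```

Here slowdown comes out to 100,000 and analogy_hours to roughly 27.8, matching the "about 28 hours" on the slide.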

26 Cost of Accessing Data on Disk
Time to access (read/write) a disk block:
  seek time (moving arms to position disk head on track)
  rotational delay (waiting for block to rotate under head)
  transfer time (actually moving data to/from disk surface)
Key to lower I/O cost: reduce seek/rotation delays! (You have to wait for the transfer time, no matter what.)
Query cost is often measured in the number of page I/Os, often simplified to assume each page I/O costs the same.

27 Block (page) size vs. record size
Page: smallest unit of transfer supported by the OS.
Block: a multiple of a page; smallest unit of transfer supported by an application or a disk drive.
Block and page are often used interchangeably.
Typical record size: maybe a few hundred up to a few thousand bytes.
Typical page size: 4K, 8K.
When would we choose block size to be larger?
When would we choose block size to be smaller?

28 Index for a File
An index is a data structure that speeds up selections on the search key field(s).
An index starts with a search key k and gives you a data entry k*.
Given k*, you can get to the record(s) with the search key k in one I/O.

29 B+ Tree Indexes
[Figure: non-leaf pages above leaf pages; leaf pages are sorted by search key]
Leaf pages contain data entries, and are chained (prev & next).
Non-leaf pages have index entries, which are only used to direct searches.
Index entry: P_0, K_1, P_1, K_2, P_2, ..., K_m, P_m

30 Index is on the left; data is on the right
Search key is Name. Records are sorted by Name in the file. Each page contains 3 records.
Index: Ashby, Basu, Bristow, Cass, Daniels, Jones, Smith, Tracy
Data file:
  Page 1: (Ashby, 25, 3000) (Basu, 33, 4003) (Bristow, 30, 2007)
  Page 2: (Cass, 50, 5004) (Daniels, 22, 6003) (Jones, 40, 6003)
  Page 3: (Smith, 44, 3000) (Tracy, 44, 5004)

31 This index is dense (one index entry for each data record) and clustered (data are sorted based on search key)
Search key is Name. Records are sorted by Name in the file; clustered! Each page contains 3 records.
Index: Ashby, Basu, Bristow, Cass, Daniels, Jones, Smith, Tracy
Data file:
  Page 1: (Ashby, 25, 3000) (Basu, 33, 4003) (Bristow, 30, 2007)
  Page 2: (Cass, 50, 5004) (Daniels, 22, 6003) (Jones, 40, 6003)
  Page 3: (Smith, 44, 3000) (Tracy, 44, 5004)

32 This index is sparse (one index entry per PAGE of data) and clustered (data is sorted on search key)
Search key is Name. Records are sorted by Name in the file; clustered! Each page contains 3 records.
Index: Ashby, Cass, Smith
Data file:
  Page 1: (Ashby, 25, 3000) (Basu, 33, 4003) (Bristow, 30, 2007)
  Page 2: (Cass, 50, 5004) (Daniels, 22, 6003) (Jones, 40, 6003)
  Page 3: (Smith, 44, 3000) (Tracy, 44, 5004)
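A rough sketch of a lookup through a sparse, clustered index, using the records from this slide. The in-memory lists stand in for disk pages, and the function name lookup is an assumption; a real index would be a B+ tree rather than a sorted Python list.

```python
from bisect import bisect_right

# Data file: records sorted by Name, three records per page (as on the slide).
pages = [
    [("Ashby", 25, 3000), ("Basu", 33, 4003), ("Bristow", 30, 2007)],
    [("Cass", 50, 5004), ("Daniels", 22, 6003), ("Jones", 40, 6003)],
    [("Smith", 44, 3000), ("Tracy", 44, 5004)],
]

# Sparse index: one entry per page, holding the first search key on that page.
sparse_index = [page[0][0] for page in pages]   # ['Ashby', 'Cass', 'Smith']

def lookup(name):
    """Find the record for `name`, reading at most one data page."""
    slot = bisect_right(sparse_index, name) - 1   # last page whose first key <= name
    if slot < 0:
        return None                               # name sorts before every page
    for record in pages[slot]:                    # corresponds to one page I/O
        if record[0] == name:
            return record
    return None

jones = lookup("Jones")
missing = lookup("Aaron")
```

The sparse index only works because the file is sorted on the search key: the index entry tells us which page must hold the name, if it exists at all.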

33 Sparse, clustered index
Good for range search.
The data is sorted according to the search key.
There is one entry for each page of data records.
Consider a phone book with the heading on each page such as Mcinroy Mckee or Lowe Lozano. This tells us that all names that fall between Lowe and Lozano will be on this page.
On disk, one I/O operation gets us a whole page of entries sorted by name. (~350 names)

34 This index is unclustered (records NOT sorted on search key) and dense (one entry for every data record)
Search key is Age.
[Figure: index on the left, pointing to the records of the data file on the right]
Data file: (Ashby, 25, 3000) (Basu, 44, 4003) (Bristow, 30, 2007) (Cass, 50, 5004) (Daniels, 22, 6003) (Jones, 40, 6003) (Smith, 44, 3000) (Tracy, 33, 5004)

35 Unclustered indexes
In an unclustered index (also called a secondary index) the underlying data is NOT sorted according to the search key.
Example: imagine building an index on a phone book based on phone number. You MUST put an entry in the index for every single person in the phone book.
Thus an unclustered index MUST be dense; there is no other choice.
For a range search, we need one I/O for every data record!

36 I/Os using a dense, unclustered index during a range search
[Figure: non-leaf pages above leaf pages (sorted by search key); leaf entries point to records scattered across the data file]
Every data record = 1 I/O! You may re-read some pages!
Cost to scan data file in sorted order = M*N = no. of records in the file.

37 Indexes
DBMSs often create a clustered index on all primary keys. Note: primary keys are the values that must be used in foreign keys.
Only one clustered index per table! Why?
You need to decide whether you want additional (unclustered/secondary) indexes.
You need to decide if you want composite indexes.

38 Join Algorithms: an Introduction
SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid = S1.sid
$R \bowtie S$ is very common! And $R \times S$ followed by a selection is inefficient. So we process joins (rather than cross product) whenever possible.
Lots of effort has been invested in join algorithms.
Assume: M pages in R, p_R tuples per page, N pages in S, p_S tuples per page.
In our examples, R is Reserves and S is Sailors.

39 Simple Nested Loops Join
Join on i-th column of R and j-th column of S:
foreach tuple r in R do
  foreach tuple s in S do
    if r_i == s_j then add <r, s> to result
1. In the outer loop, read the first table tuple-by-tuple.
2. For each tuple in the outer loop, compare it to each tuple in the second table in the inner loop. This requires that we read the entire table in the second loop for each tuple in the outer loop.
CS386/586 Lois Delcambre, Some slides adapted from R. Ramakrishnan, et al, and J. Terwilliger, with permission
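The pseudocode above translates almost line for line into Python. This sketch works on in-memory lists of tuples, so it shows the logic only, not the I/O behavior; the sample Reserves and Sailors rows are invented.

```python
def simple_nested_loops_join(R, S, i, j):
    """Join R and S on column i of R and column j of S, tuple by tuple."""
    result = []
    for r in R:                 # outer loop: one tuple of R at a time
        for s in S:             # inner loop: rescan all of S for every r
            if r[i] == s[j]:
                result.append(r + s)
    return result

# Hypothetical sample data: Reserves(sid, bid) and Sailors(sid, sname).
reserves = [(22, 101), (58, 103), (22, 102)]
sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]

joined = simple_nested_loops_join(reserves, sailors, 0, 0)
```

Replacing the equality test with any other comparison, or dropping it entirely, leaves the loop structure unchanged.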

40-48 Simple Nested Loops Join (animation)
[Figure, repeated over slides 40-48: Table 1 on disk, memory buffers, Table 2 on disk, and the query answer. The current tuple of Table 1 (outer loop) is compared with each tuple of Table 2 (inner loop) in turn. Matching pairs are added to the query answer. No match: discard!]
At this point (slide 46), we have read the entire table in the inner loop. So, we advance to the second tuple in the outer loop.

49 Simple Nested Loops Join
[Animation frame: a match is found and added to the query answer. And so forth.]
Does this algorithm work for R1.sid < S1.sid?
Does this algorithm work for cross product?

50 Cost of Simple Nested Loops Join
For each tuple in the outer relation R, we scan the entire inner relation S, tuple by tuple.
Join on i-th column of R and j-th column of S:
foreach tuple r in R do
  foreach tuple s in S do
    if r_i == s_j then add <r, s> to result
M = 1000 pages in R, p_R = 100 tuples per page, N = 500 pages in S, p_S = 80 tuples per page.
Cost: M + (p_R * M) * N = 1000 + 100*1000*500 = 50,001,000 I/Os
We assume approximately 100 I/Os per second: 50,001,000 I/Os ≈ 500,010 seconds ≈ 6 days.

51 Better: Page-Oriented Nested Loops Join
[Figure: Table 1 on disk, memory buffers, Table 2 on disk]
Once we've got these two pages in memory, check every combination from one page to the other page!

52 Page-Oriented Nested Loops Join (cont.)
[Figure: Table 1 on disk, memory buffers, Table 2 on disk]
This page (from Table 1) is still in memory. Get the next page (of Table 2). Do the same thing: compare all combinations, in memory, between these two pages!

53 Cost of Page-oriented Nested Loops Join
for each page of tuples r in R do
  for each page of tuples s in S do
    (match all combinations in memory)
    if r_i == s_j then add <r, s> to result
For each page of R, get each page of S, and write out matching pairs of tuples <r, s>.
Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer)
Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer)
Therefore, use the smaller relation as the outer relation.
About 500,000 I/Os ≈ 500 seconds = 8.3 minutes (assume 1000 I/Os/second)
Compare with simple nested loops: M + (p_R * M) * N = 1000 + 100*1000*500 = 50,001,000

54 Page-oriented Nested Loops Join
Same algorithm and costs as on slide 53: 501,000 I/Os with R as outer, 500,500 I/Os with S as outer.
Therefore, typically use the smaller relation as the outer relation: about 500,000 I/Os, or 8.3 minutes.

55 The best loops-based join algorithm: Block Nested-Loops Join (use a block of buffers)
[Figure: R on disk, B pages of memory buffer, S on disk]
One page is assigned to be the output buffer (not shown on this slide).
One page is assigned to input from S; B-2 pages are assigned to input from R.
Algorithm:
Until all of R has been read {
  Read in B-2 pages of R
  For each page in S {
    Read in the single S page
    Check pairs of tuples in memory and output if they match
  }
}
Cost: M + (M/(B-2))*N. For B = 35, cost is 1000 + 1000*500/33 ≈ 16,000 I/Os ≈ 16 seconds
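The block nested-loops algorithm can be sketched in Python with a counter standing in for page reads. Everything here is illustrative: pages are lists of tuples held in memory, the tiny tables are made up, and the sketch counts a partially filled last block as a full pass over S.

```python
def block_nested_loops_join(R_pages, S_pages, B, key_r, key_s):
    """Join two paged tables using B buffer pages; return (result, page_reads)."""
    if B < 3:
        raise ValueError("need at least 3 buffers: R block, S page, output")
    block_size = B - 2                       # pages of R held at once
    result, page_reads = [], 0
    for start in range(0, len(R_pages), block_size):
        block = R_pages[start:start + block_size]
        page_reads += len(block)             # read the next B-2 pages of R
        for s_page in S_pages:
            page_reads += 1                  # read a single page of S
            for r_page in block:             # all comparisons happen in memory
                for r in r_page:
                    for s in s_page:
                        if r[key_r] == s[key_s]:
                            result.append(r + s)
    return result, page_reads

# Hypothetical tiny tables: R has 5 pages of 2 tuples, S has 3 pages of 3 tuples.
R_pages = [[(2 * p,), (2 * p + 1,)] for p in range(5)]          # keys 0..9
S_pages = [[(3 * p,), (3 * p + 1,), (3 * p + 2,)] for p in range(3)]  # keys 0..8

matches, reads_b4 = block_nested_loops_join(R_pages, S_pages, 4, 0, 0)
_, reads_b3 = block_nested_loops_join(R_pages, S_pages, 3, 0, 0)
```

With B = 3 the block holds a single page of R, so the algorithm degenerates to the page-oriented join and the read count is M + M*N.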

56 Comparing the nested loops join algorithms
Example values: M = 1000, p_r = 100, b = 52, N = 500 (we don't use p_s)

simple nested loops join
  Number of times to read inner table: once for each row in outer table
  Cost formula: M + (M*p_r) * N
  Example: 1000 + (1000*100) * 500 = 1000 + 100,000 * 500 = 50,001,000

page-oriented nested loops join
  Number of times to read inner table: once for each PAGE of rows in outer table
  Cost formula: M + M*N
  Example: 1000 + 1000 * 500 = 501,000

block nested loops join
  Number of times to read inner table: once for each b-2 pages of rows in outer table
  Cost formula: M + (M/(b-2)) * N
  Example: 1000 + (1000/50) * 500 = 1000 + 20 * 500 = 11,000
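The three cost formulas can be written as small functions to make the comparison easy to repeat with other values. The function names are made up, and rounding M/(b-2) up to a whole number of blocks is an assumption of this sketch (the slides divide without rounding).

```python
from math import ceil

def simple_nlj_cost(M, p_r, N):
    """Read R once, then scan all of S for every tuple of R."""
    return M + (M * p_r) * N

def page_nlj_cost(M, N):
    """Read R once, then scan all of S for every page of R."""
    return M + M * N

def block_nlj_cost(M, N, b):
    """Read R once, then scan all of S for every block of b-2 pages of R."""
    return M + ceil(M / (b - 2)) * N

M, p_r, N = 1000, 100, 500   # example values from the slides

costs = {
    "simple": simple_nlj_cost(M, p_r, N),
    "page-oriented": page_nlj_cost(M, N),
    "block (b=52)": block_nlj_cost(M, N, 52),
    "block (b=35)": block_nlj_cost(M, N, 35),
}
```

For b = 35 the rounded-up version gives 16,500 I/Os rather than the slide's approximate 16,000, because the last, partly filled block of R still needs a full scan of S.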

57 A few comments about costs
These cost formulas are much simpler than real formulas:
  Real formulas take other costs into account (including CPU time).
  Real systems use buffers and caches to assist with I/Os. (So what we estimate as independent I/Os aren't necessarily actual I/Os; they might be reading data from memory or from a disk cache.)
Note: you can't control or know what happens, e.g., when you read data from a table or query answer using a cursor. (The OS/DBMS handles these details.)
These examples are tiny! These tables would fit in memory.

58 Index Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S where r_i == s_j do   (use the index to find s)
    add <r, s> to result
If there is an index on the search key s_j, then we can use the index on the inner table to get exactly the matching tuples!
Cost: M + ((M*p_R) * cost of finding matching S tuples)
    = M + ((M*p_R) * (I/Os to search the index + 1 to get the data))
  Reserves as inner (outer is Sailors: 500 pages, 80 tuples per page): 500 + (500*80*4) = 160,500 I/Os, about 1/2 hour
  Sailors as inner (outer is Reserves: 1000 pages, 100 tuples per page): 1000 + (1000*100*3) = 301,000 I/Os, about 1 hour
For each R tuple, the cost of probing the S index is about 2-4 I/Os for a B+ tree.
These could be smaller if the top levels of the B+ tree are in memory: 80,500 (Reserves as inner), about 15 min.; 151,000 (Sailors as inner), about 30 min. The index cost could be 0 if the entire index is in memory.
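The figures on this slide follow from one formula, which can be checked with a short function. The function name and the probe-cost parameter are illustrative; the 100 I/Os per second rate is the assumption stated on slide 50.

```python
def index_nlj_cost(outer_pages, outer_tuples_per_page, probe_cost):
    """Read the outer table once, then pay probe_cost I/Os per outer tuple."""
    return outer_pages + outer_pages * outer_tuples_per_page * probe_cost

# Sailors outer (500 pages, 80 tuples/page), probing an index on Reserves.
reserves_inner = index_nlj_cost(500, 80, 4)
# Reserves outer (1000 pages, 100 tuples/page), probing an index on Sailors.
sailors_inner = index_nlj_cost(1000, 100, 3)

# Cheaper probes when the top levels of the B+ tree stay in memory.
reserves_inner_cached = index_nlj_cost(500, 80, 2)
sailors_inner_cached = index_nlj_cost(1000, 100, 1.5)

IOS_PER_SECOND = 100   # assumed rate, as on slide 50
reserves_inner_minutes = reserves_inner / IOS_PER_SECOND / 60
```

At that rate 160,500 I/Os take a little under 27 minutes, which the slide rounds to half an hour.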

59 Index nested loops join
Advantage: access exactly the right records in the inner loop (for equi-join). You can use a clustered or an unclustered index for an index nested loops join.
Disadvantage: (necessarily) you have one I/O per row (in the table in the outer loop) rather than one I/O per page (of rows in the outer loop).

60 Hash Join
Simple case: the smallest table, say S, fits in main memory.
  Build an in-memory hash index for S.
  Proceed as for index nested-loops join.
Harder case: neither R nor S fits in memory. (We won't cover this case in the class.)
  Divide them both in the same way (1 pass) so that each partition of S fits in memory.
  Do in-memory matching on each pair of matching partitions.

61 Hash Join (simple case; small table fits in memory)
First: read and process R (R fits in main memory)
1. Read the blocks of rows of R.
2. For each row of R, compute h(joinval).
3. Then put the row in the proper bucket (Bucket 1, Bucket 2, ..., Bucket n).
Then: process S against the buckets
1. Read the blocks of rows of S, one page after another until finished.
2. For each row of S, compute h(joinval).
3. Then join this row from S with all matching rows from R; they are all in Bucket n.
Do all rows in Bucket n join with this row from S?
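A minimal sketch of the simple-case hash join, with the build table held in explicit buckets. The bucket count, the use of Python's built-in hash as h, and the sample rows are all assumptions made for illustration.

```python
def hash_join(build_rows, probe_rows, key_build, key_probe, n_buckets=4):
    """Build buckets from the in-memory table, then probe with the other table."""
    buckets = [[] for _ in range(n_buckets)]
    for r in build_rows:                              # build phase
        buckets[hash(r[key_build]) % n_buckets].append(r)

    result = []
    for s in probe_rows:                              # probe phase
        for r in buckets[hash(s[key_probe]) % n_buckets]:
            # Rows sharing a bucket may still have different join values,
            # so the join condition must be re-checked here.
            if r[key_build] == s[key_probe]:
                result.append(r + s)
    return result

# Hypothetical sample data: Sailors(sid, sname) is the small in-memory table.
sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]
reserves = [(22, 101), (58, 103), (22, 102), (95, 104)]

joined = hash_join(sailors, reserves, 0, 0)
```

With four buckets, sids 22 and 58 land in the same bucket, which is why the equality test inside the probe loop is still required: not every row in the bucket joins with the probing row.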

62 Algorithms for other operators
Table scan
Index retrieval (select operator)
Index-only scan (project operator)
What might a simple algorithm be for eliminating duplicates (e.g., for DISTINCT or for UNION)?

63 Query Optimization
Translate the SQL query into a query tree (operators: relational algebra plus a few other ones).
Generate other, equivalent query trees (e.g., using relational algebra equivalences).
For each possible query tree:
  select an algorithm for each operator (producing a query plan)
  estimate the cost of the plan
Choose the plan with the lowest estimated cost, of the plans considered (which is not necessarily all possible plans).

64 Initial Query Tree - Equivalent to SQL (without any algorithms selected)
SQL Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Relational Algebra Tree: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$

65 Relational Algebra Equivalence: Cascade of (and Uncascade of) Selects
$\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$
The symbol $\equiv$ means equivalence.
So you can replace $\sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$ with $\sigma_{c_1 \wedge \dots \wedge c_n}(R)$,
and you can replace $\sigma_{c_1 \wedge \dots \wedge c_n}(R)$ with $\sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$.
If you have several conditions connected by AND in a select operator, then you can apply them one at a time.
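The cascade rule can be tried on a handful of rows. The select function below is a toy stand-in for the relational select operator, and the rows are invented; checking one example does not prove the equivalence, it only illustrates it.

```python
def select(predicate, rows):
    """Toy select operator: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

# Hypothetical joined rows carrying the attributes used in the running query.
rows = [
    {"sid": 22, "bid": 100, "rating": 7},
    {"sid": 31, "bid": 100, "rating": 3},
    {"sid": 58, "bid": 101, "rating": 9},
    {"sid": 64, "bid": 100, "rating": 10},
]

def is_boat_100(t):
    return t["bid"] == 100

def is_high_rating(t):
    return t["rating"] > 5

# One select with both conditions connected by AND ...
combined = select(lambda t: is_boat_100(t) and is_high_rating(t), rows)
# ... versus a cascade of two selects, in either order.
cascaded = select(is_boat_100, select(is_high_rating, rows))
swapped = select(is_high_rating, select(is_boat_100, rows))
```

All three lists contain the same rows, for sailors 22 and 64.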

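66 Example: $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Before: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$
After: $\pi_{sname}(\sigma_{bid=100}(\sigma_{rating>5}(Reserves \bowtie_{sid=sid} Sailors)))$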

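67 Example: $\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S$
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
The condition rating > 5 applies only to the Sailors table!
Before: $\pi_{sname}(\sigma_{bid=100}(\sigma_{rating>5}(Reserves \bowtie_{sid=sid} Sailors)))$
After: $\pi_{sname}(\sigma_{bid=100}(Reserves \bowtie_{sid=sid} \sigma_{rating>5}(Sailors)))$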

68 Example: $\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S$ (cont.)
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Tree: $\pi_{sname}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} \sigma_{rating>5}(Sailors))$
What are the advantages of pushing a select past a join operator?
What are the disadvantages of pushing a select past a join operator?

69 Relational Algebra Equivalences
Selections:
  $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$  (selects cascade)
  $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$  (selects commute)
Projections:
  $\pi_a(R) \equiv \pi_a(\pi_{a_1}(\dots(\pi_{a_n}(R))\dots))$ if each $a_i$ contains $a$  (only the last project matters)
Joins:
  $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$  (joins are associative)
  $R \bowtie S \equiv S \bowtie R$  (joins commute)
Try to prove that: $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$
Try it out with: is $agent \bowtie (languagerel \bowtie language)$ equivalent to $(language \bowtie agent) \bowtie languagerel$?

70 Original Query Tree (without any algorithms selected)
SQL Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
RA Tree: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$

71 A Plan for the Original Query Tree: How shall we run the query plan?
Plan (top to bottom):
  $\pi_{sname}$: seq. scan (write temp file to disk)
  $\sigma_{bid=100 \wedge rating>5}$: seq. scan (write temp file to disk)
  $\bowtie_{sid=sid}$: nested loops join
  Reserves, Sailors
One way to execute this query is to perform each operator (starting from the bottom) and always write the intermediate results out to disk. We could choose a join algorithm and then do everything else with a sequential table scan.

72 But wait: select and project operators operate on just one row at a time
Plan (top to bottom):
  $\pi_{sname}$: on-the-fly (instead of seq. scan and writing a temp file to disk)
  $\sigma_{bid=100 \wedge rating>5}$: on-the-fly (instead of seq. scan and writing a temp file to disk)
  $\bowtie_{sid=sid}$: nested loops join
  Reserves, Sailors
We can do a select (to filter out unwanted rows) and we can do a project (to drop columns) while the row is in memory. Some previous operator must first read the rows into memory; then we can do select and project on the fly, which means in memory.

73 On the fly is free
On the fly means that we evaluate the operator in memory, while we have the tuple available. On the fly induces no I/O cost!
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Plan (top to bottom):
  $\pi_{sname}$ (on-the-fly)
  $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly)
  $\bowtie_{sid=sid}$ (nested loops join)
  Reserves, Sailors

74 Limitations of On the fly
Can only happen if:
  Computation can be done entirely on tuples in memory
  Results do not need to be materialized
  An earlier operator has read the table
Cannot be used for the first table scan!
Cannot apply to all operations, such as join or aggregates, etc.

75 Cost of plan 1: no index (Sailors inner loop)
M = # of pages in outer table
N = # of pages in inner table
Cost of page-oriented nested loops join is: M + M * N = 1000 + 1000 * 500 = 501,000
And the on-the-fly operations have no I/O, so the plan cost is 501,000.
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Plan (top to bottom):
  $\pi_{sname}$ (on-the-fly)
  $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly)
  $\bowtie_{sid=sid}$ (nested loops join)
  Reserves (outer), Sailors (inner)

76 Create other plans
Use relational algebra equivalences to produce new query trees. Advantage: you are sure that the new tree is equivalent to the original, because equivalences have proofs.
Assign different algorithms to operators.

77 Cost of plan 2 (Reserves on inner loop)
Sailors as the outer relation rather than Reserves.
M = # of pages in outer table
N = # of pages in inner table
Cost of page-oriented nested loops join is: 500 + 500 * 1000 = 500,500
And the on-the-fly operations have no I/O, so the plan cost is 500,500.
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Plan (top to bottom):
  $\pi_{sname}$ (on-the-fly)
  $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly)
  $\bowtie_{sid=sid}$ (page-oriented nested loops join)
  Sailors (outer), Reserves (inner)

78 Cost of plan 3: Index nested loops
What is the cost of the plan shown?
M + M * p_R * (# of index I/Os + 1)
  = 1000 + (1000 * 100 * (2 + 1)) if 2 index I/Os, or 1000 + (1000 * 100 * (0 + 1)) if 0 index I/Os
Thus 1000 + 300,000 = 301,000 or 1000 + 100,000 = 101,000
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Plan (top to bottom):
  $\pi_{sname}$ (on-the-fly)
  $\sigma_{bid=100 \wedge rating>5}$ (on-the-fly)
  $\bowtie_{sid=sid}$ (index nested loops)
  Reserves, Sailors

79 Cost of plan 4: Push down selects
Apply this equivalence: $\sigma_c(R \bowtie S) \equiv \sigma_c(R) \bowtie S$ to the previous query tree to get an equivalent query tree.
SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5;
Plan (top to bottom):
  $\pi_{sname}$ (on-the-fly)
  $\bowtie_{sid=sid}$ (page-oriented nested loops join)
  inputs to the join:
    $\sigma_{rating>5}$ (on-the-fly) over Sailors (table scan)
    $\sigma_{bid=100}$ (on-the-fly) over Reserves (table scan)
What is the cost of the plan shown?
Scanning Sailors and Reserves costs M+N I/Os. But here we read intermediate files.
What about the cost of the join? It depends on how many reservations there are for boat 100 and how many sailors have a rating > 5.
How would you find this information? Statistics will help.

80 To estimate cost, we need table sizes
For all operators beyond the leaf level of the query plan, the input tables are the result of some earlier query. Thus, we need to estimate the size of intermediate results! (This can be difficult. This is one reason why the cost estimates may not be very good. Estimation errors tend to compound.)
For example, what information would you need to estimate how many reservations are for bid 100 and how many sailors have a rating > 5?

81 DBMS Usually Maintains Some Statistics in the DB Catalog
Catalogs typically contain at least:
# tuples and # pages for each table.
# distinct key values and # pages for each index.
Index height and low/high key values for each tree index.
Catalogs are updated periodically, say, once a week or once a month. Perhaps they're updated during the backup.
Simplest case: assume that all attribute values are uniformly distributed. Thus if gender was an attribute, the optimizer would assume that half of the rows have the male value and the other half have the female value. (This might be grossly inaccurate.)

82 Calculating Selectivities
Assume that rating values range from 1 to 10, and that bid values range from 1 to 100.
What percentage of the incoming tuples to the operator σ bid=100 will be output? What about σ rating > 5?
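Under the uniform-distribution assumption, both questions reduce to counting qualifying values. The sketch below uses illustrative helper names and assumes integer-valued attributes whose values are spread evenly over the stated range.

```python
from fractions import Fraction


def eq_selectivity(low: int, high: int) -> Fraction:
    """Fraction of tuples with attr = v, for integer values uniform on [low, high]."""
    return Fraction(1, high - low + 1)


def gt_selectivity(value: int, low: int, high: int) -> Fraction:
    """Fraction of tuples with attr > value, assuming low <= value <= high."""
    return Fraction(high - value, high - low + 1)


bid_sel = eq_selectivity(1, 100)       # bid = 100, bid in 1..100
rating_sel = gt_selectivity(5, 1, 10)  # rating > 5, rating in 1..10

print(float(bid_sel))     # 0.01
print(float(rating_sel))  # 0.5
```

So the optimizer would expect about 1% of the incoming tuples to pass bid=100 and about 50% to pass rating > 5.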

83 Doing better than a uniform distribution
The DBMS might gather more detailed information about how the values of attributes are distributed (e.g., histograms of the values in a field) and store it in the catalog.
Suppose there was an attribute degree-program with three possible values: BS CS, MS CS, and PhD CS. Then the DBMS might count the values and know that there are 428 BS CS values, 98 MS CS values, and 25 PhD CS values.
This allows a much better estimate of the reduction factor.
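A short sketch shows how far the uniform guess can be from the counted values. The counts are the ones given above; the variable names are illustrative only.

```python
# Value counts for degree-program, as the DBMS might store them in the catalog.
counts = {"BS CS": 428, "MS CS": 98, "PhD CS": 25}
total = sum(counts.values())

# Uniform assumption: every distinct value gets the same reduction factor.
uniform_rf = 1 / len(counts)

# Histogram-based reduction factor: the observed fraction for each value.
histogram_rf = {value: count / total for value, count in counts.items()}

for value, rf in histogram_rf.items():
    print(f"{value}: histogram {rf:.3f} vs uniform {uniform_rf:.3f}")
```

For degree-program = 'PhD CS', the uniform estimate (about 0.333) is more than seven times the histogram estimate (25/551, about 0.045).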

84 Independence of Reduction Factors
So far, we have gathered statistics and shown how to use them, assuming uniform distributions. But what about a select operator with terms separated by AND? Example: σ bid=100 ∧ rating > 5.
We assume that all terms are independent! Thus, if one attribute is class and the other is number-of-hours, the query optimizer might assume that class is uniformly distributed over {Fresh, Soph, Jun, Sen} and that number-of-hours is uniformly distributed over {0, 1, ..., 205}.
But we know that class correlates with number-of-hours! It might even be that number-of-hours → class (number-of-hours determines class).
What percentage of the incoming tuples to the operator σ bid=100 ∧ rating > 5 will be output?
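The independence assumption amounts to multiplying reduction factors. The sketch below does that for the bid/rating example (using the uniform ranges 1..100 and 1..10), then builds a small synthetic table to show how correlation breaks the estimate. The credit-hour thresholds in `class_of` are hypothetical, chosen only to make class a function of number-of-hours.

```python
from fractions import Fraction

# Independence assumption: multiply the reduction factors of the ANDed terms.
bid_rf = Fraction(1, 100)    # bid = 100, uniform over 1..100
rating_rf = Fraction(5, 10)  # rating > 5, uniform over 1..10
combined_rf = bid_rf * rating_rf
print(combined_rf)  # 1/200


def class_of(hours: int) -> str:
    """Hypothetical rule making class fully determined by number-of-hours."""
    if hours < 45:
        return "Fresh"
    if hours < 90:
        return "Soph"
    if hours < 135:
        return "Jun"
    return "Sen"


# One synthetic student per hour value in {0, 1, ..., 205}.
rows = [(hours, class_of(hours)) for hours in range(206)]

# Optimizer-style estimate for: class = 'Fresh' AND number-of-hours < 45.
estimated = Fraction(1, 4) * Fraction(45, 206)

# Actual fraction of rows that satisfy both terms.
matching = sum(1 for hours, cls in rows if cls == "Fresh" and hours < 45)
actual = Fraction(matching, len(rows))

print(float(estimated), float(actual))
```

Because the two terms select exactly the same rows, the true fraction is four times the independent estimate.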

85 Enumerating Plans for Multiple Joins
Back to the problem of generating plans. Are we trying to generate as many plans as possible? No, best to generate few, as long as cheap plans are among them.
In System R: only left-deep join trees are considered. A left-deep tree has the form ((A ⋈ B) ⋈ C) ⋈ D: the right child of every join is a base table.
[Figure: three join trees over A, B, C, and D. One is left-deep; the other two are not.]
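To get a feel for how much the left-deep restriction shrinks the search space, the sketch below enumerates the left-deep join orders for four tables and compares the count with the number of all ordered binary join trees. The function names are illustrative, and the count ignores the choice of join algorithm and access path for each node.

```python
from itertools import permutations
from math import comb, factorial


def left_deep_plans(relations):
    """Yield each left-deep join order as nested tuples, e.g. ((('A', 'B'), 'C'), 'D')."""
    for order in permutations(relations):
        tree = order[0]
        for rel in order[1:]:
            tree = (tree, rel)  # the right child is always a base table
        yield tree


def all_join_trees_count(n: int) -> int:
    """Ordered binary trees with n distinct leaves: n! * Catalan(n - 1)."""
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan


plans = list(left_deep_plans("ABCD"))
print(plans[0])                 # ((('A', 'B'), 'C'), 'D')
print(len(plans))               # 24
print(all_join_trees_count(4))  # 120
```

With four tables the restriction cuts 120 candidate tree shapes down to 24, and the gap widens quickly as more tables are joined.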

86 Queries Over Multiple Relations (Joins)
Left-deep trees allow us to generate all fully pipelined plans.
Intermediate results are not written to temporary files.
Not all left-deep trees are fully pipelined (e.g., SM join).
[Figure: a left-deep join tree over A, B, C, and D.]
Using only left-deep plans (obviously) restricts the search space. (So the optimizer may not find the optimal plan.)

87 Nested Queries
Nested block is optimized independently, with the outer tuple considered as providing a selection condition.
Outer block is optimized with the cost of calling the nested block computation taken into account.
Implicit ordering of these blocks means that some good strategies are not considered.
The non-nested version of the query is typically optimized better. The optimizer might not find it from the nested version, so you may need to explicitly unnest the query.
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid)
Nested block to optimize:
    SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid = <outer value of S.sid>
Equivalent non-nested query (equivalent up to duplicate rows):
    SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103
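The nested and non-nested forms can be compared on a tiny in-memory SQLite database. This is only a sketch: the sample rows are invented, and the `day` column is an assumption about the Reserves schema. It also shows one caveat: the join form returns a sailor once per matching reservation, so DISTINCT is needed to match the EXISTS form exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (1, 'Dustin'), (2, 'Lubber'), (3, 'Rusty');
-- Sailor 1 reserved boat 103 twice (invented sample data).
INSERT INTO Reserves VALUES
    (1, 103, '2024-01-01'), (1, 103, '2024-01-02'),
    (2, 101, '2024-01-03'), (3, 103, '2024-01-04');
""")

nested = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid)
    ORDER BY S.sname
""").fetchall()

joined = conn.execute("""
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE S.sid = R.sid AND R.bid = 103
    ORDER BY S.sname
""").fetchall()

joined_distinct = conn.execute("""
    SELECT DISTINCT S.sname FROM Sailors S, Reserves R
    WHERE S.sid = R.sid AND R.bid = 103
    ORDER BY S.sname
""").fetchall()

print(nested)           # [('Dustin',), ('Rusty',)]
print(joined)           # [('Dustin',), ('Dustin',), ('Rusty',)]
print(joined_distinct)  # [('Dustin',), ('Rusty',)]
```

DISTINCT on sname matches the EXISTS result here only because the sample names are unique; with duplicate names, the two forms would need to be compared on sid instead.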

88 Summary: Algorithms for Relational Algebra Operators
A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).
Many alternative implementation techniques exist for each operator; there is no universally superior technique for most operators.
Must consider available alternatives for each operation in a query and choose the best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.

89 Query Optimizers Don't (Always) Find the Best Plan
There are usually more plans than the optimizer can consider, even if only left-deep plans are considered.
The optimizer might not even try to generate all possible plans (it won't be able to consider all of them anyway).
Sometimes the optimizer will compare the optimization cost to the estimated execution cost and quit early.
The optimizer chooses the plan with the lowest ESTIMATED cost. Actual costs may differ.

91 Physical DB Design
Now that we know how query optimizers work:
How do we choose the file organizations and indices?
How do we decide whether to modify the schema for our database?
This is called physical design of a database. It is sort of like query optimization, backwards.

92 Some Important Physical Database Design Issues
Mapping tables to physical storage/file types. (We won't talk about this part because we haven't discussed physical storage in a DBMS.)
Choosing the indexes
Modifying the conceptual schema
    Denormalization
    Horizontal/vertical decomposition

93 We Need to Understand the Workload to Choose Indices
For each query in the workload:
    Which relations does it access?
    Which attributes are retrieved?
    Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? (That is, how many matching records will there be?)
For each update in the workload:
    Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
    What types of updates (i.e., INSERT/DELETE/UPDATE) are prominent, and which tables and attributes are affected?

94 Tips for index selection
Don't use indexes on small tables (< 200 rows).
Don't use indexes on columns with few values (T/F, gender, state).
For most systems, indexes on primary keys and foreign keys are sufficient.
Don't forget to add indexes when the schema changes!

95 Vertical decomposition: split one table into two or more, choosing which attributes go into each table
Take this table:
    Course(c#, cname, instructor, room, days)
And replace it with these three tables:
    Course1(c#, room, days)
    Course2(c#, cname)
    Course3(c#, instructor)
Advantage: if queries use only a subset of the attributes, smaller rows (fewer attributes) mean faster access.
Disadvantage: if queries need all attributes, then you have to join.
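The trade-off can be seen with SQLite from Python. In this sketch the key column is renamed `cnum` (since `c#` would need quoting in SQL), and the sample row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course1 (cnum TEXT PRIMARY KEY, room TEXT, days TEXT);
CREATE TABLE Course2 (cnum TEXT PRIMARY KEY, cname TEXT);
CREATE TABLE Course3 (cnum TEXT PRIMARY KEY, instructor TEXT);
-- Invented sample row, split across the three tables on the shared key.
INSERT INTO Course1 VALUES ('CS386', 'FAB 10', 'TR');
INSERT INTO Course2 VALUES ('CS386', 'Intro to Database Systems');
INSERT INTO Course3 VALUES ('CS386', 'Instructor A');
""")

# A query on a subset of the attributes touches only one narrow table.
room_only = conn.execute(
    "SELECT room, days FROM Course1 WHERE cnum = 'CS386'"
).fetchone()

# A query needing every attribute must join all three pieces on the key.
full_row = conn.execute("""
    SELECT c1.cnum, c2.cname, c3.instructor, c1.room, c1.days
    FROM Course1 c1
    JOIN Course2 c2 ON c2.cnum = c1.cnum
    JOIN Course3 c3 ON c3.cnum = c1.cnum
    WHERE c1.cnum = 'CS386'
""").fetchone()

print(room_only)
print(full_row)
```

Every piece keeps the key, which is what makes the original row recoverable by joining.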

96 Horizontal partition/decomposition (split the rows into two tables with identical schemas and different names)
Take this table:
    Course(c#, cname, instructor, room, days)
And replace it with these two tables:
    Undergraduate-Course(c#, cname, instructor, room, days)
    Graduate-Course(c#, cname, instructor, room, days)

97 Use a view to hide changes
    Undergraduate-Course(c#, cname, instructor, room, days)
    Graduate-Course(c#, cname, instructor, room, days)
    CREATE VIEW Courses(c#, cname, instructor, room, days) AS
        SELECT * FROM Undergraduate-Course
        UNION
        SELECT * FROM Graduate-Course
A query can use any of the three. Updates will need to know which table to use.
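A small SQLite sketch of the same idea follows. The table names drop the hyphen and the key is called `cnum` so the identifiers need no quoting, and the rows are invented. The final step illustrates why updates need to name a base table: SQLite, like many systems, rejects an INSERT directly into a UNION view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UndergraduateCourse
    (cnum TEXT PRIMARY KEY, cname TEXT, instructor TEXT, room TEXT, days TEXT);
CREATE TABLE GraduateCourse
    (cnum TEXT PRIMARY KEY, cname TEXT, instructor TEXT, room TEXT, days TEXT);
INSERT INTO UndergraduateCourse VALUES
    ('CS161', 'Intro Programming', 'Instructor A', 'FAB 10', 'MW'),
    ('CS386', 'Intro to Databases', 'Instructor B', 'FAB 20', 'TR');
INSERT INTO GraduateCourse VALUES
    ('CS586', 'Intro to Databases', 'Instructor B', 'FAB 20', 'TR');
-- The view reassembles the horizontally partitioned table.
CREATE VIEW Courses AS
    SELECT * FROM UndergraduateCourse
    UNION
    SELECT * FROM GraduateCourse;
""")

# Queries can read the view as if the original Course table still existed.
all_courses = conn.execute("SELECT cnum FROM Courses ORDER BY cnum").fetchall()
print(all_courses)

# Updates must go to a base table; inserting through the view fails.
try:
    conn.execute("INSERT INTO Courses VALUES ('CS999', 'X', 'Y', 'Z', 'MW')")
    view_is_updatable = True
except sqlite3.OperationalError:
    view_is_updatable = False
print(view_is_updatable)
```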

98 Views
A view is a query, with a name, that is stored in the database.
Example view definition:
    CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE S.gpa >= 2.5
Views can be used like base tables, in a query or in any other view. Views are like macros.
CS386/586 Introduction to Database Systems, Lois Delcambre. Some slides adapted from R. Ramakrishnan, with permission.

99 Views can be used to make queries simpler
    Completed(StudID, Course)
    CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE S.gpa >= 2.5
Queries about good students only are easier to write:
    SELECT S.name, S.phone FROM gstudents S NATURAL JOIN completed C WHERE C.course = 'CS386';

100 How are views implemented?
When you enter a query that mentions a view in the FROM clause, the DBMS expands/rewrites your query to include the view definition.
    SELECT S.name, S.phone FROM gstudents S NATURAL JOIN completed C WHERE C.course = 'CS386';
is rewritten as:
    SELECT S.name, S.phone
    FROM (SELECT S.* FROM student S WHERE S.gpa >= 2.5) AS S NATURAL JOIN completed C
    WHERE C.course = 'CS386';
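The rewrite can be checked on a toy database: the query over the view and the query with the view definition inlined should return the same rows. The sketch assumes a `student(studid, name, phone, gpa)` schema and invented rows; only `gpa`, `name`, `phone`, and the Completed columns come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (studid INTEGER PRIMARY KEY, name TEXT, phone TEXT, gpa REAL);
CREATE TABLE completed (studid INTEGER, course TEXT);
INSERT INTO student VALUES (1, 'Ann', '555-0101', 3.6), (2, 'Bob', '555-0102', 2.1);
INSERT INTO completed VALUES (1, 'CS386'), (2, 'CS386');
CREATE VIEW gstudents AS SELECT S.* FROM student S WHERE S.gpa >= 2.5;
""")

# Query written against the view.
via_view = conn.execute("""
    SELECT S.name, S.phone
    FROM gstudents S NATURAL JOIN completed C
    WHERE C.course = 'CS386'
""").fetchall()

# The same query with the view definition expanded by hand.
inlined = conn.execute("""
    SELECT S.name, S.phone
    FROM (SELECT S.* FROM student S WHERE S.gpa >= 2.5) AS S
         NATURAL JOIN completed C
    WHERE C.course = 'CS386'
""").fetchall()

print(via_view)
print(inlined)
```

Bob completed CS386 too, but his gpa is below 2.5, so neither form returns him.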

101 Views used for Security
This view presents a table (called sstudent) that is the student relation without the gpa field.
    CREATE VIEW sstudent AS SELECT studid, name, address FROM student
Can you think of some other security examples?

102 Views used to support extensibility
A company's database includes a relation: Part(PartID: Char(4), weight: real, ...)
Weight is stored in pounds.
The company is purchased by a firm that uses metric weights.
The databases must be integrated, and the weight must be in metric units.
But there's much old software using pounds.
Solution: views!

103 Views to support extensibility (cont.)
Solution:
1. Define a new base table with kilograms, NewPart, for the integrated company.
2. CREATE VIEW Part AS SELECT PartID, 2.2046 * weight, ... (no other changes) FROM NewPart
   (2.2046 converts kilograms to pounds.)
3. Old programs still call the table Part; now they are referencing a view, whereas before they referenced the table.
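A runnable sketch of this solution with SQLite is shown below. It assumes the conversion factor 2.2046 pounds per kilogram, keeps only the two columns named on the slide, and uses an invented part row.

```python
import sqlite3

KG_TO_LB = 2.2046  # pounds per kilogram (rounded)

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
-- New base table for the integrated company: weight is stored in kilograms.
CREATE TABLE NewPart (PartID CHAR(4) PRIMARY KEY, weight REAL);
INSERT INTO NewPart VALUES ('P001', 10.0);
-- Old programs keep querying Part and keep seeing pounds.
CREATE VIEW Part AS
    SELECT PartID, {KG_TO_LB} * weight AS weight FROM NewPart;
""")

old_program_row = conn.execute(
    "SELECT PartID, weight FROM Part WHERE PartID = 'P001'"
).fetchone()
new_program_row = conn.execute(
    "SELECT PartID, weight FROM NewPart WHERE PartID = 'P001'"
).fetchone()

print(old_program_row)  # weight in pounds
print(new_program_row)  # weight in kilograms
```

Old read-only programs need no change; programs that write weights would still have to be pointed at NewPart or handled with a trigger or rule.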

104 Data Independence
A database management system supports data independence if application programs are immune to changes in the conceptual and physical schemas.
Why is this important? Because everything can change.
How does the relational model achieve logical (conceptual) data independence?

105 Levels of Abstraction support Data Independence
External view; user and data designer
    Logical data independence
Logical storage; data designer
    Physical data independence
Physical storage; DBA
[Figure: external schemas ES 1, ES 2, and ES 3 above the Conceptual Schema, which sits above the Physical Schema.]

106 Levels of Abstraction in a relational DBMS
External schemas are views.
Conceptual schema: ERDs
Logical schema: tables
Physical schema: the tables after they are modified by the DBA, plus indexes, files, etc.
[Figure: external schemas ES 1, ES 2, and ES 3 above the Conceptual Schema, which sits above the Physical Schema.]

107 A few Comments on Views
View versus temporary table: What is the difference? Which is better?
Some DBMSs offer a way around the view update problem:
    Triggers
    Rules
    Active research

108 Summary
A view is a stored query definition; rows are not computed until they are needed.
Views can be very useful: easier query writing, security, extensibility.
Views cannot always be unambiguously updated.
Three levels of abstraction in a DBMS support data independence: logical and physical.

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation. Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

CS330. Query Processing

CS330. Query Processing CS330 Query Processing 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically implemented using a `pull interface: when an operator is `pulled for

More information

Evaluation of relational operations

Evaluation of relational operations Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information

Principles of Data Management. Lecture #9 (Query Processing Overview)

Principles of Data Management. Lecture #9 (Query Processing Overview) Principles of Data Management Lecture #9 (Query Processing Overview) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v Midterm

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection

More information

Overview of Query Evaluation

Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Relational Query Optimization. Highlights of System R Optimizer

Relational Query Optimization. Highlights of System R Optimizer Relational Query Optimization Chapter 15 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Highlights of System R Optimizer v Impact: Most widely used currently; works well for < 10 joins.

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors: Query Optimization Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

More information

Implementation of Relational Operations

Implementation of Relational Operations Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows

More information

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors:

Query Optimization. Schema for Examples. Motivating Example. Similar to old schema; rname added for variations. Reserves: Sailors: Query Optimization atabase Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Schema for Examples (sid: integer, sname: string, rating: integer, age: real) (sid: integer, bid: integer, day: dates,

More information

Translating an ER Diagram to a Relational Schema

Translating an ER Diagram to a Relational Schema Translating an ER Diagram to a Relational Schema CS386/586 Introduction to Database Systems, Lois Delcambre 1999-2009 Slide 1 Translate each entity set into a table, with keys. Entity set: represented

More information

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week. Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator

More information

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

Evaluation of Relational Operations. Relational Operations

Evaluation of Relational Operations. Relational Operations Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes

Schema for Examples. Query Optimization. Alternative Plans 1 (No Indexes) Motivating Example. Alternative Plans 2 With Indexes Schema for Examples Query Optimization (sid: integer, : string, rating: integer, age: real) (sid: integer, bid: integer, day: dates, rname: string) Similar to old schema; rname added for variations. :

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15 Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset

More information

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Feb. 29, 2016 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record

More information

Query Evaluation Overview, cont.

Query Evaluation Overview, cont. Query Evaluation Overview, cont. Lecture 9 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Architecture of a DBMS Query Compiler Execution Engine Index/File/Record Manager

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Chapter 15 Ramakrishnan & Gehrke (Sections 15.1-15.6) CPSC404, Laks V.S. Lakshmanan 1 What you will learn from this lecture Cost-based query optimization (System R) Plan space

More information

Principles of Data Management. Lecture #12 (Query Optimization I)

Principles of Data Management. Lecture #12 (Query Optimization I) Principles of Data Management Lecture #12 (Query Optimization I) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Today s Notable News v B+ tree

More information

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

More information

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

15-415/615 Faloutsos 1

15-415/615 Faloutsos 1 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415/615 Faloutsos 1 Outline introduction selection

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DMS Internals- Part X Lecture 21, April 7, 2015 Mohammad Hammoud Last Session: DMS Internals- Part IX Query Optimization Today Today s Session: DMS Internals- Part X Query

More information

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Administriva Lab 2 Final version due next Wednesday CS 133: Databases Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Problem sets PSet 5 due today No PSet out this week optional practice

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine

An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested blocks are usually treated as calls to a subroutine QUERY OPTIMIZATION 1 QUERY OPTIMIZATION QUERY SUB-SYSTEM 2 ROADMAP 3. 12 QUERY BLOCKS: UNITS OF OPTIMIZATION An SQL query is parsed into a collection of query blocks optimize one block at a time. Nested

More information

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer

More information

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst March 8 and 13, 2007

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst March 8 and 13, 2007 Relational Query Optimization Yanlei Diao UMass Amherst March 8 and 13, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

More information

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy

CompSci 516 Data Intensive Computing Systems. Lecture 11. Query Optimization. Instructor: Sudeepa Roy CompSci 516 Data Intensive Computing Systems Lecture 11 Query Optimization Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW2 has been posted on sakai Due on Oct

More information

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization

Administrivia. Relational Query Optimization (this time we really mean it) Review: Query Optimization. Overview: Query Optimization Relational Query Optimization (this time we really mean it) R&G hapter 15 Lecture 25 dministrivia Homework 5 mostly available It will be due after classes end, Monday 12/8 Only 3 more lectures left! Next

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VIII Lecture 16, March 19, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Today s Session:

More information

CIS 330: Applied Database Systems. ER to Relational Relational Algebra

CIS 330: Applied Database Systems. ER to Relational Relational Algebra CIS 330: Applied Database Systems ER to Relational Relational Algebra 1 Logical DB Design: ER to Relational Entity sets to tables: ssn name Employees lot CREATE TABLE Employees (ssn CHAR(11), name CHAR(20),

More information

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques Evaluation of Relational Operations: Other Techniques [R&G] Chapter 14, Part B CS4320 1 Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

External Sorting Implementing Relational Operators
