Query Processing Lecture #10 Database Systems 15-445/15-645 Fall 2018 AP Andy Pavlo Computer Science Carnegie Mellon Univ.
2 ADMINISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on Wednesday October 17 th (in class)
3 UPCOMING DATABASE EVENTS SQream DB Tech Talk Thursday Oct 4 th @ 12:00pm CIC 4 th Floor
4 QUERY PL AN The operators are arranged in a tree. Data flows from the leaves toward the root. The output of the root node is the result of the query. SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 p A.id, B.value A.id=B.id s value>100 A B
5 Processing Models Access Methods Expression Evaluation TODAY'S AGENDA
6 PROCESSING MODEL A DBMS's processing model defines how the system executes a query plan. Different trade-offs for different workloads. Three approaches: Iterator Model Materialization Model Vectorized / Batch Model
7 ITERATOR MODEL Each query plan operator implements a next function. On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Top-down plan processing. Also called Volcano or Pipeline Model.
8 ITERATOR MODEL for t in A: emit(t) for t in child.next(): emit(projection(t)) for t 1 in left.next(): buildhashtable(t 1 ) for t 2 in right.next(): if probe(t 2 ): emit(t 1 t 2 ) for t in child.next(): if evalpred(t): emit(t) for t in B: emit(t) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
8 ITERATOR MODEL 1 for t in A: emit(t) for t in child.next(): emit(projection(t)) for t 1 in left.next(): buildhashtable(t 1 ) for t 2 in right.next(): if probe(t 2 ): emit(t 1 t 2 ) for t in child.next(): if evalpred(t): emit(t) for t in B: emit(t) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
8 ITERATOR MODEL 3 2 1 for t in A: emit(t) for t in child.next(): emit(projection(t)) for t 1 in left.next(): buildhashtable(t 1 ) for t 2 in right.next(): if probe(t 2 ): emit(t 1 t 2 ) for t in child.next(): if evalpred(t): emit(t) for t in B: emit(t) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
8 ITERATOR MODEL 2 1 for t in child.next(): emit(projection(t)) for t 1 in left.next(): buildhashtable(t 1 ) for t 2 in right.next(): if probe(t 2 ): emit(t 1 t 2 ) for t in child.next(): if evalpred(t): emit(t) for t in A: for t in B: 3 emit(t) emit(t) 5 4 SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
9 ITERATOR MODEL This is used in almost every DBMS. Allows for tuple pipelining. Some operators will block until children emit all of their tuples. Joins, Subqueries, Order By Output control works easily with this approach. Limit
10 MATERIALIZATION MODEL Each operator processes its input all at once and then emits its output all at once. The operator "materializes" it output as a single result. The DBMS can push down hints into to avoid scanning too many tuples. Bottom-up plan processing.
11 MATERIALIZATION MODEL 1 for t in A: out.add(t) for t in child.output(): out.add(projection(t)) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) for t in child.output(): if evalpred(t): out.add(t) for t in B: out.add(t) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
11 MATERIALIZATION MODEL 1 for t in A: out.add(t) for t in child.output(): out.add(projection(t)) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) for t in child.output(): if evalpred(t): out.add(t) for t in B: out.add(t) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
11 MATERIALIZATION MODEL for t in A: out.add(t) for t in child.output(): out.add(projection(t)) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) for t in child.output(): if evalpred(t): out.add(t) for t in B: out.add(t) 1 2 3 SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
11 MATERIALIZATION MODEL 4 5 for t in A: out.add(t) for t in child.output(): out.add(projection(t)) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) for t in child.output(): if evalpred(t): out.add(t) for t in B: out.add(t) 1 2 3 SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
12 MATERIALIZATION MODEL Better for OLTP workloads because queries typically only access a small number of tuples at a time. Lower execution / coordination overhead. Not good for OLAP queries with large intermediate results.
13 VECTORIZATION MODEL Like Iterator Model, each operator implements a next function. Each operator emits a batch of tuples instead of a single tuple. The operator's internal loop processes multiple tuples at a time. The size of the batch can vary based on hardware or query properties.
14 3 2 for t in A: out.add(t) if out >n: emit(out) VECTORIZATION MODEL 1 for t in child.output(): out.add(projection(t)) if out >n: emit(out) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) if out >n: emit(out) for t in child.output(): if evalpred(t): out.add(t) if out >n: emit(out) for t in B: out.add(t) if out >n: emit(out) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
14 3 2 for t in A: out.add(t) if out >n: emit(out) VECTORIZATION MODEL 1 for t in child.output(): out.add(projection(t)) if out >n: emit(out) for t 1 in left.output(): buildhashtable(t 1 ) for t 2 in right.output(): if probe(t 2 ): out.add(t 1 t 2 ) if out >n: emit(out) for t in child.output(): if evalpred(t): out.add(t) if out >n: emit(out) 4 for t in B: 5 out.add(t) if out >n: emit(out) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 A p A.id, B.value A.id=B.id s value>100 B
15 VECTORIZATION MODEL Ideal for OLAP queries Greatly reduces the number of invocations per operator. Allows for operators to use vectorized (SIMD) instructions to process batches of tuples.
16 PROCESSING MODELS SUMMARY Iterator / Volcano Direction: Top-Down Emits: Single Tuple Target: General Purpose Vectorized Direction: Top-Down Emits: Tuple Batch Target: OLAP Materialization Direction: Bottom-Up Emits: Entire Tuple Set Target: OLTP
17 ACCESS METHODS An access method is a way that the DBMS can access the data stored in a table. Not defined in relational algebra. Three basic approaches: Sequential Scan Index Scan Multi-Index / "Bitmap" Scan SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100 p A.id, B.value A.id=B.id s value>100 A B
18 SEQUENTIAL SCAN For each page in the table: Retrieve it from the buffer pool. Iterate over each tuple and check whether to include it. for page in table.pages: for t in page.tuples: if evalpred(t): // Do Something! The DBMS maintains an internal cursor that tracks the last page / slot it examined.
19 SEQUENTIAL SCAN: OPTIMIZATIONS This is almost always the worst thing that the DBMS can do to execute a query. Sequential Scan Optimizations: Prefetching Parallelization Buffer Pool Bypass Zone Maps Late Materialization Heap Clustering
20 ZONE MAPS Pre-computed aggregates for the attribute values in a page. DBMS checks the zone map first to decide whether it wants to access the page. Original Data val 100 200 300 400 400 Zone Map type MIN MAX AVG SUM COUNT val 100 400 280 1400 5
20 ZONE MAPS Pre-computed aggregates for the attribute values in a page. DBMS checks the zone map first to decide whether it wants to access the page. Original Data Zone Map SELECT * FROM table WHERE val > 600 val 100 200 300 400 400 type MIN MAX AVG SUM COUNT val 100 400 280 1400 5
21 L ATE MATERIALIZATION DSM DBMSs can delay stitching together tuples until the upper parts of the query plan. bar γ AVG(c) foo.b=bar.b s a>100 foo SELECT AVG(C) FROM foo JOIN bar ON foo.b = bar.b WHERE a > 100 0 1 2 3 a b c
21 L ATE MATERIALIZATION DSM DBMSs can delay stitching together tuples until the upper parts of the query plan. bar γ AVG(c) foo.b=bar.b s a>100 foo Offsets SELECT AVG(C) FROM foo JOIN bar ON foo.b = bar.b WHERE a > 100 0 1 2 3 a b c
21 L ATE MATERIALIZATION DSM DBMSs can delay stitching together tuples until the upper parts of the query plan. bar γ AVG(c) foo.b=bar.b s a>100 foo Offsets Offsets SELECT AVG(C) FROM foo JOIN bar ON foo.b = bar.b WHERE a > 100 0 1 2 3 a b c
21 L ATE MATERIALIZATION DSM DBMSs can delay stitching together tuples until the upper parts of the query plan. bar γ AVG(c) foo.b=bar.b s a>100 foo Result Offsets Offsets SELECT AVG(C) FROM foo JOIN bar ON foo.b = bar.b WHERE a > 100 0 1 2 3 a b c
22 HEAP CLUSTERING Tuples are sorted in the heap's pages using the order specified by a clustering index. If the query accesses tuples using the clustering index's attributes, then the DBMS can jump directly to the pages that it needs. 101 102 103 104
22 HEAP CLUSTERING Tuples are sorted in the heap's pages using the order specified by a clustering index. If the query accesses tuples using the clustering index's attributes, then the DBMS can jump directly to the pages that it needs. Scan Direction 101 102 103 104
23 INDEX SCAN The DBMS picks an index to find the tuples that the query needs. Which index to use depends on: What attributes the index contains What attributes the query references The attribute's value domains Predicate composition Whether the index has unique or non-unique keys
24 INDEX SCAN Suppose that we a single table with 100 tuples and two indexes: Index #1: age Index #2: dept SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'
24 INDEX SCAN Suppose that we a single table with 100 tuples and two indexes: Index #1: age Index #2: dept Scenario #1 There are 99 people under the age of 30 but only 2 people in the CS department. SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US' Scenario #2 There are 99 people in the CS department but only 2 people under the age of 30.
25 MULTI-INDEX SCAN If there are multiple indexes that the DBMS can use for a query: Compute sets of record ids using each matching index. Combine these sets based on the query's predicates (union vs. intersect). Retrieve the records and apply any remaining terms. Postgres calls this Bitmap Scan
26 MULTI-INDEX SCAN With an index on age and an index on dept, We can retrieve the record ids satisfying age<30 using the first, Then retrieve the record ids satisfying dept='cs' using the second, Take their intersection Retrieve records and check country='us'. SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US'
27 MULTI-INDEX SCAN Set intersection can be done with bitmaps, hash tables, or Bloom filters. SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US' age<30 dept='cs' record ids
27 MULTI-INDEX SCAN Set intersection can be done with bitmaps, hash tables, or Bloom filters. SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US' age<30 record ids dept='cs' record ids
27 MULTI-INDEX SCAN Set intersection can be done with bitmaps, hash tables, or Bloom filters. SELECT * FROM students WHERE age < 30 AND dept = 'CS' AND country = 'US' age<30 record ids dept='cs' record ids fetch records country='us'
28 INDEX SCAN PAGE SORTING Retrieving tuples in the order that appear in an unclustered index is inefficient. The DBMS can first figure out all the tuples that it needs and then sort them based on their page id. 101 102 103 104
28 INDEX SCAN PAGE SORTING Retrieving tuples in the order that appear in an unclustered index is inefficient. The DBMS can first figure out all the tuples that it needs and then sort them based on their page id. Scan Direction 101 102 103 104
28 INDEX SCAN PAGE SORTING Retrieving tuples in the order that appear in an unclustered index is inefficient. The DBMS can first figure out all the tuples that it needs and then sort them based on their page id. Scan Direction 101 102 103 104 Page 102 Page 103 Page 104 Page 104 Page 102 Page 103 Page 102 Page 102 Page 101 Page 103 Page 104 Page 103
28 INDEX SCAN PAGE SORTING Retrieving tuples in the order that appear in an unclustered index is inefficient. The DBMS can first figure out all the tuples that it needs and then sort them based on their page id. Scan Direction 101 102 103 104 Page 102 Page 103 Page 104 Page 104 Page 102 Page 103 Page 102 Page 102 Page 101 Page 103 Page 104 Page 103 Page 101 Page 101 Page 102 Page 102 Page 102 Page 102 Page 103 Page 103 Page 103 Page 104 Page 104 Page 104
29 EXPRESSION EVALUATION The DBMS represents a WHERE clause as an expression tree. The nodes in the tree represent different expression types: Comparisons (=, <, >,!=) Conjunction (AND), Disjunction (OR) Arithmetic Operators (+, -, *, /, %) Constant Values Tuple Attribute References Attribute(A.id) = Attribute(B.id) SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.val > 100 AND Attribute(val) > Constant(100)
30 EXPRESSION EVALUATION SELECT * FROM B WHERE B.val =? + 1 Execution Context Current Tuple (123, 1000) = Query Parameters (int:999) Table Schema B (int:id, int:val) Attribute(val) + Parameter(0) Constant(1)
30 EXPRESSION EVALUATION SELECT * FROM B WHERE B.val =? + 1 Execution Context Current Tuple (123, 1000) Query Parameters (int:999) Table Schema B (int:id, int:val) = Attribute(val) 1000 + Parameter(0) Constant(1)
30 EXPRESSION EVALUATION SELECT * FROM B WHERE B.val =? + 1 Execution Context Current Tuple (123, 1000) Query Parameters (int:999) Table Schema B (int:id, int:val) = Attribute(val) 1000 + Parameter(0) 999 Constant(1)
30 EXPRESSION EVALUATION SELECT * FROM B WHERE B.val =? + 1 Execution Context Current Tuple (123, 1000) Query Parameters (int:999) Table Schema B (int:id, int:val) = Attribute(val) 1000 + Parameter(0) Constant(1) 999 1
30 EXPRESSION EVALUATION SELECT * FROM B WHERE B.val =? + 1 Execution Context Current Tuple (123, 1000) Query Parameters (int:999) Table Schema B (int:id, int:val) Attribute(val) 1000 = true + 1000 Parameter(0) Constant(1) 999 1