Module 8: Evaluation of Relational Operators

Module Outline
8.1 The DBMS's runtime system
8.2 General remarks
8.3 The selection operation
8.4 The projection operation
8.5 The join operation
8.6 A Three-Way Join Operator
8.7 Other Operators
8.8 The impact of buffering
8.9 Managing long pipelines of relational operators

[Figure: DBMS architecture overview with the components Web Forms, Applications, SQL Interface (SQL commands), the Query Processor (Parser, Optimizer, Plan Executor, Operator Evaluator), Transaction Manager, Lock Manager, Concurrency Control, Recovery Manager, Files and Index Structures, Buffer Manager, Disk Space Manager, and the Database (Index Files, Data Files, System Catalog). A "You are here!" marker highlights the part covered in this module.]

8.1 The DBMS's runtime system

In some sense, we can consider the implementation of the relational operators as a database's runtime system:
- the query plan (a network of relational operators) constitutes the program to execute [1], and
- the relational operators act on files on disk (relations) and implement the behaviour of the plan [2].

The efficient evaluation of the relational operators should be carefully studied and tuned:
- each operator implements only a small step of the overall query plan (thus, a plan for a query of modest complexity may easily contain up to 100 operators),
- the set of relational operators is designed to be small,
- each operator fulfills multiple tasks.

[1] Compare this, e.g., to Java byte code.
[2] Again, in the Java world, this would be comparable to the Java VM.

Representation of Query Plans

As an internal representation of queries, a DBMS typically uses an operator tree whose internal nodes represent logical (e.g., algebra-style) or physical (e.g., concrete implementation algorithms) operators. Directed arcs connect arguments (inputs) to operators and operators to their output. As a result of query optimization, arguments that are used in multiple places may be connected to several operators, so we may end up with networks of operators, such as:

[Figure: an operator network over the input relations R, S, and T that includes a sort operator.]

Logical vs. Physical Operators

A typical DBMS provides several implementations for a single relational operator (e.g., instead of a single join operator ⋈, we have several variants ⋈_1, ⋈_2, ..., ⋈_n). For equivalent input file(s), all variants produce an equivalent output file.

Equivalent? What do you think is precisely meant by "equivalent" here? Why don't we just say "identical"?

Terminology: the variants ⋈_1, ⋈_2, ... are the different physical operators implementing the logical operator ⋈. We will discuss physical operators in this chapter.

The query optimizer analyzes a given query plan based on its knowledge of the system internals, statistics, and ongoing bookkeeping, and selects the best specific variant for each operator. During query optimization, logical operators are replaced by physical ones.

Physical Properties

However, a specific variant may be tailored to exploit several physical properties of the system:
- the presence or absence of indexes on the input file(s),
- the sortedness of the input file(s),
- the size of the input file(s),
- the available space in the buffer pool (cf. external sorting in Chapter 7),
- the buffer replacement policy, ...

Example: The optimizer has marked each edge of the plan to indicate whether the records flowing over this edge are sorted with respect to some sort key k (s: sorted) or not (u: unsorted):

[Figure: the operator network over R, S, and T, including a sort operator; each edge is labelled s or u.]

In general, the query optimizer may transform the original plan quite heavily to enable the use of the most efficient physical operator variants.

Example (assume that the physical operators can exploit the sortedness of their input(s), e.g., a join might be a sort-merge join):

[Figure: the transformed operator network over R, S, and T. S's unsorted (u) output now passes through a sort, so the remaining edges are marked s.]

8.2 General remarks

The system catalog

A relational table can be stored in different file structures, and we can create one or more indexes on each table. These indexes are also stored in files. Conversely, a file may contain data from one (or more) table(s) or the entries of an index. Such data is referred to as (primary) data in the database.

A relational DBMS maintains information about every table and index it contains. This descriptive information is itself stored in a collection of special tables, the catalog tables, aka the system catalog, the data dictionary, or just the catalog.

Catalog information includes relation and attribute names, attribute domains, integrity constraints, access privileges, and much more. The query processor (and the query optimizer in particular) also draws a lot of information from the system catalog, e.g., the file structure of each table, the availability of indexes, the number of tuples in each relation, the number of pages in each file, ... We'll come back to some of these later.

8.2.2 Principal approaches to operator evaluation

Algorithms for evaluating relational operators have a lot in common. They are based upon one of the following principles:
1. Indexing. If some form of (selection, join) condition is given, use an index to examine just the tuples that satisfy the condition.
   More generally: ... to examine a superset of candidate tuples that may satisfy the condition.
2. Iteration. Examine all tuples in an input table, one after the other.
   Index-only plans: ... if there is an index covering all required attributes, we can scan the index instead of the data file.
3. Partitioning. By partitioning on a subset of attribute values, we can often decompose an operation into a less expensive collection of operations on partitions. Sorting and hashing are commonly used partitioning techniques.
   Divide-and-conquer: ... partitioning is an instance of this principle of algorithm design.

8.3 The selection operation

8.3.1 No index, unsorted data

Selection σ_p reads an input file r_in of records and writes those records satisfying predicate p into the output file:

Algorithm: σ(p, r_in, r_out)
Input:  predicate p, input file r_in
Output: output file r_out written (side-effect)

    out ← createfile(r_out);
    in ← openscan(r_in);
    while (r ← nextrecord(in)) ≠ EOF do
        if p(r) then appendrecord(out, r);
    closefile(out);

Observations:
- Reading the special record EOF from a file indicates the end of the input file.
- This simple procedure does not require r_in to come with any special physical properties (the procedure is exclusively defined in terms of heap files, see Section 2.4.1).
- Predicate p may be arbitrary.
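
The file-scan selection above can be sketched in Python. The sketch assumes that a file is modelled as an iterable of records (tuples) and that the output is produced as a generator rather than appended to a file r_out. `select_scan` is a hypothetical name.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple  # a record is modelled as a plain tuple


def select_scan(p: Callable[[Record], bool], r_in: Iterable[Record]) -> Iterator[Record]:
    """sigma_p(r_in) via a plain file scan: needs no index and no sort order."""
    for r in r_in:      # nextrecord(in) until EOF
        if p(r):        # p may be an arbitrary predicate
            yield r     # appendrecord(out, r)


# Usage: keep the records whose first field is greater than 2
print(list(select_scan(lambda r: r[0] > 2, [(1, "a"), (3, "b"), (5, "c")])))
```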

Query execution cost

We summarize the characteristics of this implementation of the selection operator as follows:

$\sigma_p(r_{in})$
  input access [3]: file scan (openscan) of r_in
  prerequisites:    none (p arbitrary, r_in may be a heap file)
  I/O cost:         $\underbrace{|r_{in}|}_{\text{input cost}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{output cost}}$

$|r_{in}|$ denotes the number of pages in file r_in, and $\|r_{in}\|$ denotes the number of records (if b records fit on one page, we have $|r_{in}| = \lceil \|r_{in}\| / b \rceil$).

[3] Sometimes also called access path in the literature and textbooks.

Selectivity

sel(p), the selectivity of predicate p, is the fraction of records satisfying predicate p:

$0 \le sel(p) = \frac{\|\sigma_p(r_{in})\|}{\|r_{in}\|} \le 1$

Selectivity: What can you say about the following selectivities?
1. sel(true)
2. sel(false)
3. sel(a = 0)

8.3.2 No index, sorted data

If the input file r_in is sorted with respect to a sort key k, we can use binary search on r_in to find the first record matching predicate p more quickly. To find more hits, scan the sorted file from there. Obviously, predicate p must match the sort key k in some way. Otherwise, we won't benefit from the sortedness of r_in.

When does a predicate match a sort key? Assume r_in is sorted on attribute A in ascending order. Which of the selections below can benefit from the sortedness of r_in?
1. $\sigma_{A=42}(r_{in})$
2. $\sigma_{A>42}(r_{in})$
3. $\sigma_{A<42}(r_{in})$
4. $\sigma_{A>42 \text{ AND } A<100}(r_{in})$
5. $\sigma_{A>42 \text{ OR } A>100}(r_{in})$
6. $\sigma_{A>42 \text{ OR } A<32}(r_{in})$
7. $\sigma_{A>42 \text{ AND } A<32}(r_{in})$
8. $\sigma_{A>42 \text{ AND } B=10}(r_{in})$
9. $\sigma_{A>42 \text{ OR } B=10}(r_{in})$

We defer the treatment of disjunctive predicates (e.g., A > 42 OR A < 32) until later. The characteristics of selection via binary search are:

$\sigma_p(r_{in})$
  input access:  binary search, then sorted file scan of r_in
  prerequisites: r_in sorted on key k, p matches sort key k
  I/O cost:      $\underbrace{\log_2 |r_{in}|}_{\text{binary search}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{sorted scan}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{output cost}}$
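
The next sketch shows selection via binary search followed by a sorted scan. It assumes the sorted file is a Python list of records and the predicate is a closed range lo ≤ k ≤ hi on the sort key. An equality predicate k = c is the special case lo = hi = c. For brevity, the search runs over records rather than pages, whereas the cost formula above counts page I/Os. All names are illustrative.

```python
from typing import Any, Callable, List, Sequence

Key = Callable[[Any], Any]


def first_at_least(file: Sequence[Any], key: Key, lo: Any) -> int:
    """Binary search: position of the first record whose sort key is >= lo."""
    left, right = 0, len(file)
    while left < right:
        mid = (left + right) // 2
        if key(file[mid]) < lo:
            left = mid + 1
        else:
            right = mid
    return left


def select_sorted(file: Sequence[Any], key: Key, lo: Any, hi: Any) -> List[Any]:
    """sigma_{lo <= k <= hi}(r_in) for a file sorted in ascending order on k."""
    out = []
    i = first_at_least(file, key, lo)              # binary search for the first hit
    while i < len(file) and key(file[i]) <= hi:    # sorted scan, stop at the first miss
        out.append(file[i])
        i += 1
    return out


f = [(1, "a"), (3, "b"), (42, "c"), (42, "d"), (50, "e")]
print(select_sorted(f, lambda r: r[0], 42, 42))    # equality A = 42
```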

8.3.3 B+ tree index

A clustered B+ tree index on r_in whose key matches the selection predicate p is clearly the superior method to evaluate $\sigma_p(r_{in})$:
- Descend the B+ tree to retrieve the first index entry that satisfies p.
- Then scan the sequence set to find more matching records.

If the index is unclustered and sel(p) indicates a large number of qualifying records, it pays off to
1. read the index entries ⟨k, rid⟩ in the sequence set,
2. sort those entries on their rid field,
3. and then access the pages of r_in in sorted rid order.

Note that lack of clustering is a minor issue if sel(p) is close to 0. Why?

$\sigma_p(r_{in})$
  input access:  access of B+ tree on r_in, then sequence set scan
  prerequisites: clustered B+ tree on r_in with key k, p matches key k
  I/O cost:      $\underbrace{3}_{\text{B+ tree acc.}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{sorted scan}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{output cost}}$
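
The rid-sorting trick for an unclustered index can be illustrated with a toy model. The model makes three assumptions: rids are (page number, slot) pairs, there is a single page buffer, and a list of ⟨k, rid⟩ entries sorted on k stands in for the sequence set. The page-read counter shows why sorting the rids first helps. Function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Rid = Tuple[int, int]      # assumed rid layout: (page number, slot number)
Entry = Tuple[int, Rid]    # index entry <k, rid>


def fetch(rids: List[Rid], pages: Dict[int, List[tuple]]) -> Tuple[List[tuple], int]:
    """Fetch records by rid through a single page buffer; count page reads."""
    records, ios, buffered = [], 0, None
    for page_no, slot in rids:
        if page_no != buffered:          # page not in the buffer: one I/O
            buffered = page_no
            ios += 1
        records.append(pages[page_no][slot])
    return records, ios


def unclustered_range(entries: List[Entry], lo: int, hi: int,
                      pages: Dict[int, List[tuple]], sort_rids: bool = True):
    """Range selection via an unclustered index, optionally sorting rids first."""
    rids = [rid for k, rid in entries if lo <= k <= hi]   # 1. sequence-set scan (simplified)
    if sort_rids:
        rids.sort()                                       # 2. sort entries on rid
    return fetch(rids, pages)                             # 3. access pages in rid order


pages = {0: [("a", 1), ("b", 5)], 1: [("c", 2), ("d", 4)]}
entries = [(1, (0, 0)), (2, (1, 0)), (4, (1, 1)), (5, (0, 1))]      # sorted on k
print(unclustered_range(entries, 1, 5, pages, sort_rids=False)[1])  # 3 page reads
print(unclustered_range(entries, 1, 5, pages)[1])                   # 2 page reads
```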

8.3.4 Hash index, equality selection

A selection predicate p matches a hash index only if p contains a term of the form A = c (assuming the hash index is over key attribute A). We are directly led to the bucket of qualifying records and pay I/O cost only for the access of this bucket [4]. Note that sel(p) is likely to be close to 0 for equality predicates.

$\sigma_p(r_{in})$
  input access:  hash table on r_in
  prerequisites: r_in hashed on key k, p has term k = c
  I/O cost:      $\underbrace{sel(p) \cdot |r_{in}|}_{\text{hash access}} + \underbrace{sel(p) \cdot |r_{in}|}_{\text{output cost}}$

[4] Remember that this may include access cost for the pages of an overflow chain hanging off the primary bucket page.

8.3.5 General selection conditions

Indeed, selection operations with simple predicates like $\sigma_{A \theta c}(r_{in})$ are only a special case. We somehow need to deal with complex predicates, built from simple comparisons and the Boolean connectives AND and OR.

Conjunctive predicates and index matching

Our simple notion of matching a selection predicate with an index can be extended to cover the case where predicate p has a conjunctive form:

$\underbrace{A_1 \theta_1 c_1}_{\text{conjunct}} \wedge A_2 \theta_2 c_2 \wedge \cdots \wedge A_n \theta_n c_n$

Here, each conjunct is a simple comparison ($\theta_i \in \{=, <, >, \le, \ge\}$). An index with a multi-attribute key may match the entire complex predicate.

Matching a multi-attribute hash index. Suppose a hash index is maintained for the 3-attribute key k = (A, B, C) (i.e., all three attributes are input to the hash function). Which types of conjunctive selection predicates p would match this index? p = ?

Predicate matching rule for hash indexes: A conjunctive predicate p matches a (multi-attribute) hash index with key $k = (A_1, A_2, \ldots, A_n)$ if p covers the key k, i.e.,
1. $p \equiv A_1 = c_1 \wedge A_2 = c_2 \wedge \cdots \wedge A_n = c_n$, or
2. $p \equiv A_1 = c_1 \wedge A_2 = c_2 \wedge \cdots \wedge A_n = c_n \wedge \varphi$
   (conjunct φ is not supported by the index itself and has to be evaluated separately after index retrieval).
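
One way to check the hash-index matching rule mechanically is sketched below. It assumes that a conjunctive predicate is given as a list of (attribute, operator, constant) triples. `match_hash_index` is a hypothetical helper, not part of any DBMS API. It returns the hash-key values in key order together with the residual conjuncts φ, or None if p does not cover the key.

```python
from typing import List, Optional, Tuple

Conjunct = Tuple[str, str, object]   # (attribute, comparison operator, constant)


def match_hash_index(p: List[Conjunct], key: List[str]) -> Optional[Tuple[tuple, List[Conjunct]]]:
    """Return (hash-key values, residual conjuncts phi) if p covers key, else None."""
    used = {}                                        # key attribute -> position in p
    for i, (attr, op, _c) in enumerate(p):
        if op == "=" and attr in key and attr not in used:
            used[attr] = i
    if len(used) < len(key):
        return None                                  # some A_i lacks a term A_i = c_i
    values = tuple(p[used[a]][2] for a in key)       # input to the hash function
    residual = [t for i, t in enumerate(p) if i not in used.values()]
    return values, residual


print(match_hash_index([("A", "=", 1), ("B", "=", 2), ("C", "=", 3), ("D", ">", 5)],
                       ["A", "B", "C"]))
```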

Matching a multi-attribute B+ tree index. We have a B+ tree index available on the multi-attribute key (A, B, C), i.e., the B+ tree nodes are inserted/searched using a lexicographic order on the three attributes. This means that inside the B+ tree, two keys $k_1 = (A_1, B_1, C_1)$ and $k_2 = (A_2, B_2, C_2)$ are ordered according to

$k_1 < k_2 \iff A_1 < A_2 \vee (A_1 = A_2 \wedge B_1 < B_2) \vee (A_1 = A_2 \wedge B_1 = B_2 \wedge C_1 < C_2)$.

Which types of conjunctive selection predicates p would match this B+ tree index?

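Predicate matching rule for B+ tree indexes: A conjunctive predicate p matches a (multi-attribute) B+ tree index with key $k = (A_1, A_2, \ldots, A_n)$ if p is a prefix of key k, i.e.,
1. $p \equiv A_1 \theta_1 c_1$, or $p \equiv A_1 \theta_1 c_1 \wedge A_2 \theta_2 c_2$, or ..., or $p \equiv A_1 \theta_1 c_1 \wedge A_2 \theta_2 c_2 \wedge \cdots \wedge A_n \theta_n c_n$, or
2. $p \equiv A_1 \theta_1 c_1 \wedge \varphi$, or $p \equiv A_1 \theta_1 c_1 \wedge A_2 \theta_2 c_2 \wedge \varphi$, or ..., or $p \equiv A_1 \theta_1 c_1 \wedge A_2 \theta_2 c_2 \wedge \cdots \wedge A_n \theta_n c_n \wedge \varphi$.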
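
The B+ tree prefix rule can be sketched the same way, using the (attribute, operator, constant) representation again. The helper is hypothetical. It reports the length j of the longest key prefix A_1, ..., A_j restricted by p (0 means no match) together with the remaining conjuncts φ. Python's built-in tuple comparison implements exactly the lexicographic key order described above, e.g., (1, 2, 9) < (1, 3, 0).

```python
from typing import List, Tuple

Conjunct = Tuple[str, str, object]   # (attribute, comparison operator, constant)


def match_btree_prefix(p: List[Conjunct], key: List[str]) -> Tuple[int, List[Conjunct]]:
    """Length j of the longest key prefix A_1..A_j restricted by p (0 = no match), plus phi."""
    restricted = {attr for attr, _op, _c in p}
    j = 0
    while j < len(key) and key[j] in restricted:
        j += 1
    prefix = set(key[:j])
    residual = [t for t in p if t[0] not in prefix]   # phi: evaluated after index retrieval
    return j, residual


key = ["A", "B", "C"]
print(match_btree_prefix([("A", ">", 42), ("B", "=", 10)], key))   # prefix (A, B)
print(match_btree_prefix([("B", "=", 10)], key))                   # no match: A missing
```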

Intersecting rid sets

If we find that a conjunctive predicate does not match a single index, its (smaller) conjuncts may nevertheless match distinct indexes.

Example: The conjunctive predicate in $\sigma_{p \wedge q}(r_{in})$ does not match an index, but both conjuncts, p and q, do. A typical optimizer might thus decide to transform the original query
$\sigma_{p \wedge q}(r_{in})$
into
$\sigma_p(r_{in}) \cap_{rid} \sigma_q(r_{in})$,
where $\cap_{rid}$ denotes a set intersection operator defined by rid equality.
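
The ∩_rid plan can be sketched with two toy indexes. Each single-attribute index is modelled as a dict from key value to a set of rids, and a caller-supplied `fetch` function reads a record by rid. All names are hypothetical. Replacing `&` with `|` would give the ∪_rid plan for disjunctions whose terms all match an index.

```python
from typing import Callable, Dict, List, Set, Tuple

Rid = Tuple[int, int]
Index = Dict[object, Set[Rid]]   # key value -> rids of matching records (toy index)


def index_rids(index: Index, pred: Callable[[object], bool]) -> Set[Rid]:
    """Stand-in for an index scan: rids of all entries whose key satisfies pred."""
    out: Set[Rid] = set()
    for k, rids in index.items():
        if pred(k):
            out |= rids
    return out


def select_and(idx_p: Index, p, idx_q: Index, q, fetch: Callable[[Rid], tuple]) -> List[tuple]:
    """sigma_{p AND q}(r_in) as sigma_p(r_in) INTERSECT_rid sigma_q(r_in), then fetch."""
    rids = index_rids(idx_p, p) & index_rids(idx_q, q)   # rid-based set intersection
    return [fetch(rid) for rid in sorted(rids)]          # fetch in rid (page) order


idx_a = {1: {(0, 0)}, 2: {(0, 1), (1, 0)}}
idx_b = {"x": {(0, 1)}, "y": {(0, 0), (1, 0)}}
print(select_and(idx_a, lambda k: k == 2, idx_b, lambda k: k == "y", lambda rid: rid))
```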

The selectivity of conjunctive predicates

What can you say about the selectivity of the conjunctive predicate p ∧ q?
$sel(p \wedge q) = ?$

Disjunctive predicates

Choosing an intelligent execution plan for disjunctive selection predicates of the general form
$A_1 \theta_1 c_1 \vee A_2 \theta_2 c_2 \vee \cdots \vee A_n \theta_n c_n$
is much harder:
- We are forced to fall back to a naive file-scan-based evaluation (see Section 8.3.1) as soon as a single term does not match an index. Why?
- If all terms are supported by indexes, we can exploit a rid-based set union $\cup_{rid}$ to improve the plan:
  $\sigma_{A_1 \theta_1 c_1}(r_{in}) \cup_{rid} \cdots \cup_{rid} \sigma_{A_n \theta_n c_n}(r_{in})$

The selectivity of disjunctive predicates

What can you say about the selectivity of the disjunctive predicate p ∨ q?
$sel(p \vee q) = ?$

Predicates involving attribute-attribute comparisons

Can you think of a clever query plan for a selection operation like the one shown below?
$\sigma_{A=B}(r_{in})$

Bypass Selections

Problem: parts of a selection condition may be expensive to check (so far, we assumed this was not the case!) or be very unselective. It is useful to evaluate cheap (and selective) predicates first. Boolean laws used for this include:
- true ∨ P ≡ true (evaluating P is not necessary)
- false ∨ P ≡ P (only now evaluate P)

Example: $Q := \sigma_{(F_1 \wedge F_2) \vee F_3}(R)$, where the selectivities and costs of each part of the selection condition are as follows:

formula | selectivity | cost
F_1     | s_1 = 0.6   | C_1 = 18
F_2     | s_2 = 0.4   | C_2 = 3
F_3     | s_3 = 0.7   | C_3 = 40

Evaluation Alternative 1: Bring the selection condition into disjunctive normal form (DNF). In our case, it is already in DNF. Push each tuple from the input through each disjunct in parallel. Collect matching tuples from each disjunct (eliminating duplicates!).

[Figure: 1000 input tuples flow along two paths. Upper path: σ_{F_3} (1000 in, 700 out). Lower path: σ_{F_2} (1000 in, 400 out), then σ_{F_1} (400 in, 240 out). Both outputs feed a duplicate elimination step.]

Mean cost per tuple (ignoring the cost of duplicate elimination!):
$\underbrace{C_3}_{\text{upper path}} + \underbrace{C_2}_{\text{lower path: } F_2} + \underbrace{s_2 \cdot C_1}_{\text{lower path: } F_1} = 50.2$

Evaluation Alternative 2: Bring the selection condition into conjunctive normal form (CNF):
$CNF[(F_1 \wedge F_2) \vee F_3] = (F_1 \vee F_3) \wedge (F_2 \vee F_3)$.
Push each tuple from the input through the conjuncts one after another. Matching tuples survive all conjuncts (no duplicate elimination necessary!).

[Figure: 1000 input tuples pass σ_{F_2 ∨ F_3} (820 survive), then σ_{F_1 ∨ F_3} (772 survive).]

Mean cost per tuple:
$C_2 + (1 - s_2)\,(C_3 + s_3\,(C_1 + (1 - s_1)\,C_3)) + s_2\,(C_1 + (1 - s_1)\,C_3) = 54.88$

Problem: F_3 is evaluated multiple times, and its result could be cached! Mean cost per tuple with caching:
$C_2 + C_3 + s_2\,(1 - s_3)\,C_1 = 45.16$

Evaluation Alternative 3: Bypass Plan

Goal: eliminate tuples early, avoid duplicates. Introduce a bypass selection operator σ_F, which produces two results: a true output and a false output. (N.B.: the two outputs are disjoint!) Bypass plans are derived from the CNF, i.e., (F_1 ∨ F_3) ∧ (F_2 ∨ F_3) in our example. Boolean factors and the disjuncts within factors are sorted by cost.

[Figure: 1000 input tuples enter bypass σ_{F_2}. Its false output (600) goes to σ_{F_3}, which passes 420. Its true output (400) goes to bypass σ_{F_1}. The true output of σ_{F_1} (240) passes directly, and its false output (160) goes to σ_{F_3}, which passes 112. The disjoint union of 420 + 240 + 112 yields 772 result tuples.]

Mean cost per tuple (... disjoint union):
$C_2 + (1 - s_2)\,C_3 + s_2\,(C_1 + (1 - s_1)\,C_3) = 40.6$

Many variations are possible, e.g., for tuning in parallel environments.
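
The mean-cost formulas can be checked numerically with a few lines of Python. The check uses the selectivities and costs from the table, with C_3 = 40, the value consistent with the totals 50.2 and 40.6. The function name is illustrative. The four returned values correspond to Alternative 1, Alternative 2 without caching of F_3, Alternative 2 with caching of F_3, and the bypass plan.

```python
def plan_costs(s1, s2, s3, c1, c2, c3):
    """Mean per-tuple evaluation cost of the plans for sigma_{(F1 AND F2) OR F3}."""
    f1_or_f3 = c1 + (1 - s1) * c3                    # factor (F1 OR F3), F1 tried first
    dnf = c3 + c2 + s2 * c1                          # Alt. 1: F3 path + (F2 then F1) path
    cnf = c2 + (1 - s2) * (c3 + s3 * f1_or_f3) + s2 * f1_or_f3   # Alt. 2
    cnf_cached = c2 + c3 + s2 * (1 - s3) * c1        # Alt. 2, F3 evaluated only once
    bypass = c2 + (1 - s2) * c3 + s2 * f1_or_f3      # Alt. 3: bypass plan
    return dnf, cnf, cnf_cached, bypass


print([round(c, 2) for c in plan_costs(0.6, 0.4, 0.7, 18, 3, 40)])
```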

8.4 The projection operation

Projection π_l modifies each record in its input file and cuts off any field not listed in the attribute list l. Example:

Input (A, B, C):         (1, "foo", 3), (1, "bar", 2), (1, "foo", 2), (1, "bar", 0), (1, "foo", 0)
Step 1 (cut off C):      (1, "foo"), (1, "bar"), (1, "foo"), (1, "bar"), (1, "foo")
Step 2 (eliminate dups): (1, "foo"), (1, "bar")  = π_{A,B}(input)

In general, the size of the resulting file will only be a fraction of the original input file:
1. any unwanted fields (here: C) have been thrown away, and
2. cutting off record fields may lead to duplicate records, which have to be eliminated [5] to produce the final result.

[5] Remember that we are bound to implement set semantics.

While step 1 calls for a rather straightforward file scan (indexes won't help much here), it is step 2 that makes projection costly. To implement duplicate elimination, we have two principal alternatives:
1. sorting, or
2. hashing.

8.4.1 Projection based on sorting

Sorting is one obvious preparatory step to facilitate duplicate elimination: records with all fields equal will be adjacent to each other after the sorting step.

One benefit of a sort-based projection is that operator π_l will write a sorted output file, i.e.:

[Figure: r_in (sortedness unknown, marked ?) → sort → π_l, whose output edge is marked s (sorted).]

(See the algorithm on the next slide.)

Algorithm: π(l, r_in, r_out)
Input:  attribute list l, input file r_in
Output: output file r_out written (side-effect)

    out ← createfile(r_tmp);
    in ← openscan(r_in);
    while (r ← nextrecord(in)) ≠ EOF do
        r' ← r with any field not listed in l cut off;
        appendrecord(out, r');
    closefile(out);
    external-merge-sort(r_tmp, r'_tmp, θ);
    out ← createfile(r_out);
    in ← openscan(r'_tmp);          // the single sorted run produced by the sort
    lastr ← ⊥;
    while (r ← nextrecord(in)) ≠ EOF do
        if r ≠ lastr then
            appendrecord(out, r);
            lastr ← r;
    closefile(out);

Sort ordering θ? How do we have to specify the ordering θ to make sure the above algorithm works correctly?
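
Here is a minimal in-memory sketch of sort-based projection. It assumes records are tuples and the attribute list l is given as a list of field positions. Python's `list.sort()` stands in for the external merge sort. `project_sort` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def project_sort(r_in: Iterable[Sequence], cols: List[int]) -> List[Tuple]:
    """pi_l(r_in) with set semantics: cut off fields, sort, drop adjacent duplicates."""
    r_tmp = [tuple(r[i] for i in cols) for r in r_in]   # first scan: cut off fields not in l
    r_tmp.sort()                   # stands in for external-merge-sort (on all kept fields)
    out: List[Tuple] = []
    lastr: Optional[Tuple] = None  # plays the role of lastr <- bottom
    for r in r_tmp:
        if r != lastr:             # duplicates are adjacent after sorting
            out.append(r)
            lastr = r
    return out


rows = [(1, "foo", 3), (1, "bar", 2), (1, "foo", 2), (1, "bar", 0), (1, "foo", 0)]
print(project_sort(rows, [0, 1]))   # pi_{A,B}: [(1, 'bar'), (1, 'foo')]
```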

In this algorithm, sorting and duplicate elimination are two separate steps executed in sequence.

Marriage of sorting and duplicate elimination? Can you imagine how a DBMS could fold the formerly separate phases (1. external merge sort, 2. duplicate elimination) to avoid the two-stage approach?

The outline of the external merge sort algorithm is reproduced below.

Pass 0:
1. Read B pages at a time,
2. use in-memory sort to sort the records on these B pages,
3. write the sorted run to disk.
(N.B.: Pass 0 writes ⌈N/B⌉ runs to disk. Each run contains B pages, except the last run, which may contain fewer.)

Passes 1, ... (until only a single run is left):
1. Select B − 1 runs from the previous pass, read a page from each run,
2. perform a (B − 1)-way merge and use the B-th page as temporary output buffer.
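
One possible way to fold duplicate elimination into the sort is sketched below under simplifying assumptions. Here b records stand in for B pages, and pass 0 drops duplicates while sorting each run. A single `heapq.merge` pass, which assumes at most B − 1 runs, skips every record equal to the last one emitted. Names are illustrative.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def pass0_distinct(records: List[Tuple], b: int) -> List[List[Tuple]]:
    """Pass 0: sort chunks of b records into runs, dropping duplicates inside each run."""
    return [sorted(set(records[i:i + b])) for i in range(0, len(records), b)]


def merge_distinct(runs: Iterable[List[Tuple]]) -> Iterator[Tuple]:
    """Final merge pass: emit each distinct record once while merging the runs."""
    last = None
    for r in heapq.merge(*runs):
        if r != last:          # equal records meet at the merge front
            yield r
            last = r


recs = [(1, "foo"), (1, "bar"), (1, "foo"), (1, "bar"), (1, "foo")]
runs = pass0_distinct(recs, 2)
print(runs)                          # 3 runs, each already duplicate-free
print(list(merge_distinct(runs)))    # [(1, 'bar'), (1, 'foo')]
```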

8.4.2 Projection based on hashing

If the DBMS has a fairly large number of buffer pages (B, say) to spare for the $\pi_l(r_{in})$ operation, a hash-based projection may be an efficient alternative to sorting:

Partitioning phase:
1. Allocate all B buffer pages. One page will be the input buffer, and the remaining B − 1 pages will be used as hash buckets.
2. Read the file r_in page by page. For each record r, cut off the fields not listed in l.
3. For each such record, apply the hash function $h_1(r) = h(r) \bmod (B - 1)$, which depends on all remaining fields of r, and store r in hash bucket $h_1(r)$. (Write the bucket to disk if it is full [6].)

[Figure: partitioning phase. The input file on disk is read through one input buffer. Hash function h distributes the records over B − 1 output buffers (of B main-memory buffers), which are flushed to B − 1 partitions on disk.]

[6] You may read this as: a bucket's overflow chain resides on disk.

After partitioning, we are ensured that duplicate elimination is an intra-partition problem only: two identical records r, r' have been mapped to the same partition:
$r = r' \Rightarrow h_1(r) = h_1(r')$.

We are not done yet, though. Due to hash collisions, the records in a partition are not guaranteed to be all equal:
$h_1(r) = h_1(r') \not\Rightarrow r = r'$.
We need a separate duplicate elimination phase:

Duplicate elimination phase:
1. For each partition, read the partition page by page. (Buffer page layout as before.)
2. To each record, apply a hash function $h_2 \ne h_1$. Why?
3. If two records r, r' collide w.r.t. $h_2$, check if r = r'. If so, discard r'.
4. After the entire partition has been read in, append all hash buckets to the result file (which will be free of duplicates).

N.B.: The hash-based approach is efficient only if the duplicate elimination phase can be performed in memory (i.e., no partition may exceed the buffer size).
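
The next sketch follows the two hash-based phases. It assumes k = B − 1 partitions kept as in-memory lists rather than on disk, with `hash(r) % k` playing the role of h_1. A Python set serves as the in-memory table of the second phase, standing in for a separate hash function h_2. The output order depends on Python's hash values, so only the set of result records is meaningful. `project_hash` is a hypothetical name.

```python
from typing import List, Sequence, Set, Tuple


def project_hash(r_in: List[Sequence], cols: List[int], k: int) -> List[Tuple]:
    """Hash-based pi_l: partition with h1 = hash mod k, then de-duplicate each partition."""
    # Partitioning phase: cut off fields, route each record to partition h1(r).
    partitions: List[List[Tuple]] = [[] for _ in range(k)]
    for r in r_in:
        r2 = tuple(r[i] for i in cols)
        partitions[hash(r2) % k].append(r2)
    # Duplicate elimination phase: one in-memory hash table per partition.
    out: List[Tuple] = []
    for part in partitions:
        seen: Set[Tuple] = set()       # stands in for the h2-based in-memory table
        for r2 in part:
            if r2 not in seen:         # a set lookup also checks r == r' on collisions
                seen.add(r2)
                out.append(r2)
    return out


rows = [(1, "foo", 3), (1, "bar", 2), (1, "foo", 2), (1, "bar", 0), (1, "foo", 0)]
print(sorted(project_hash(rows, [0, 1], k=3)))   # [(1, 'bar'), (1, 'foo')]
```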

8.4.3 Use of indexes for projection

If the index key contains all attributes of the projection, we can use an index-only plan to retrieve all values from the index pages without accessing the actual data records. Next, we apply hashing or sorting to eliminate duplicates from this (much smaller) set of pages.

If the index key includes the projected attributes as a prefix, and the index is a sorted index (e.g., a B+ tree), we can use an index-only plan both to retrieve the projected attribute values and to eliminate the duplicates.

8.5 The join operation

The semantics of the join operation $r_1 \bowtie_p r_2$ is most easily described in terms of two other relational operators:
$r_1 \bowtie_p r_2 \equiv \sigma_p(r_1 \times r_2)$
(× denotes the cross product operator. Predicate p may refer to record fields in files r_1 and r_2.)

There are several alternative algorithms that implement $r_1 \bowtie_p r_2$, and some of them actually implement the above relational equivalence:
1. enumerate all records in the cross product of r_1 and r_2,
2. then pick those record pairs satisfying predicate p.

More advanced algorithms try to avoid the obvious inefficiency in step 1 (the size of the intermediate result is $\|r_1\| \cdot \|r_2\|$) and instead try to select early.

8.5.1 Nested loops join

The nested loops join (NL-⋈) is the basic join algorithm variant. Its I/O cost is forbidding, though.

Algorithm: NL-⋈(p, r_1, r_2, r_out)
Input:  predicate p, input files r_1, r_2
Output: output file r_out written (side-effect)

    out ← createfile(r_out);
    in1 ← openscan(r_1);
    while (r ← nextrecord(in1)) ≠ EOF do
        in2 ← openscan(r_2);
        while (r' ← nextrecord(in2)) ≠ EOF do
            if p(r, r') then appendrecord(out, ⟨r, r'⟩);
    closefile(out);

For obvious reasons, file r_1 is referred to as the outer (relation), while r_2 is commonly called the inner (relation).

Cost of NL-⋈

We can easily modify the algorithm so that one scan of the inner relation is initiated for each page of the outer relation (instead of for each record). (Without this simple modification, the inner loop alone would incur a prohibitive I/O cost of $\|r_1\| \cdot |r_2|$!)

$r_1 \bowtie_p r_2$
  input access:  file scan (openscan) of r_1, r_2
  prerequisites: none (p arbitrary, r_1, r_2 may be heap files)
  I/O cost [7]:  $\underbrace{|r_1|}_{\text{outer loop}} + \underbrace{|r_1| \cdot |r_2|}_{\text{inner loop}}$

[7] Ignoring the cost to write the result file r_out.
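
The page-oriented refinement just described can be sketched as follows, assuming each file is a list of pages and each page a list of tuples. The counter confirms the $|r_1| + |r_1| \cdot |r_2|$ page reads. `nl_join` is a hypothetical name, and tuple concatenation stands in for building ⟨r, r'⟩.

```python
from typing import Callable, List, Tuple

Page = List[tuple]


def nl_join(p: Callable[[tuple, tuple], bool],
            r1: List[Page], r2: List[Page]) -> Tuple[List[tuple], int]:
    """Page-oriented nested loops join; returns (result, number of page reads)."""
    out, ios = [], 0
    for page1 in r1:                 # outer relation: read once
        ios += 1
        for page2 in r2:             # one full inner scan per outer *page*
            ios += 1
            for r in page1:
                for s in page2:
                    if p(r, s):
                        out.append(r + s)   # tuple concatenation stands in for <r, r'>
    return out, ios


r1 = [[(1,), (2,)], [(3,)]]
r2 = [[(2, "x")], [(3, "y"), (3, "z")]]
print(nl_join(lambda r, s: r[0] == s[0], r1, r2))
# ([(2, 2, 'x'), (3, 3, 'y'), (3, 3, 'z')], 6): 2 + 2 * 2 page reads
```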

The I/O cost for the simple NL-⋈ is staggering, since NL-⋈ effectively enumerates all records in the cross product of r_1 and r_2.

Example: Assume $|r_1| = 1000$ and $|r_2| = 500$. On current hardware, a single I/O operation takes about 10 msec (see Section 2.1.1). The resulting processing time for the NL-⋈ of r_1 and r_2 thus amounts to
$(1000 + 1000 \cdot 500) \cdot 10\,\text{msec} = 5{,}010{,}000\,\text{msec} \approx 83\,\text{mins}$.

Remark: Swapping the roles of r_1 and r_2 (outer ↔ inner) does not buy us much here. This will, however, be different for advanced join algorithms [8].

[8] If the DBMS's record field accesses are designed with care, we can assume that $r_1 \bowtie_p r_2 = r_2 \bowtie_p r_1$.

8.5.2 Block nested loops join

Observe that plain NL-⋈ utilizes only 3 buffer pages at a time and otherwise effectively ignores the presence of spare buffer space. Given B pages of buffer space, we can easily refine NL-⋈ to use the entire available space. The buffer setup is as follows:

[Figure: block nested loops buffer layout. Of the B main-memory buffers, B − 2 pages hold a hash table (hash function h) for a block of r_1, one input buffer scans r_2 page-wise, and one output buffer collects the join result for disk.]

The main idea is to read the outer file r_1 in chunks of B − 2 pages (instead of page by page as in NL-⋈).

Hash table? Which role does the in-buffer hash table over file r_1 play here?

Algorithm: block-NL-⋈(p, r_1, r_2, r_out)
Input:  equality predicate p (r_1.A = r_2.B), input files r_1, r_2
Output: output file r_out written (side-effect)

    out ← createfile(r_out);
    in1 ← openscan(r_1);
    repeat
        // try to read a chunk of maximum size (but don't read beyond EOF of r_1)
        B' ← min(B − 2, #remaining blocks in r_1);
        if B' > 0 then
            read B' blocks of r_1 into buffer,
                hashing each record r of r_1 to buffer page h(r.A) mod B';
            in2 ← openscan(r_2);
            while (r' ← nextrecord(in2)) ≠ EOF do
                compare record r' with the records r stored in buffer page h(r'.B) mod B';
                if r.A = r'.B then appendrecord(out, ⟨r, r'⟩);
    until B' < B − 2;
    closefile(out);

If predicate p is a general predicate, block NL-⋈ is still applicable (at the cost of more CPU cycles, since all B − 2 in-buffer blocks of r_1 have to be scanned to find a join partner for record r' of r_2).
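
The following is an in-memory sketch of block NL-⋈ for the equality predicate r_1.A = r_2.B. It assumes files are lists of pages, A and B are given as field positions, and a Python dict replaces the hash table over the B − 2 buffer pages. The I/O counter should match $|r_1| + \lceil |r_1| / (B - 2) \rceil \cdot |r_2|$. Names are illustrative.

```python
from collections import defaultdict
from typing import List, Tuple

Page = List[tuple]


def block_nl_join(r1: List[Page], r2: List[Page], a: int, b: int, B: int) -> Tuple[List[tuple], int]:
    """Block NL equi-join r1.a = r2.b using B buffer pages; returns (result, page reads)."""
    chunk = B - 2                       # 1 input buffer for r2, 1 output buffer
    out, ios = [], 0
    for start in range(0, len(r1), chunk):
        table = defaultdict(list)       # in-buffer hash table over this chunk of r1
        for page in r1[start:start + chunk]:
            ios += 1
            for r in page:
                table[r[a]].append(r)
        for page in r2:                 # one full scan of r2 per chunk
            ios += 1
            for s in page:
                for r in table.get(s[b], []):   # probe: only candidates with equal key
                    out.append(r + s)
    return out, ios


r1 = [[(i,)] for i in range(5)]          # 5 pages of r1
r2 = [[(1, "a"), (4, "b")], [(9, "c")]]  # 2 pages of r2
print(block_nl_join(r1, r2, 0, 0, B=4))  # 3 chunks: 5 + 3 * 2 = 11 page reads
```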

$r_1 \bowtie_p r_2$
  input access:  chunk-wise file scan of r_1, page-wise file scan of r_2
  prerequisites: p equality predicate (or arbitrary), r_1, r_2 may be heap files
  I/O cost:      $\underbrace{|r_1|}_{\text{outer loop}} + \underbrace{\left\lceil \frac{|r_1|}{B - 2} \right\rceil \cdot |r_2|}_{\text{inner loop}}$

Block NL-⋈ beats plain NL-⋈ in terms of I/O cost by far. To return to our running example: assume, as before, $|r_1| = 1000$ and $|r_2| = 500$, a single I/O operation takes about 10 msec (see Section 2.1.1), and B = 100. The resulting processing time for the block NL-⋈ of r_1 and r_2 is
$(1000 + \lceil 1000 / 98 \rceil \cdot 500) \cdot 10\,\text{msec} = 65{,}000\,\text{msec} = 65\,\text{secs}$
(... as opposed to 83 mins before!)

Which relation is outer?

8.5.3 Index nested loops join

Whenever there is an index on (at least) one of the join relations that matches the join predicate, we can take advantage of it by making the indexed relation the inner relation of the join algorithm. We then do not need to compare the tuples of the outer relation with those of the inner relation. Instead, we use the index to retrieve the matches efficiently.

Algorithm: index-NL-⋈(p, r_1, r_2, r_out)
Input:  predicate p, input files r_1, r_2, index on r_2
Output: output file r_out written (side-effect)

    out ← createfile(r_out);
    in1 ← openscan(r_1);
    while (r ← nextrecord(in1)) ≠ EOF do
        use the index on r_2 to find all matches for r, appending them to output out;
    closefile(out);

Index nested loops avoids enumeration of the cross product.
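
A sketch of index NL-⋈ for an equality join r_1.A = r_2.B follows. A dict built over r_2 stands in for a pre-existing hash index. Building it is setup and not part of the join algorithm itself. Names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(r2: List[tuple], b: int) -> Dict[object, List[tuple]]:
    """Toy hash index on r2's field b (assumed to exist before the join runs)."""
    index: Dict[object, List[tuple]] = defaultdict(list)
    for s in r2:
        index[s[b]].append(s)
    return index


def index_nl_join(r1: List[tuple], index: Dict[object, List[tuple]], a: int) -> List[tuple]:
    """Index NL equi-join: one index lookup per outer record, no cross product."""
    out = []
    for r in r1:                             # scan of the outer relation
        for s in index.get(r[a], []):        # index access to the inner relation
            out.append(r + s)
    return out


idx = build_index([(2, "x"), (3, "y"), (2, "z")], 0)
print(index_nl_join([(1,), (2,)], idx, 0))   # [(2, 2, 'x'), (2, 2, 'z')]
```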

The cost of index nested loops depends on the available index.

$r_1 \bowtie_p r_2$
  input access:  file scan (openscan) of r_1, index access to r_2
  prerequisites: index on r_2 matching join predicate p
  I/O cost [9]:  $\underbrace{|r_1|}_{\text{outer loop}} + \underbrace{\|r_1\| \cdot (\text{cost of 1 index access to } r_2)}_{\text{inner loop}}$

This algorithm is especially useful if the index is a clustered index. Furthermore, even with unclustered indexes and few matches per outer tuple, index nested loops outperforms simple nested loops.

[9] Ignoring the cost to write the result file r_out.

8.5.4 Sort-merge join

In a situation like the one depicted below, sort-merge join might be an attractive alternative to block NL-⋈:

[Figure: inputs r_1 and r_2, both on edges marked s (sorted), feed the join ⋈_{A=B}.]

1. Both join inputs are sorted (annotation s on the incoming edges), and
2. the join predicate (here: A = B) is an equality predicate.

Note that this effectively matches the situation just before the merge step of the two-way merge sort algorithm (see Chapter 7): simply consider join inputs r_1 and r_2 as runs that have to be merged.

The merge phase has to be slightly adapted to ensure correct results are produced in a situation like this (with duplicates on both sides):

R (A, C): (1, "foo"), (2, "foo"), (2, "bar"), (2, "baz"), (4, "foo")
S (B, D): (1, true), (2, false), (2, true), (3, false)
Join predicate: R.A = S.B

Notes on the algorithm shown below:
- The code assumes that any comparison with EOF (besides itself) fails.
- Function tell(f) yields the current file pointer of file f. The companion function seek(f, l) moves f's file pointer to position l. (Unix: see man ftell and man fseek.)

Algorithm: SM-⋈(p, r_1, r_2, r_out)
Input:  equality predicate p (r_1.A = r_2.B), input files r_1, r_2
Output: output file r_out written (side-effect)

    out ← createfile(r_out);
    in1 ← openscan(r_1);
    in2 ← openscan(r_2);
    r ← nextrecord(in1);
    r' ← nextrecord(in2);
    // continued on next slide

    // ... continued from previous slide
    while r ≠ EOF ∧ r' ≠ EOF do
        while r.A < r'.B do r ← nextrecord(in1);
        while r.A > r'.B do r' ← nextrecord(in2);
        if r.A = r'.B then
            l ← tell(in2);               // position of r', the first r_2 record of the current group
            while r.A = r'.B do
                // repeat the scan of the r_2 group (implements the ⋈ from the previous slide)
                seek(in2, l); r'' ← getrecord(in2);
                // while we find matching records in r_2 ...
                while r.A = r''.B do
                    appendrecord(out, ⟨r, r''⟩);
                    r'' ← nextrecord(in2);
                r ← nextrecord(in1);
            r' ← r'';                    // continue with the first r_2 record past the group
    closefile(out);
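
Here is a Python sketch of the merge phase, assuming both inputs are lists already sorted on their join fields. The saved index `mark` mimics l = tell(in2), and resetting j mimics seek(in2, l). The test input is the duplicate-laden example R(A, C), S(B, D) from above. Names are illustrative.

```python
from typing import List


def sort_merge_join(r1: List[tuple], r2: List[tuple], a: int, b: int) -> List[tuple]:
    """Merge phase of sort-merge join for r1.a = r2.b; inputs sorted on the join fields."""
    out = []
    i = j = 0
    while i < len(r1) and j < len(r2):
        if r1[i][a] < r2[j][b]:
            i += 1
        elif r1[i][a] > r2[j][b]:
            j += 1
        else:
            v, mark = r1[i][a], j          # mark ~ l = tell(in2): start of the r2 group
            while i < len(r1) and r1[i][a] == v:
                j = mark                   # seek(in2, l): rescan the group for each r
                while j < len(r2) and r2[j][b] == v:
                    out.append(r1[i] + r2[j])
                    j += 1
                i += 1
            # j now points just past the r2 group (r' <- r'' in the pseudocode)
    return out


R = [(1, "foo"), (2, "foo"), (2, "bar"), (2, "baz"), (4, "foo")]   # sorted on A
S = [(1, True), (2, False), (2, True), (3, False)]                  # sorted on B
result = sort_merge_join(R, S, 0, 0)
print(len(result))   # 7 = 1 * 1 (key 1) + 3 * 2 (key 2)
```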

Summary and analysis of sort-merge join:

$r_1 \bowtie_p r_2$
  input access:  sorted file scan of both r_1, r_2
  prerequisites: p equality predicate r_1.A = r_2.B, r_1 sorted on A, r_2 sorted on B
  I/O cost:      best case: if ...
                 worst case: if ...

I/O performance figures. Example: Just like before, $|r_1| = 1000$ and $|r_2| = 500$. On current hardware, a single I/O operation takes about 10 msec. Resulting processing time for the sort-merge join of r_1 and r_2:
- best case: $(1000 + 500) \cdot 10\,\text{msec} = 15{,}000\,\text{msec} = 15\,\text{sec}$
- worst case: $(1000 + 1000 \cdot 500) \cdot 10\,\text{msec} = 5{,}010{,}000\,\text{msec} \approx 83\,\text{mins}$

Final remarks on sort-merge join:
- If either (or both) of R, S are not available in sorted order according to the join attribute(s), we can obtain the sort order by introducing an explicit sort step into the execution plan before the join operator.
- If we need to do explicit sorting before the join, we can combine the last merge phase of the (merge) sorting with the join (at the expense of slightly higher memory requirements).

8.5.5 Hash joins

Hash join algorithms (there are quite a few!) follow a simple idea of partitioning. Instead of one big join, they compute many small joins:
- use the same hash function h to split r_1 and r_2 into k partitions,
- join each of the k pairs of partitions of r_1, r_2 separately.

Due to hash partitioning, join partners from r_1 and r_2 can only be found in matching partitions i (hash joins only work for equi-joins!).

Since the k small joins are independent of each other, this offers good potential for parallelization!

The principal idea behind hash joins is the algorithmic divide-and-conquer paradigm.

Conceptually, a hash join is divided into a partitioning phase (or building phase) and a probing phase (or matching phase).
- The building phase scans each input relation in turn, filling k buckets.
- The probing phase scans each of the k buckets once and computes a small join (hopefully in memory), e.g., using another hash function h_2.

[Figure: probing-phase buffer layout. Partitions of R and S reside on disk. Of the B main-memory buffers, a hash table built with h_2 holds partition R_i (k < B − 1 pages), one input buffer scans S_i, and one output buffer collects the join result for disk.]

Algorithm: hash-⋈(p, r_1, r_2, r_out)
Input:  equality predicate p, input files r_1, r_2
Output: output file r_out written (side-effect)

    // building phase:
    in1 ← openscan(r_1);
    while (r ← nextrecord(in1)) ≠ EOF do
        add r to buffer page h(r);      // h hashes the join attribute; flush buffer pages as they fill
    closefile(r_1);
    in2 ← openscan(r_2);
    while (s ← nextrecord(in2)) ≠ EOF do
        add s to buffer page h(s);      // flushing buffer pages as they fill
    closefile(r_2);
    // continued on next slide

    // ... continued from previous slide
    // probing phase:
    out ← createfile(r_out);
    for l = 1, ..., k do
        // build an in-memory hash table for partition r_1^l, using h_2
        for each tuple r in r_1^l do
            read r and insert it into hash table position h_2(r);
        // scan r_2^l and probe for matching r_1^l tuples
        for each tuple s in r_2^l do
            read s and probe the hash table using h_2(s);
            for matching r_1 tuples r: appendrecord(out, ⟨r, s⟩);
        clear the hash table for the next partition;
    closefile(out);
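
The Grace hash join can be sketched in memory as well. The sketch assumes k partitions held in Python lists instead of disk files, `hash(v) % k` as h, and a dict playing the role of the h_2-based hash table during probing. Join attributes are given as field positions. Names are hypothetical.

```python
from collections import defaultdict
from typing import List


def grace_hash_join(r1: List[tuple], r2: List[tuple], a: int, b: int, k: int) -> List[tuple]:
    """Grace hash join for r1.a = r2.b with k partitions (lists stand in for disk files)."""
    def h(v) -> int:                   # the same h partitions both inputs
        return hash(v) % k

    # Building (partitioning) phase.
    parts1: List[List[tuple]] = [[] for _ in range(k)]
    parts2: List[List[tuple]] = [[] for _ in range(k)]
    for r in r1:
        parts1[h(r[a])].append(r)
    for s in r2:
        parts2[h(s[b])].append(s)

    # Probing phase: only matching partition pairs can contain join partners.
    out = []
    for i in range(k):
        table = defaultdict(list)      # in-memory table for partition i of r1 (role of h2)
        for r in parts1[i]:
            table[r[a]].append(r)
        for s in parts2[i]:
            for r in table.get(s[b], []):
                out.append(r + s)
        # the table is discarded before the next partition
    return out


R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (3, "y")]
print(sorted(grace_hash_join(R, S, 0, 0, k=3)))   # [(2, 'b', 2, 'x'), (2, 'c', 2, 'x')]
```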

Cost of this hash join

Ignoring memory bottlenecks, this ("Grace Hash Join") algorithm reads each page of r_1, r_2 exactly once in the building phase and writes about the same number of pages out for the partitions. The probing phase reads each partition once.

$r_1 \bowtie_p r_2$
  input access:  file scan (openscan) of r_1, r_2
  prerequisites: equi-join, r_1, r_2 may be heap files
  I/O cost:      $\underbrace{\underbrace{|r_1| + |r_2|}_{\text{read}} + \underbrace{|r_1| + |r_2|}_{\text{write}}}_{\text{building phase}} + \underbrace{|r_1| + |r_2|}_{\text{probing phase}} = 3 \cdot (|r_1| + |r_2|)$

Ignoring the cost to write the result file r_out.

I/O performance figures. Example: Just like before, $|r_1| = 1000$ and $|r_2| = 500$. On current hardware, a single I/O operation takes about 10 msec. Resulting processing time for the hash join of r_1 and r_2:
$3 \cdot (1000 + 500) \cdot 10\,\text{msec} = 45{,}000\,\text{msec} = 45\,\text{sec}$

More elaborate hash join algorithms deal, e.g., with the case that partitions do not fit into memory during the probing phase.
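
The I/O formulas of this chapter can be tabulated with a small calculator. It assumes the running example's values and 10 msec per page I/O, and it lists sort-merge join with both its best case and its worst case. `io_costs` is a hypothetical helper.

```python
import math

IO_MS = 10  # assumed time per page I/O, as in the running example


def io_costs(r1: int, r2: int, B: int) -> dict:
    """Page I/Os of the join methods above (cost of writing the result ignored)."""
    return {
        "nested loops": r1 + r1 * r2,
        "block nested loops": r1 + math.ceil(r1 / (B - 2)) * r2,
        "sort-merge (best case)": r1 + r2,
        "sort-merge (worst case)": r1 + r1 * r2,
        "grace hash join": 3 * (r1 + r2),
    }


for name, ios in io_costs(1000, 500, 100).items():
    print(f"{name:24s} {ios:7d} I/Os  {ios * IO_MS / 1000:8.1f} s")
```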

Memory Requirements for Grace Hash Join

We have to try to fit each hash partition into memory for the probing phase. Hence, to minimize partition size, we have to maximize the number of partitions.
- While partitioning, we need 1 buffer page per partition and 1 input buffer.
- With B buffers, we can thus generate B − 1 partitions.
- This gives partitions of size $\frac{|R|}{B - 1}$ (for equal distribution).
- The size of an (in-memory) hash table for the probing phase needs to be $\frac{f \cdot |R|}{B - 1}$, for some fudge factor f a little larger than 1.
- During the probing phase, we need to keep one such in-memory hash table, one input buffer, and one output buffer in memory, which results in $B > \frac{f \cdot |R|}{B - 1} + 2$.

In summary, we thus need approximately $B > \sqrt{f \cdot |R|}$ pages of buffer space for the Grace Hash Join to perform well. If one or more partitions do not fit into main memory during the probing phase, performance degrades significantly.
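
To see how the condition $B > f \cdot |R| / (B - 1) + 2$ relates to the $\sqrt{f \cdot |R|}$ rule of thumb, a small search finds the smallest admissible B. It assumes f = 1.2, an arbitrary illustrative fudge factor. The function name is hypothetical.

```python
import math


def grace_min_buffers(pages_r: int, f: float = 1.2) -> int:
    """Smallest B with B > f*|R|/(B - 1) + 2 (one hash table + input + output buffer)."""
    B = 3
    while not B > f * pages_r / (B - 1) + 2:
        B += 1
    return B


print(grace_min_buffers(1000), round(math.sqrt(1.2 * 1000), 1))   # 37 34.6
```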

Utilizing Extra Memory
Suppose we are partitioning R (and S) into k partitions where $B > f \cdot \frac{|R|}{k}$, i.e., we can build an in-memory hash table for each partition. The partitioning phase needs $k + 1$ buffers, which leaves us with some extra buffer space of $B - (k + 1)$ pages. If this extra space is large enough to hold one partition, i.e., $B - (k + 1) \geq f \cdot \frac{|R|}{k}$, we can collect the entire first partition of R in memory during the partitioning phase and need not write it to disk. Similarly, during the partitioning of S, we can avoid storing its first partition on disk and instead immediately probe the tuples in S's first partition against the in-memory first partition of R and write out the results. At the end of the partitioning phase for S, we are already done with joining the first partitions. The savings result from not having to write out and read back in the first partitions of R and S. This version of hash join is called Hybrid Hash Join.
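The savings can be estimated with a small C function. This sketch assumes equally sized partitions (so the first partitions have $|R|/k$ and $|S|/k$ pages); the name `hybrid_hash_io` is hypothetical. It starts from the Grace cost $3 \cdot (|R| + |S|)$ and subtracts one write and one read of both first partitions whenever the extra space condition holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Page I/O estimate for Hybrid Hash Join with k partitions and B buffers. */
double hybrid_hash_io(double pages_R, double pages_S, int k, int B, double f) {
    assert(k >= 1 && B > k + 1);
    double grace = 3.0 * (pages_R + pages_S);
    /* extra space B - (k + 1) must hold the first partition's hash table */
    bool fits = (B - (k + 1)) >= f * pages_R / k;
    if (!fits) return grace;  /* no room: behaves like plain Grace hash join */
    /* first partitions of R and S are neither written out nor read back */
    return grace - 2.0 * (pages_R + pages_S) / k;
}
```

For $|R| = 1000$, $|S| = 500$, $k = 10$, $f = 1.25$, and $B = 200$, the extra space of 189 pages exceeds $f \cdot |R|/k = 125$, and the estimate drops from 4500 to 4200 page I/Os; with only $B = 100$ buffers it stays at 4500.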

8.5.6 Semijoins
Origin: distributed DBMSs (here, transport cost dominates I/O cost).
Remember: Semijoin $R \ltimes S := \pi_{R}(R \bowtie S)$, i.e., the projection of $R \bowtie S$ onto the attributes of R.
Idea: to compute the distributed join between two relations R, S stored on different nodes $N_R$, $N_S$ (assuming we want the result on $N_R$; let the common attributes be J):
1. Compute $\pi_J(R)$ on $N_R$.
2. Send the result to $N_S$.
3. Compute $\pi_J(R) \bowtie S$ on $N_S$.
4. Send the result to $N_R$.
5. Compute $R \bowtie (\pi_J(R) \bowtie S)$ on $N_R$.

N.B.: Step 3 computes the semijoin $S \ltimes R$.
This algorithm is preferable over sending all of S to $N_R$ if ($C_{tr}$ denotes the transport cost, which depends on the size of the transferred data):
$$C_{tr}(\pi_J(R)) + C_{tr}(S \ltimes R) < C_{tr}(S)$$
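Step 3 and the cost test can be sketched in C. The sketch assumes that the transport cost $C_{tr}$ is measured as the number of attribute values shipped (an assumption for illustration only), that J consists of the single attribute B, and that S has the hypothetical schema S(B, C, D); the names `semijoin` and `semijoin_pays_off` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int b, c, d; } STuple;  /* hypothetical schema S(B, C, D) */

/* Step 3 on node N_S: keep the S tuples whose B value occurs in pi_J(R),
   the list of join values shipped from N_R in step 2. Writes S ⋉ R to out
   and returns its cardinality. Linear search keeps the sketch short. */
size_t semijoin(const int *jr, size_t njr, const STuple *s, size_t ns, STuple *out) {
    assert(ns == 0 || (s != NULL && out != NULL));
    size_t n = 0;
    for (size_t i = 0; i < ns; i++)
        for (size_t j = 0; j < njr; j++)
            if (s[i].b == jr[j]) { out[n++] = s[i]; break; }
    return n;
}

/* Cost test from the text: C_tr(pi_J(R)) + C_tr(S ⋉ R) < C_tr(S),
   counting shipped attribute values (assumed cost model). */
bool semijoin_pays_off(size_t njr, size_t nsemi, size_t ns, size_t s_arity) {
    size_t via_semijoin = njr * 1 + nsemi * s_arity;  /* pi_J(R) has one attribute */
    size_t ship_all = ns * s_arity;
    return via_semijoin < ship_all;
}
```

The semijoin pays off when few S tuples have join partners in R, so that the reduced S plus the shipped join values are smaller than all of S.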

Example: Semijoin
Let relations R and S be given as:
[Tables: R(A, B) and S(B, C, D)]
This yields:
[Tables: $\pi_B(R)$ with attribute B; $S \ltimes R$ with attributes B, C, D]
Cost of semijoin: $C_{tr} = \ldots = 15$, whereas sending all of S has $C_{tr} = \ldots$

8.5.7 Summary of join algorithms
No single join algorithm performs best under all circumstances. The choice of algorithm is affected by:
- sizes of the relations joined,
- size of the available buffer space,
- availability of indexes,
- form of the join condition,
- selectivity of the join predicate,
- available physical properties of the inputs (e.g., sort orders),
- desirable physical properties of the output (e.g., sort orders),
- ...

Performance differences between a good and a bad algorithm for any given join can be enormous. Join algorithms have been the subject of intensive research efforts, particularly also in the context of parallel DBMSs.

8.6 A Three-Way Join Operator
Within the INGRES project at UC Berkeley, a three-way join operator was developed.
Observations: Suppose we want to compute the join $R \bowtie_A S \bowtie_B T$, where A is an attribute common to R and S, and B is common to S and T. This is an instance of a (three-way) star join with S as the center relation. Using only traditional (two-way) join algorithms, the choices include left-deep NL-⋈ plans (with or without index) that iterate over, say, S as the outer relation, using one of R or T as the first inner and the other one as the second inner relation. With simple NL-⋈ algorithms, this means that for each matching combination of SR- (or ST-) tuples, we have to iterate over all of T (or R), resulting in a complexity on the order of $O(n \cdot m \cdot k)$, for n, m, k the sizes of the involved relations (either in terms of number of tuples or number of pages). This roughly corresponds to three levels of nested loops.
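The left-deep nested loops plan can be sketched in C to make the three loop levels visible. The schemas R(a, x), S(a, b), T(b, y) and all names are hypothetical; the sketch uses S as outer, R as first inner, and T as second inner relation, and only counts result tuples. The counter `steps` records how often a T tuple is inspected, which grows with every matching SR combination.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical schemas: R(a, x), S(a, b), T(b, y); join R.a = S.a and S.b = T.b. */
typedef struct { int a, x; } RTup;
typedef struct { int a, b; } STup;
typedef struct { int b, y; } TTup;

/* Left-deep simple NL plan: for every matching (s, r) pair, all of T is
   scanned again, so the work is bounded by n * m * k. Returns the number
   of result tuples; *steps counts inspected T tuples. */
size_t nl_three_way(const STup *s, size_t n, const RTup *r, size_t m,
                    const TTup *t, size_t k, size_t *steps) {
    assert(steps != NULL);
    size_t out = 0;
    *steps = 0;
    for (size_t i = 0; i < n; i++)            /* level 1: outer S     */
        for (size_t j = 0; j < m; j++) {      /* level 2: first inner R */
            if (r[j].a != s[i].a) continue;
            for (size_t l = 0; l < k; l++) {  /* level 3: second inner T */
                (*steps)++;
                if (t[l].b == s[i].b) out++;
            }
        }
    return out;
}
```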

The INGRES Three-Way Join Algorithm
Idea: Scan the center relation, S in our example. For each tuple $s \in S$ do:
- Find all matching R-tuples r and collect them in a temporary space R' (e.g., using a nested loop or an index).
- Find all matching T-tuples t and collect them in a temporary space T' (e.g., using a nested loop or an index).
- Append to the output the product (i.e., all combinations) of the one s tuple with the r and t tuples from the two temporary spaces R' and T'.

N.B.: This corresponds to only two levels of nested loops: one outer loop (over S) with two inner loops that run one after the other, hence a complexity of only $O(n \cdot (m + k))$ (not counting the cost of producing the output).
Disadvantage: This three-way join algorithm makes optimization even more complex, since a sequence of two binary (logical) operators needs to be mapped to a single ternary (physical) operator.
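A C sketch of the INGRES idea, under the same hypothetical schemas R(a, x), S(a, b), T(b, y): the two inner loops fill the temporary spaces R' and T' one after the other, and the product is emitted afterwards. Plain nested loops are used to fill R' and T' (the text also allows an index), and the fixed capacity `MAXTMP` is a hypothetical simplification.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int a, x; } RTup;
typedef struct { int a, b; } STup;
typedef struct { int b, y; } TTup;
typedef struct { int x, a, b, y; } OutTup;  /* hypothetical result schema */

#define MAXTMP 64  /* hypothetical capacity of the temporary spaces R' and T' */

/* Star join with center relation S. Writes at most cap results into out
   and returns the total number of results. */
size_t ingres_three_way(const STup *s, size_t n, const RTup *r, size_t m,
                        const TTup *t, size_t k, OutTup *out, size_t cap) {
    RTup rtmp[MAXTMP];  /* temporary space R' */
    TTup ttmp[MAXTMP];  /* temporary space T' */
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {          /* scan the center relation */
        size_t nr = 0, nt = 0;
        for (size_t j = 0; j < m; j++)        /* collect matching R tuples */
            if (r[j].a == s[i].a) { assert(nr < MAXTMP); rtmp[nr++] = r[j]; }
        for (size_t l = 0; l < k; l++)        /* then matching T tuples */
            if (t[l].b == s[i].b) { assert(nt < MAXTMP); ttmp[nt++] = t[l]; }
        for (size_t p = 0; p < nr; p++)       /* product {s} x R' x T' */
            for (size_t q = 0; q < nt; q++) {
                if (total < cap)
                    out[total] = (OutTup){ rtmp[p].x, s[i].a, s[i].b, ttmp[q].y };
                total++;
            }
    }
    return total;
}
```

The per-tuple work before emitting output is $m + k$ comparisons instead of up to $m \cdot k$, which is where the $O(n \cdot (m + k))$ bound comes from.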

8.7 Other Operators
8.7.1 Set Operations
Intersection and cross product are implemented as special joins: for intersection, use equality on all attributes as the join condition; for the product, use true. Hence, there is no need to consider those further.
With union and difference, the challenge lies in duplicate identification. There are two approaches, one based on sorting and one based on hashing. Work out the details on your own.
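One possible sketch of the sort-based approach, in C: after sorting both inputs, duplicates become adjacent, so union and difference reduce to a single merge pass. For brevity, tuples are modeled as single int keys (an assumption; a real operator compares on all attributes), and the function names are hypothetical. The hash-based variant is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* R ∪ S with duplicate elimination. Sorts r and s in place; out needs
   room for nr + ns values. Returns the number of distinct values. */
size_t sort_union(int *r, size_t nr, int *s, size_t ns, int *out) {
    assert(out != NULL);
    qsort(r, nr, sizeof *r, cmp_int);
    qsort(s, ns, sizeof *s, cmp_int);
    size_t i = 0, j = 0, n = 0;
    while (i < nr || j < ns) {
        int v = (j == ns || (i < nr && r[i] <= s[j])) ? r[i++] : s[j++];
        if (n == 0 || out[n - 1] != v) out[n++] = v;  /* skip duplicates */
    }
    return n;
}

/* R − S: emit each distinct value of r that does not occur in s. */
size_t sort_difference(int *r, size_t nr, int *s, size_t ns, int *out) {
    assert(out != NULL);
    qsort(r, nr, sizeof *r, cmp_int);
    qsort(s, ns, sizeof *s, cmp_int);
    size_t j = 0, n = 0;
    for (size_t i = 0; i < nr; i++) {
        while (j < ns && s[j] < r[i]) j++;  /* advance s up to r[i] */
        int already = (n > 0 && out[n - 1] == r[i]);
        if (!already && (j == ns || s[j] != r[i])) out[n++] = r[i];
    }
    return n;
}
```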

8.7.2 Aggregates
The language SQL supports a number of aggregation operators (such as sum, avg, count, min, max).
Basic algorithm: scan the whole relation and maintain some running information during that scan. Compute the aggregate value from the running information upon completion of the scan:

| Aggregate | Running information |
|-----------|---------------------|
| sum | Total of values read |
| avg | Total, count of values read |
| count | Count of values read |
| min | Smallest value read |
| max | Largest value read |

Grouping: if aggregation is combined with grouping, we first have to do the grouping, using hashing or sorting (or an appropriate index). Then, use the running information on a per-group basis.
Index-only: sometimes, aggregate values can be computed without accessing the data records at all, by just using an available index.
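The running information from the table, combined with grouping via sorting, can be sketched in C. The record layout `Rec` (grouping key plus aggregated value) and all function names are hypothetical; hash-based grouping would keep one `Running` entry per hash table slot instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Running information per group: enough for sum, avg, count, min, max. */
typedef struct { long sum, count; int min, max; } Running;

static void running_start(Running *a, int v) {
    a->sum = v; a->count = 1; a->min = v; a->max = v;
}

static void running_update(Running *a, int v) {
    a->sum += v;
    a->count++;
    if (v < a->min) a->min = v;
    if (v > a->max) a->max = v;
}

/* avg is derived from total and count once the scan is complete. */
static double running_avg(const Running *a) {
    return (double)a->sum / (double)a->count;
}

typedef struct { int key, val; } Rec;  /* hypothetical input record */

static int cmp_rec(const void *x, const void *y) {
    int a = ((const Rec *)x)->key, b = ((const Rec *)y)->key;
    return (a > b) - (a < b);
}

/* Grouping via sorting: sort on the grouping key, then one scan that
   starts a new Running entry whenever the key changes. Returns #groups. */
size_t sort_group_aggregate(Rec *recs, size_t n, int *keys, Running *aggs) {
    assert(n > 0 && recs != NULL && keys != NULL && aggs != NULL);
    qsort(recs, n, sizeof *recs, cmp_rec);
    size_t g = 0;
    for (size_t i = 0; i < n; i++) {
        if (g == 0 || keys[g - 1] != recs[i].key) {
            keys[g] = recs[i].key;
            running_start(&aggs[g++], recs[i].val);
        } else {
            running_update(&aggs[g - 1], recs[i].val);
        }
    }
    return g;
}
```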

8.8 The impact of buffering
Effective use of the buffer pool is crucial for an efficient implementation of a relational query engine. Several operators use the size of the available buffer space as a parameter. Keep the following in mind:
1. When several operators execute concurrently, they share the buffer pool.
2. Using an unclustered index for accessing records makes finding a page in the buffer rather unlikely and dependent (rather unpredictably!) on the size of the buffer.
3. Furthermore, each page access is likely to refer to a new page; therefore, the buffer pool fills quickly and we obtain a high level of I/O activity.
4. If an operation has a repeated pattern of page accesses, a clever replacement policy and/or a sufficient number of buffers can speed up the operation significantly. Examples of such patterns are:

- Simple nested loops join: for each outer tuple, scan all pages of the inner relation. If there is enough buffer space to hold the entire inner relation, the replacement policy is irrelevant. Otherwise it is critical: LRU will never find a needed page in the buffer (sequential flooding problem, see Section 2.3); MRU gives the best buffer utilization, since the first B − 2 pages of the inner will always stay in the buffer.
- Nested block join: for each block of the outer, scan all pages of the inner relation. Since only one unpinned page is available for the scan of the inner, the replacement policy makes no difference.
- Index nested loop join: for each tuple in the outer, use the index to find matching tuples in the inner relation. For duplicate values in the join attributes of the outer relation, we obtain repeated access patterns for the inner tuples and the index. The effect can be maximized by sorting the outer tuples on the join attributes.
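The difference between LRU and MRU for the simple nested loops join can be reproduced with a toy simulation. This C sketch makes simplifying assumptions: only the inner relation's pages compete for `frames` buffer frames, and the inner is scanned sequentially once per outer tuple. The names `simulate_scans`, `LRU`, and `MRU` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum policy { LRU, MRU };

#define MAX_FRAMES 64

/* Performs `scans` sequential scans over inner pages 0..npages-1 with a
   buffer of `frames` frames and returns the number of buffer hits. */
long simulate_scans(int npages, int frames, int scans, enum policy pol) {
    assert(frames > 0 && frames <= MAX_FRAMES);
    int page[MAX_FRAMES];
    long last_use[MAX_FRAMES];
    int used = 0;
    long tick = 0, hits = 0;
    for (int sc = 0; sc < scans; sc++) {
        for (int p = 0; p < npages; p++, tick++) {
            int f = 0;
            while (f < used && page[f] != p) f++;  /* is page p buffered? */
            if (f < used) {                        /* buffer hit */
                hits++;
                last_use[f] = tick;
                continue;
            }
            if (used < frames) {
                f = used++;                        /* a free frame is available */
            } else {                               /* miss: choose a victim */
                f = 0;
                for (int i = 1; i < used; i++) {
                    int better = (pol == LRU) ? last_use[i] < last_use[f]   /* least recent */
                                              : last_use[i] > last_use[f];  /* most recent  */
                    if (better) f = i;
                }
            }
            page[f] = p;
            last_use[f] = tick;
        }
    }
    return hits;
}
```

With 5 inner pages, 3 frames, and 2 scans, LRU produces no hits at all (sequential flooding), while MRU keeps the first pages resident and hits on 3 of the 10 accesses. If the inner fits into the buffer, both policies behave identically.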

8.9 Managing long pipelines of relational operators
Note that every relational operator we have discussed so far takes a parameter $r_{out}$, i.e., a file (name) to be written to hold the operator's output. In some sense, we are using secondary storage as a one-way communication channel between the operators in a plan.
Consequences of this approach:
1. We pay for the (substantial) I/O effort to feed into and read from this communication channel.
2. The operators in a plan are executed in sequence; the first result record is not produced before the last relational operator in the pipeline executes:

[Figure: pipeline r_1, r_2 → p → tmp_1 → l → tmp_2 → q → tmp_3 → … → tmp_n → k]

N.B.: No more than three temporary files $tmp_i$ need to exist at any point in time during execution.

Architecting the query processor in this fashion bears much resemblance to using the Unix shell like this:

    # report all large MP3 audio files
    # ... below the current working directory
    $ find . -size +1M > tmp1
    $ xargs file < tmp1 > tmp2
    $ grep -i MP3 < tmp2 > tmp3
    $ cut -d: -f1 < tmp3
    output
    $ rm tmp[0-9]

Unix supports another type of communication channel, the pipe, which lets the participating commands exchange data character-by-character:

    # report all large MP3 audio files
    # ... below the current working directory
    $ find . -size +1M | xargs file | grep -i MP3 | cut -d: -f1
    output

The execution of the pipe is driven by the rightmost command:
1. To produce a line of output, cut only needs to see the next line of its input: grep is requested to produce this input.
2. To produce this line of output, grep only needs to see the next line of its input: xargs is requested to produce this input.

As soon as find has produced a line of output, it is passed through the pipe, transformed by xargs, grep, and cut, and then echoed to the terminal.
In the database world, this mode of executing a pipeline (a query plan) is called streaming:
- A streaming query processor avoids writing temporary files (the $tmp_i$) whenever possible,
- operators communicate their output record-by-record (or block-by-block),
- a result record appears as soon as it is available (as opposed to when the complete result has been computed¹¹).

¹¹ This is of major importance in interactive DBMS environments (ad-hoc query interfaces).

Example (each input line containing foo is echoed by grep immediately, long before the end of input ^D is reached):

    $ grep foo
    XML
    foobar
    foobar
    What does foo mean anyway?
    What does foo mean anyway?
    Enough already
    ^D
    $

Note, however, that we have to modify the implementations of our relational operators to support streaming. Currently, all operators consume their input as a whole, then write their output file as a whole, and only then return control to the query processor.

8.9.1 Streaming Interface
To support streaming we need a record-by-record calling convention. New operator interface (let ⊗ denote a relational operator):
- ⊗.reset(): The operator is requested to reset so that a call to ⊗.next() will produce the first result record.
- ⊗.next(): The operator is requested to produce the next record of its result. Returns EOF if all result records have been requested already.
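In C, this calling convention could be rendered as a small struct of function pointers. The sketch below is a hypothetical rendering, not the interface of any particular system: `next()` returns `false` in place of EOF and writes the record through an untyped `out` pointer, and `array_op` is a minimal producer that streams the ints of an in-memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Every operator offers reset() and next(); next() returns false for EOF. */
typedef struct Op Op;
struct Op {
    void (*reset)(Op *self);
    bool (*next)(Op *self, void *out);
    void *state;  /* operator-specific state */
};

/* Minimal producer: streams the ints of an in-memory array. */
typedef struct { const int *data; size_t n, pos; } ArrayState;

static void array_reset(Op *self) {
    ((ArrayState *)self->state)->pos = 0;
}

static bool array_next(Op *self, void *out) {
    ArrayState *st = self->state;
    if (st->pos == st->n) return false;  /* EOF */
    *(int *)out = st->data[st->pos++];
    return true;
}

static Op array_op(ArrayState *st) {
    assert(st != NULL);
    Op op = { array_reset, array_next, st };
    return op;
}
```

A consumer calls `reset()` once and then `next()` until it returns `false`; calling `reset()` again restarts the stream from its first record.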

Example (implementation of $\sigma_p(r_{in})$):

    Algorithm: σ(p, in).reset()
    Input:     predicate p, in-bound stream in
        in.reset();

    Algorithm: σ(p, in).next()
    Input:     predicate p, in-bound stream in
    Output:    next record of selection result (or EOF)
        while (r ← in.next()) ≠ EOF do
            if p(r) then
                // immediately return if next result record found
                return r;
        return EOF;
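The selection operator above translates almost line by line into C, assuming a hypothetical function-pointer interface (`Op` with `reset`/`next`, where `next()` returns `false` for EOF) and records modeled as ints. The array producer and the example predicate `is_even` are only there to make the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Op Op;
struct Op {
    void (*reset)(Op *self);
    bool (*next)(Op *self, void *out);  /* false plays the role of EOF */
    void *state;
};

/* In-bound stream for testing: the ints of an array. */
typedef struct { const int *data; size_t n, pos; } ArrayState;
static void array_reset(Op *self) { ((ArrayState *)self->state)->pos = 0; }
static bool array_next(Op *self, void *out) {
    ArrayState *st = self->state;
    if (st->pos == st->n) return false;
    *(int *)out = st->data[st->pos++];
    return true;
}

/* σ(p, in): predicate p and in-bound stream in. */
typedef bool (*Pred)(int);
typedef struct { Pred p; Op *in; } SelectState;

static void select_reset(Op *self) {
    SelectState *st = self->state;
    assert(st != NULL && st->in != NULL);
    st->in->reset(st->in);  /* forward reset to the in-bound stream */
}

static bool select_next(Op *self, void *out) {
    SelectState *st = self->state;
    int r;
    while (st->in->next(st->in, &r)) {
        if (st->p(r)) {          /* immediately return if next result found */
            *(int *)out = r;
            return true;
        }
    }
    return false;                /* EOF */
}

static bool is_even(int x) { return x % 2 == 0; }  /* example predicate */
```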

Given a query plan like the one shown below, query evaluation is driven by the query processor like this (just like in the Unix shell):
1. The whole plan is initially reset by calling reset() on the root operator, i.e., q.reset().
2. The reset() call is forwarded through the plan by the operators themselves (see σ.reset() on the previous slide).
3. Control returns to the query processor.
4. The root is requested to produce its next result record, i.e., the call q.next() is made.
5. Operators forward the next() request as needed. As soon as the next result record is produced, control returns to the query processor again.

[Figure: query plan over r_1 and r_2 with two scan leaves and operators p, l, q; q is the root]

In short, the query processor uses the following routine to evaluate a query plan:

    Algorithm: eval(q)
    Input:     root operator of query plan q
    Output:    query result sent to terminal
        q.reset();
        while (r ← q.next()) ≠ EOF do
            print(r);
        print("done.");
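The eval routine in C, again assuming a hypothetical function-pointer interface `Op` whose `next()` returns `false` for EOF and whose records are ints. The array-backed root operator only serves to make the sketch self-contained; the function additionally returns the number of printed records so that its behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Op Op;
struct Op {
    void (*reset)(Op *self);
    bool (*next)(Op *self, void *out);  /* false plays the role of EOF */
    void *state;
};

/* Stand-in root operator: the ints of an array. */
typedef struct { const int *data; size_t n, pos; } ArrayState;
static void array_reset(Op *self) { ((ArrayState *)self->state)->pos = 0; }
static bool array_next(Op *self, void *out) {
    ArrayState *st = self->state;
    if (st->pos == st->n) return false;
    *(int *)out = st->data[st->pos++];
    return true;
}

/* Drive the plan from its root: reset once, then pull records until EOF. */
size_t eval(Op *q) {
    assert(q != NULL);
    size_t n = 0;
    int r;
    q->reset(q);             /* reset is forwarded through the plan */
    while (q->next(q, &r)) { /* each call returns as soon as a record exists */
        printf("%d\n", r);
        n++;
    }
    printf("done.\n");
    return n;
}
```

Because eval() resets the plan first, evaluating the same plan twice produces the complete result both times.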

A streaming scan operator. Complete the implementation below to provide a streaming file scan operator:

    Algorithm: scan(f).reset()
    Input:     filename f
        ...

    Algorithm: scan(f).next()
    Input:     filename f
    Output:    next record in file f or EOF
        ...
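One possible C sketch of the exercise, under assumptions that go beyond the slide: records are fixed-size ints stored in binary form, the operator uses the hypothetical `Op` interface (`next()` returns `false` for EOF), and reset() opens the file on its first call and merely rewinds it afterwards. `ScanState` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Op Op;
struct Op {
    void (*reset)(Op *self);
    bool (*next)(Op *self, void *out);  /* false plays the role of EOF */
    void *state;
};

/* State of scan(f): the filename and the open file, if any. */
typedef struct { const char *filename; FILE *fp; } ScanState;

static void scan_reset(Op *self) {
    ScanState *st = self->state;
    if (st->fp == NULL)
        st->fp = fopen(st->filename, "rb");  /* first reset: open file f */
    else
        rewind(st->fp);                      /* later resets: restart at the first record */
    assert(st->fp != NULL);
}

static bool scan_next(Op *self, void *out) {
    ScanState *st = self->state;
    /* read exactly one fixed-size record; a short read means EOF */
    return fread(out, sizeof(int), 1, st->fp) == 1;
}
```

Since next() reads only one record per call, the scan never materializes the file; a real operator would additionally need a close call that releases the file handle.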

A streaming NL-⋈ operator. Complete the implementation below to provide a streaming NL-⋈ operator (see 8.5.1):

    Algorithm: ⋈(p, in_1, in_2).reset()
    Input:     predicate p, in-bound streams in_1, in_2
        ...

    Algorithm: ⋈(p, in_1, in_2).next()
    Input:     predicate p, in-bound streams in_1, in_2
    Output:    next record in join result or EOF
        ...
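One possible C sketch of the streaming NL-⋈ exercise, assuming the hypothetical `Op` interface (`next()` returns `false` for EOF) and int records. The key point is that the operator must remember its current outer record between next() calls, so that the inner scan can resume where it stopped; when the inner stream is exhausted, it is reset and the outer stream advances. `NLJoinState`, `Pair`, and `eq` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Op Op;
struct Op {
    void (*reset)(Op *self);
    bool (*next)(Op *self, void *out);  /* false plays the role of EOF */
    void *state;
};

/* In-bound streams for testing: the ints of an array. */
typedef struct { const int *data; size_t n, pos; } ArrayState;
static void array_reset(Op *self) { ((ArrayState *)self->state)->pos = 0; }
static bool array_next(Op *self, void *out) {
    ArrayState *st = self->state;
    if (st->pos == st->n) return false;
    *(int *)out = st->data[st->pos++];
    return true;
}

typedef struct { int r, s; } Pair;  /* a join result record */
typedef bool (*JoinPred)(int r, int s);

typedef struct {
    JoinPred p;
    Op *in1, *in2;  /* outer and inner stream */
    int cur;        /* current outer record */
    bool has_cur;   /* false once the outer stream is exhausted */
} NLJoinState;

static void nljoin_reset(Op *self) {
    NLJoinState *st = self->state;
    assert(st != NULL);
    st->in1->reset(st->in1);
    st->in2->reset(st->in2);
    st->has_cur = st->in1->next(st->in1, &st->cur);  /* first outer record */
}

static bool nljoin_next(Op *self, void *out) {
    NLJoinState *st = self->state;
    int s;
    while (st->has_cur) {
        while (st->in2->next(st->in2, &s)) {
            if (st->p(st->cur, s)) {
                Pair *o = out;
                o->r = st->cur;
                o->s = s;
                return true;  /* the inner scan resumes here on the next call */
            }
        }
        st->in2->reset(st->in2);                         /* inner exhausted: restart it */
        st->has_cur = st->in1->next(st->in1, &st->cur);  /* and advance the outer */
    }
    return false;  /* EOF */
}

static bool eq(int r, int s) { return r == s; }  /* example equi-join predicate */
```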

Below is a code snippet used in a real DBMS product. The overall structure of this code almost perfectly matches the preceding discussion:

    /* efltr -- apply filter predicate pred to stream
     *
     * Filter the in-bound stream, only stream elements that fulfill e->pred
     * contribute to the result. No index support whatsoever.
     */
    erc eop_FLTR(eOp *ip)
    {
        eobj_FLTR *e = (eobj_FLTR *)eobj(ip);

        /* Challenge the in-bound stream until it is exhausted... */
        while (eintp(e->in) != eeos) {
            eintp(e->pred);
            /* ... or a stream element fulfills predicate e->pred */
            if (et_as_bool(eval(e->pred))) {
                eval(ip) = eval(e->in);
                return eok;
            }
        }
        return eeos;
    }

    erc eop_FLTR_RST(eOp *ip)
    {
        eobj_FLTR *e = (eobj_FLTR *)eobj(ip);

        ereset(e->in);
        ereset(e->pred);

        return eok;
    }

8.9.2 Demand-Driven vs. Data-Driven Streaming
The iterator interface as shown above implements a demand-driven query processing infrastructure: consumers (later operators) request more input (by calling next()) from their producers (earlier operators) whenever they are ready to process that input. Demand-driven streaming minimizes resource requirements and wasted effort in case a user/client does not want to see the whole result.
In contrast, data-driven streaming requires more resources, uses a different query processing infrastructure, and can exploit more parallelism:
- Each operator starts (asynchronously) to work on its input as soon and as fast as possible.
- Output is enqueued into a pipeline to the consumers as it occurs.
- The pipelines need to do buffering and/or to suspend producers.
- An operator only needs to wait if there is no input available yet, or if its output pipeline is full.
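A data-driven pipeline between two operators can be sketched in C with POSIX threads and a bounded buffer. In this sketch, which is an illustration under stated assumptions rather than a real query engine, the producer "operator" runs asynchronously and emits the integers 1..n as fast as it can; it is suspended while the pipeline is full, the consumer waits only while no input is available yet, and a close flag plays the role of EOF. The capacity `CAP` and all names (`Chan`, `chan_put`, `chan_get`, `run_pipeline`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CAP 4  /* hypothetical pipeline capacity */

typedef struct {
    int buf[CAP];
    size_t head, count;
    int closed;  /* producer has finished (EOF) */
    pthread_mutex_t mu;
    pthread_cond_t not_full, not_empty;
} Chan;

static void chan_init(Chan *c) {
    c->head = c->count = 0;
    c->closed = 0;
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->not_full, NULL);
    pthread_cond_init(&c->not_empty, NULL);
}

/* Producer side: suspended while the pipeline is full. */
static void chan_put(Chan *c, int v) {
    pthread_mutex_lock(&c->mu);
    while (c->count == CAP) pthread_cond_wait(&c->not_full, &c->mu);
    c->buf[(c->head + c->count) % CAP] = v;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->mu);
}

static void chan_close(Chan *c) {
    pthread_mutex_lock(&c->mu);
    c->closed = 1;
    pthread_cond_broadcast(&c->not_empty);
    pthread_mutex_unlock(&c->mu);
}

/* Consumer side: waits only while no input is available yet.
   Returns 0 once the pipeline is closed and drained (EOF). */
static int chan_get(Chan *c, int *v) {
    pthread_mutex_lock(&c->mu);
    while (c->count == 0 && !c->closed) pthread_cond_wait(&c->not_empty, &c->mu);
    if (c->count == 0) { pthread_mutex_unlock(&c->mu); return 0; }
    *v = c->buf[c->head];
    c->head = (c->head + 1) % CAP;
    c->count--;
    pthread_cond_signal(&c->not_full);
    pthread_mutex_unlock(&c->mu);
    return 1;
}

typedef struct { Chan *out; int n; } ProducerArgs;

/* Producer operator: emits 1..n as soon and as fast as possible. */
static void *producer(void *arg) {
    ProducerArgs *a = arg;
    for (int i = 1; i <= a->n; i++) chan_put(a->out, i);
    chan_close(a->out);
    return NULL;
}

/* Start the producer asynchronously and consume its output (sum). */
long run_pipeline(int n) {
    Chan c;
    chan_init(&c);
    ProducerArgs args = { &c, n };
    pthread_t t;
    int rc = pthread_create(&t, NULL, producer, &args);
    assert(rc == 0);
    (void)rc;
    long sum = 0;
    int v;
    while (chan_get(&c, &v)) sum += v;
    pthread_join(t, NULL);
    pthread_mutex_destroy(&c.mu);
    pthread_cond_destroy(&c.not_full);
    pthread_cond_destroy(&c.not_empty);
    return sum;
}
```

Unlike the iterator interface, the producer here does not wait to be asked for its next record; the bounded buffer is what keeps it from running arbitrarily far ahead of its consumer.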

Bibliography
Graefe, G. (1993). Query evaluation techniques for large databases. ACM Computing Surveys, 25(2).
Kemper, A., Moerkotte, G., Peithner, K., and Steinbrunn, M. (1994). Optimizing disjunctive queries with expensive predicates. In Snodgrass, R. T. and Winslett, M., editors, Proc. ACM SIGMOD Conference on Management of Data, Minneapolis, MN. ACM Press.
Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, New York, 3rd edition.
Steinbrunn, M., Peithner, K., Moerkotte, G., and Kemper, A. (1995). Bypassing joins in disjunctive queries. In Dayal, U., Gray, P., and Nishio, S., editors, Proc. Intl. Conf. on Very Large Databases, Zurich, Switzerland. Morgan Kaufmann.
Wong, E. and Youssefi, K. (1976). Decomposition: A strategy for query processing. ACM Transactions on Database Systems, 1(3).


Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis 1 Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis 1 General approach to the implementation of Query Processing and Query Optimisation functionalities in DBMSs 1. Parse

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

University of Waterloo Midterm Examination Sample Solution

University of Waterloo Midterm Examination Sample Solution 1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,

More information

Query Processing: The Basics. External Sorting

Query Processing: The Basics. External Sorting Query Processing: The Basics Chapter 10 1 External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot use traditional

More information

Overview of Query Evaluation. Chapter 12

Overview of Query Evaluation. Chapter 12 Overview of Query Evaluation Chapter 12 1 Outline Query Optimization Overview Algorithm for Relational Operations 2 Overview of Query Evaluation DBMS keeps descriptive data in system catalogs. SQL queries

More information

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007

Relational Query Optimization. Overview of Query Evaluation. SQL Refresher. Yanlei Diao UMass Amherst October 23 & 25, 2007 Relational Query Optimization Yanlei Diao UMass Amherst October 23 & 25, 2007 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan:

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Query processing and optimization

Query processing and optimization Query processing and optimization These slides are a modified version of the slides of the book Database System Concepts (Chapter 13 and 14), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan.

More information

ECS 165B: Database System Implementa6on Lecture 7

ECS 165B: Database System Implementa6on Lecture 7 ECS 165B: Database System Implementa6on Lecture 7 UC Davis April 12, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Dynamic aspects of

More information

QUERY EXECUTION: How to Implement Relational Operations?

QUERY EXECUTION: How to Implement Relational Operations? QUERY EXECUTION: How to Implement Relational Operations? 1 Introduction We ve covered the basic underlying storage, buffering, indexing and sorting technology Now we can move on to query processing Relational

More information

3.1.1 Cost model Search with equality test (A = const) Scan

3.1.1 Cost model Search with equality test (A = const) Scan Module 3: File Organizations and Indexes A heap file provides just enough structure to maintain a collection of records (of a table). The heap file supports sequential scans (openscan) over the collection,

More information

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

Midterm Review CS634. Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Midterm Review CS634 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke Coverage Text, chapters 8 through 15 (hw1 hw4) PKs, FKs, E-R to Relational: Text, Sec. 3.2-3.5, to pg.

More information

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week.

Database Systems. Announcement. December 13/14, 2006 Lecture #10. Assignment #4 is due next week. Database Systems ( 料 ) December 13/14, 2006 Lecture #10 1 Announcement Assignment #4 is due next week. 2 1 Overview of Query Evaluation Chapter 12 3 Outline Query evaluation (Overview) Relational Operator

More information

Chapter 13: Query Optimization. Chapter 13: Query Optimization

Chapter 13: Query Optimization. Chapter 13: Query Optimization Chapter 13: Query Optimization Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 13: Query Optimization Introduction Equivalent Relational Algebra Expressions Statistical

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

4 Hash-Based Indexing

4 Hash-Based Indexing 4 Hash-Based Indexing We now turn to a different family of index structures: hash indexes. Hash indexes are unbeatable when it comes to equality selections, e.g. SELECT FROM WHERE R A = k. If we carefully

More information

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Query Processing. Introduction to Databases CompSci 316 Fall 2017 Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2

More information

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation. Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Datenbanksysteme II: Caching and File Structures. Ulf Leser Datenbanksysteme II: Caching and File Structures Ulf Leser Content of this Lecture Caching Overview Accessing data Cache replacement strategies Prefetching File structure Index Files Ulf Leser: Implementation

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 6 Lifecycle of a Query Plan 1 Announcements HW1 is due Thursday Projects proposals are due on Wednesday Office hour canceled

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science

More information

Chapter 11: Query Optimization

Chapter 11: Query Optimization Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

More information

Lecture 8 Index (B+-Tree and Hash)

Lecture 8 Index (B+-Tree and Hash) CompSci 516 Data Intensive Computing Systems Lecture 8 Index (B+-Tree and Hash) Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 HW1 due tomorrow: Announcements Due on 09/21 (Thurs),

More information

DBMS Query evaluation

DBMS Query evaluation Data Management for Data Science DBMS Maurizio Lenzerini, Riccardo Rosati Corso di laurea magistrale in Data Science Sapienza Università di Roma Academic Year 2016/2017 http://www.dis.uniroma1.it/~rosati/dmds/

More information

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1

CMPUT 391 Database Management Systems. Query Processing: The Basics. Textbook: Chapter 10. (first edition: Chapter 13) University of Alberta 1 CMPUT 391 Database Management Systems Query Processing: The Basics Textbook: Chapter 10 (first edition: Chapter 13) Based on slides by Lewis, Bernstein and Kifer University of Alberta 1 External Sorting

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000

192 Chapter 14. TotalCost=3 (1, , 000) = 6, 000 192 Chapter 14 5. SORT-MERGE: With 52 buffer pages we have B> M so we can use the mergeon-the-fly refinement which costs 3 (M + N). TotalCost=3 (1, 000 + 1, 000) = 6, 000 HASH JOIN: Now both relations

More information

ATYPICAL RELATIONAL QUERY OPTIMIZER

ATYPICAL RELATIONAL QUERY OPTIMIZER 14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

University of Waterloo Midterm Examination Solution

University of Waterloo Midterm Examination Solution University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry

More information

Introduction Alternative ways of evaluating a given query using

Introduction Alternative ways of evaluating a given query using Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Chapter 3. Algorithms for Query Processing and Optimization

Chapter 3. Algorithms for Query Processing and Optimization Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms

More information

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is

More information

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query

Goals for Today. CS 133: Databases. Relational Model. Multi-Relation Queries. Reason about the conceptual evaluation of an SQL query Goals for Today CS 133: Databases Fall 2018 Lec 02 09/06 Relational Model & Memory and Buffer Manager Prof. Beth Trushkowsky Reason about the conceptual evaluation of an SQL query Understand the storage

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VIII Lecture 16, March 19, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Today s Session:

More information