Stop-and-Resume Style Execution for Long Running Decision Support Queries


Surajit Chaudhuri (Microsoft Research)    Raghav Kaushik (Microsoft Research)    Abhijit Pol (University of Florida)    Ravi Ramamurthy (Microsoft Research)

ABSTRACT

Long running decision support queries can be resource intensive and often lead to resource contention in data warehousing systems. Today, the only real option available to the DBAs is to carefully select one or more queries and terminate them. However, the work done by such terminated queries is entirely lost even if they were very close to completion, and these queries will need to be run in their entirety at a later time. In this paper, we show how instead we can support a Stop-and-Resume style query execution that can partially leverage the work done in the initial query execution. In order to re-execute only the remaining work of the query, a Stop-and-Resume execution would need to save all the previous work. But this technique would clearly incur high overheads, which is undesirable. In contrast, we present a technique that can be used to save information selectively from the past execution so that the overhead can be bounded. Despite saving only limited information, our technique is able to reduce the running time of the resumed queries substantially. We show the effectiveness of our approach using TPC-H queries.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Database Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permissions from the publisher, ACM. VLDB '07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM /07/09.

1. INTRODUCTION

Decision support queries can be long running. For example, recent TPC-H [7] benchmark results show that these queries might take even hours to execute on large datasets. When multiple long running queries are executed concurrently, they compete for limited resources including CPU, main memory, and disk space, especially the workspace area used to store temporary results, sort runs, and spilled hash partitions. Contention for valuable resources can substantially increase the execution times of the queries. Today, the only real option available to the DBAs is to carefully select one or more queries (based on criteria such as the importance of the query, the amount of resources used by it, or progress information) and terminate them. Terminating one or more queries releases all resources (CPU, memory, and disk space) allocated to them, which can then be used to complete other running queries. In today's database systems, the work done by such terminated queries is entirely lost even if they were very close to completion, and these queries will need to be run in their entirety at a later time. In this paper we show how instead we can support a Stop-and-Resume style query execution that partially leverages the work done in the initial query execution to optimize the execution time if we were to resume its execution.

One way to implement such a Stop-and-Resume style query execution is to suspend the execution threads of one or more low-priority queries and resume them at a later time whenever appropriate.
However, the main problem with this approach is that suspending the execution of a query only releases the CPU resources; the memory and disk resources are still retained until the query execution thread is resumed. In fact, any attempt to reuse all intermediate results without recomputation would require us to keep the state corresponding to those tuples, in the worst case requiring very large memory and/or disk resources (for instance, hash tables in memory, sort runs on disk, etc.), which could amount to large overheads. Therefore, in this paper we investigate Stop-and-Resume style query execution that is constrained to save and reuse only a fixed number of records (intermediate records or output records) and thus releases all other resources. We formulate a bounded query checkpointing problem. Such a formulation is attractive since it limits the resources retained by a query that has been terminated. The central issue then is to identify the set of records (whose cardinality does not exceed the prescribed bound) to save and reuse such that the work done during the query restart is minimized. The key challenge is to identify such a set of records irrespective of when the query is terminated. Given that there is a tradeoff between the amount of state saved and the efficiency of the query restart, there will naturally be cases where most of the work has to be redone if the bound is small. However, our experiments, based on a prototype built by modifying Microsoft SQL Server 2005, indicate that there are many cases where bounded query checkpointing can be effective even when only a small amount of state is saved and reused. Of course, our algorithm yields increasing benefits when more state is allowed.

Our proposal for Stop-and-Resume style execution is targeted at data-warehousing environments geared towards decision support. We assume that the database is read only except for a batched update window when no queries are executed. Finally, although saving and reusing intermediate results is reminiscent of the techniques used for dynamic query optimization [][6][5], the constraint of saving and reusing only a bounded amount of intermediate results is unique to our problem (see Section 7 for a more detailed comparison).

The outline of the paper is as follows. In Section 2 we review some preliminary concepts and also present desirable properties of Stop-and-Resume style query execution. Section 3 discusses Stop-and-Resume execution for query plans consisting of a single pipeline. In that section, we introduce how Stop-and-Resume query execution can be supported in the framework of the query processing engine of relational database systems. Based on a model of work for query execution, we also present an algorithm to optimally compute the set of intermediate results to save and reuse for the case of single pipeline execution plans. In Section 4, we extend our algorithms to work for more complex execution plans. We present an experimental evaluation of our prototype system in Section 5 and discuss important extensions in Section 6. Section 7 presents related work.

2. PRELIMINARIES

2.1 Definitions

Stop-and-Resume style query execution involves two distinct phases, the initial run and the restart run. The main approach we take in this paper focuses on saving intermediate results generated during the initial run (the first query execution, until it is terminated) and avoiding recomputation of the saved results during the restart run (the re-execution of the same query at a later time).

Stop-and-Resume style query execution is described around query execution plans. A query execution plan is a tree where the nodes of the tree are physical operators. Each operator exposes a GetNext interface and query execution proceeds in a demand driven fashion [7]. In this paper, for simplicity we do not consider parallel query execution plans. An operator is a blocking operator if it produces no output until it consumes at least one of its inputs completely. Consider the example of a Hash Join between two tables (A and B). The Hash Join is a blocking operator: the probe phase cannot begin until the entire build relation is hashed. A pipeline can be thought of as a maximal subtree of operators in an execution plan that execute concurrently. The examples in Figure 1 illustrate plans that execute in a single pipeline. Every pipeline has one or more source nodes, the operators that are the source of the records operated upon by the remaining nodes in the pipeline. The Table Scan of A in Figure 1(a) and the Index Scan of A in Figure 1(b) are examples of source nodes. We discuss execution plans consisting of multiple pipelines in Section 4.

2.2 Desirable Properties of a Stop-and-Resume Style Query Execution

In this section we discuss desirable properties of a Stop-and-Resume style query execution.

Correctness: The query when restarted must produce exactly the same set of output records that would have been produced by the traditional query execution engine without checkpointing.

Low Overhead: An ideal Stop-and-Resume style query execution should incur very low overheads during the first execution. If we do not terminate the query, the performance should be comparable to normal query execution. This is an important design point for viability. In addition, query termination must proceed instantaneously; this sharply constrains the amount of state that we can save.

Generality: A Stop-and-Resume framework should be applicable irrespective of when the query is terminated and should also work for a wide range of query execution plans.

Efficiency of Restart: The sum of the execution time before the query is terminated and the execution time after it is restarted should be as close as possible to the execution time of uninterrupted query execution. The key metric is how much of the work done during the initial run can be saved during the restart run.

2.3 Modeling Work during Query Execution

As hinted in the introduction, our approach is to identify the best set of intermediate records to save and reuse in order to have an efficient restart.
In order to reason about the best set, we require a way to measure the amount of work done during query execution. One could naturally consider using the optimizer cost model for this purpose; however, we are looking for a more lightweight alternative. In this paper we use the total number of GetNext calls measured over all the operators to model the work done during query execution. Recent work on progress estimation for queries also employs similar schemes to model the work done during query execution [3][3][4][4]. It is shown in [3] that the GetNext model has good correlation with execution times. We also present a validation of the GetNext model in Section 5.1, where we find that the amount of savings as measured by the GetNext model is almost the same as the savings in execution times.

3. SINGLE PIPELINE QUERY PLANS

In this section, we begin our discussion of Stop-and-Resume query execution with the class of queries that consist of a single execution pipeline. We extend our solution to query execution plans consisting of multiple pipelines in Section 4. We focus our discussion on pipelines that consist of a single source node and where the results of the pipeline are obtained by invoking the operator tree on each source tuple and taking their union in order. For such a pipeline P, the result of invoking the operator tree on tuple t from the source is denoted as P(t). If the set of tuples returned by the source node is S, then the result of executing the pipeline is Ǔ_{t ∈ S} P(t), where Ǔ denotes an in-order union. We denote source tuples using the symbols t_1, t_2, ..., and tuples produced at the root of P using the symbols r_1, r_2, .... We note that there are pipelines (having operators such as Top or Merge-Join) that do not fall in this class. Our techniques are applicable for such cases, but for simplicity we restrict ourselves to pipelines as defined above.

[Figure 1. Examples of single pipeline query execution plans: (a) a Filter over a Table Scan of A; (b) an Index Nested Loops Join with an Index Scan of A as the outer input and an Index Seek on B as the inner input.]
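To make the pipeline abstraction and the GetNext model of work (Section 2.3) concrete, here is a minimal Python sketch of our own (it is not the prototype's code, and all class and variable names are hypothetical) that models the plan of Figure 1(a) as demand-driven iterators and counts GetNext calls as a proxy for work:

    # Minimal model of a demand-driven pipeline (Figure 1(a)): a Filter over a Table Scan.
    # Work is modeled as the total number of GetNext calls across all operators.
    class Counter:
        def __init__(self):
            self.getnext_calls = 0

    class TableScan:
        def __init__(self, rows, counter):
            self.rows, self.pos, self.counter = rows, 0, counter
        def get_next(self):
            self.counter.getnext_calls += 1
            if self.pos == len(self.rows):
                return None                      # end of stream (EOS)
            row = self.rows[self.pos]
            self.pos += 1
            return row

    class Filter:
        def __init__(self, child, predicate, counter):
            self.child, self.predicate, self.counter = child, predicate, counter
        def get_next(self):
            self.counter.getnext_calls += 1
            while True:
                row = self.child.get_next()
                if row is None or self.predicate(row):
                    return row                   # pass through qualifying rows (or EOS)

    counter = Counter()
    root = Filter(TableScan([("a", 1), ("b", 7), ("c", 3)], counter), lambda r: r[1] > 2, counter)
    while (r := root.get_next()) is not None:
        pass                                     # drain the pipeline root
    print(counter.getnext_calls)                 # total work in the GetNext model

The records returned by a real source node would additionally carry the tupleid values used by the Source() primitive introduced in Section 3.2.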

3.1 A Framework for Stop-and-Resume Execution

The setting for Stop-and-Resume execution is as follows. During the initial run of the query, the query can be killed at any point. The system is allowed to keep track of some state during the initial run. When the query is killed, this state is saved, coupled with a modified execution plan that utilizes this state. During the restart run, this modified plan is executed. In general, we are allowed to save not only intermediate results (the output of an operator) but also the internal state of an operator (such as hash tables and sort runs). In this paper, we focus on saving only intermediate results. We discuss the internal state of operators in Section 6.

Consider a simple stop-resume technique that saves all intermediate results generated during the initial run. We save records at the root of the pipeline as and when they are produced. For example, for the plan shown in Figure 1(a), we save all the records returned by the Filter operator. When the query is terminated, these results are persisted. During the restart run, we can avoid re-computing these saved results by remembering the position of the record of Table A that was accessed last by the Scan node before termination. The query plan used for the restart run would first read the saved results, skip to the above position in the Scan node, and continue scanning Table A from that position. We refer to this strategy as the SAVE-ALL strategy. In order to implement the SAVE-ALL strategy, we need to modify the database server with the following mechanisms: saving records at the root node of a pipeline, needed during the initial run, and skipping a set of records in the source node of a pipeline, needed during the restart run. We discuss these mechanisms in detail in Section 3.2.
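As a rough illustration of these two mechanisms, the following self-contained Python sketch (our own simplification, not the modified server; the function names and the in-memory representation are hypothetical) saves the records produced at the pipeline root together with the last source position consumed, and builds a restart that first replays the saved records and then resumes the scan past that position:

    # SAVE-ALL sketch: a toy stand-in for the two server-side mechanisms.
    def initial_run_save_all(rows, predicate, stop_after_source_rows):
        """Initial run of a Filter-over-Scan pipeline; returns the state to persist."""
        saved, last_pos = [], 0
        for pos, row in enumerate(rows[:stop_after_source_rows], start=1):
            last_pos = pos                       # last source position accessed by the scan
            if predicate(row):
                saved.append(row)                # record produced at the pipeline root
        return saved, last_pos                   # state kept when the query is killed

    def restart_save_all(rows, predicate, saved, last_pos):
        """Restart run: replay the saved results, then continue the scan past last_pos."""
        yield from saved
        for row in rows[last_pos:]:              # skip the already-processed prefix
            if predicate(row):
                yield row

    rows = list(range(10))
    pred = lambda x: x % 3 == 0
    saved, last_pos = initial_run_save_all(rows, pred, stop_after_source_rows=6)
    assert list(restart_save_all(rows, pred, saved, last_pos)) == [x for x in rows if pred(x)]

The actual system implements the skip inside the scan operator itself (the skip-scan operator of Section 3.2); the sketch only mimics the control flow.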
Though simple and correct, the SAVE-ALL scheme has the following drawback. Since the query can be terminated at any point during execution, the number of intermediate results generated can be arbitrarily large. Consider the execution plan in Figure 1(a). If the selectivity of the filter predicate is close to 1, then the filter operator would return almost all the rows in Table A, resulting in a large set of intermediate results. In general, the amount of state retained by the SAVE-ALL strategy is unbounded. Thus, the overhead involved in persisting this state when the query is killed is unbounded, which is clearly undesirable.

This limitation suggests that we restrict the amount of state that can be persisted during the initial run. We thus formulate a bounded query checkpointing problem. While we defer the formal specification of this problem to Section 3.3, our strategy is similar to SAVE-ALL except that we save only a bounded subset of the results persisted by SAVE-ALL and thus skip a smaller set of records in the source node. For a given bound on the amount of state we can persist, there are various choices of which subset of intermediate records to save. Our goal is to pick an appropriate subset so that the amount of work saved during the restart run is maximum. Since we do not know when the query is going to be terminated, this subset has to be maintained during query execution, and we are faced with an online problem for which we propose an optimal online algorithm in Section 3.4.

Finally, we note that there is an inherent tradeoff between the amount of state that is saved and the amount of work we can save during restart. If we sharply bound the amount of state that can be saved, there will be cases when almost the whole work has to be redone. Consider queries that scan all the tuples in a table (e.g., select * from T) or sort all the tuples in a table. Irrespective of which k tuples we save, any scheme can skip at most k records in the scan. If k is small, most of the work has to be redone. Note that this has nothing to do with any specific algorithm for bounded checkpointing, but rather with the paradigm itself. This should set our expectations. But as we discuss in this section and demonstrate in the experimental section (Section 5), there are many cases where bounded checkpointing can yield significant dividends at a low overhead.

3.2 Primitives for Stop-and-Resume Execution

In this section, we outline some important primitives for supporting Stop-and-Resume execution.

TupleIDs at the Source Node: We assume that there is a unique tupleid value that can be used to identify each record in the source node. This can be done by using a tupleid value or by adding the primary key value to the key of a clustering index. For instance, Microsoft SQL Server adds the primary key to uniquefy any index key. We also assume that for any intermediate record r generated, the root of the pipeline can keep track of the tupleid value for the corresponding source record. We denote this primitive as Source(r).

Skip-Scan Operator: We introduce a new operator that can be used to implement the skip functionality outlined in the previous section; it can be thought of as a generalized version of a scan operator. It takes two parameters as input, a starting position (say LB) and an ending position (UB). The semantics of the operator is as follows: it scans the source node and stops the scan after processing the tuple at position LB. It then skips to the ending position (UB) and resumes the scan from the next tuple. The operator will include the record at position LB and will not include the record at position UB. We refer to this primitive as the skip-scan operator. Figure 2 shows how the skip-scan operator can be used for the resumption of the execution plan shown in Figure 1(a). Conceptually, the results that are cached corresponding to the skipped region are unioned with the rest of the result of the filter operator. The skip-scan operator can be built on top of existing operators such as Table Scan and Clustered Index Scan, utilizing the random access primitives from the storage manager. For instance, in a clustered index we can seek to the UB value using the key (we assume keys are unique). In the case of heap files we could remember the page position (pageid, slotid) to resume the scan from. In general, the skip-scan operator can be extended to skip multiple ranges. In this paper we primarily focus on skipping a single contiguous range at the source node.

Signaling Mechanism: Operators in the query execution plan (that can serve as the root of an execution pipeline) are extended with the ability to process the set of saved intermediate results. In addition, there needs to be a simple signaling mechanism to coordinate between the skip-scan and the root of the pipeline at the time of the restart. Similar to the end-of-stream (EOS) message that a scan operator sends at termination, the skip-scan operator sends an end-of-LB (EOLB) message before it skips to the UB. On receiving the EOLB message, the pipeline root processes the saved records. This mechanism is important in ensuring that the ordering of the tuples at the output of the pipeline is preserved. Figure 2 indicates how processing the saved records after getting the EOLB message can ensure the ordering of the tuples at the output of the filter operator.

[Figure 2. The Skip-Scan Operator: the saved results for the skipped region between LB and UB are unioned with the Filter output over the resumed scan.]

3.3 Bounded Query Checkpointing

We now formally define the bounded query checkpointing problem. Based on the primitives discussed above, we first define a space of restart plans. Fix a pipeline P. Suppose that the tuples returned at the root are r_1, r_2, ... in order. We refer to record r_i as the i-th record. An ordered set I consisting of tuples r_i, ..., r_{i+j} returned contiguously at the root of the pipeline is referred to as a window of result tuples, whose size is j+1 tuples. Such a window identifies a restart plan R(I) as follows. Let LB and UB denote the boundaries of the source region that can be skipped by saving the tuples in I. Recall that for an intermediate record r, Source(r) denotes the tupleid value of the source record that generated it. Formally: LB(I) is Source(r_{i-1}) if i > 1, and 0 (the start of the scan) if i = 1; UB(I) is Source(r_{i+j}). Then, R(I) is obtained by taking P, replacing the source scan with a skip-scan operator seeded with these LB and UB values, and taking an ordered union of the tuples in I at the root of the pipeline. The window I is said to be valid if R(I) is equivalent to P.

The bounded query checkpointing problem is the following online problem. Fix a budget of k tuples. As the pipeline P is executing, the goal is to maintain a valid window I of result tuples such that (a) its size is at most k, and (b) among all valid windows that satisfy condition (a), the execution cost of R(I) as measured in GetNext calls is minimum. While our approach can be extended to cover the case where the storage constraint is given in terms of number of bytes, in this paper we consider the constraint specified in terms of number of records for simplicity.

We now introduce the notion of the benefit of a restart plan, which we use instead of its cost. This notion is based on the number of GetNext calls skipped. We discuss the equivalence of the restart plan in Section 3.5. For record r_i, let N_i be the total number of GetNext calls issued in the pipeline until the point tuple r_i was generated at the root. We define the benefit of saving a record r_i as:

    Benefit(r_i) = N_i - N_{i-1}   if i > 1
                 = N_i             if i = 1

The Benefit of a window (r_i, ..., r_{i+j}), denoted Benefit(r_i, r_{i+j}), is the sum of the Benefits of all its records. Benefit(r_i, r_{i+j}) measures the amount of work done in generating this set of intermediate results. We observe that:

Claim: Let N be the total number of GetNext calls issued if the pipeline were run to completion. The cost of the restart plan corresponding to the window (r_i, ..., r_{i+j}) is N - Benefit(r_i, r_{i+j}).

Thus, our goal is to find the result window that has the maximum Benefit.
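To illustrate this bookkeeping with hypothetical numbers (the running counters N_i below are made up for the example), the following small Python snippet computes per-record Benefit values, the Benefit of candidate windows, and the restart cost implied by the Claim above:

    # Benefit bookkeeping sketch. N[i-1] holds N_i: the total GetNext calls issued in the
    # pipeline up to the point record r_i was produced at the root (hypothetical values).
    N = [120, 150, 900, 940, 1000]

    def benefit(i):
        """Benefit(r_i): GetNext calls attributable to producing r_i (1-based i)."""
        return N[i - 1] - (N[i - 2] if i > 1 else 0)

    def window_benefit(i, j):
        """Benefit of window (r_i, ..., r_j); the sum telescopes to N_j - N_{i-1}."""
        return sum(benefit(x) for x in range(i, j + 1))

    total_work = 1500                            # N: GetNext calls if run to completion
    best = max(((i, i + 1) for i in range(1, len(N))), key=lambda w: window_benefit(*w))
    print(best, window_benefit(*best))           # (3, 4): saving r_3 and r_4 skips 790 calls
    print(total_work - window_benefit(*best))    # restart cost per the Claim: 1500 - 790 = 710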
Figure 3 illustrates the problem for the query plan in Figure 1(a), where the constraint k on the number of intermediate tuples is set to the value 3. The Best-3 Region highlighted shows the window of result tuples with the maximum Benefit value.

[Figure 3. Table A's disk representation with the distribution of output records of the Filter operator, highlighting the First-3 and Best-3 regions.]

Note that bounding the state that is maintained and persisted is critical to our problem formulation. We desire that the overhead incurred owing to Stop-and-Resume execution be minimal. On the other hand, it is critical that even with a sharply bounded overhead, we are able to avoid a significant amount of work when the query is resumed. It is important to find out whether there exists a small window of result tuples that together yield significant benefit. By saving this small number of tuples, we can then avoid most of the work during resumption. We now report a study that measures the variation in the per-tuple benefit. We study this for a query over the TPC-H data that selects all rows from the lineitem table that satisfy a range predicate of the form l_shipdate <= constant. The execution plan for this query is identical to the plan shown in Figure 1(a). We execute this query over the TPC-H dataset where the data distribution is skewed (refer to Section 5 for details). Figure 4 plots the benefit of an intermediate tuple r_i against i. We note that, as desired, there is significant non-uniformity in the benefit values. Indeed, most of the benefit is clustered in a range of about 2000 tuples between r_4000 and r_6000, even though the lineitem table has 6 million tuples. This indicates that even with a very tight budget on the state, we can achieve significant savings. We next present our online algorithm to exploit this non-uniformity and optimally compute the set of records to save for single pipeline plans.

[Figure 4. Benefit(r_i) vs. i (y-axis: Total Number of GetNext Calls; x-axis: Number of Intermediate Results).]

3.4 Opt-Skip Algorithm

We now present our online algorithm Opt-Skip for the bounded query checkpointing problem, which is guaranteed to preserve the optimal plan as query execution proceeds.

    /* B = buffer for current window, k = total budget */
    /* BestB = buffer for best window */
    Algorithm Opt-Skip
    For each intermediate record r_i do:
        Append r_i to B
        If B.Size() <= k and Valid(B) then BestB = B
        Else if B.Size() > k then
            B = (r_{i-k+1}, ..., r_i)          // advance window
            If Benefit(B) > Benefit(BestB) and Valid(B) then BestB = B

    Figure 5. Algorithm Opt-Skip.

If the number of records returned at the pipeline root is less than or equal to the budget k, then we can simply save all of them. Otherwise, we always keep track of the valid window of result records with size at most k that has the best Benefit value. We use an additional buffer of size k to progressively search for a window of k records with a higher Benefit value. This is achieved by maintaining a sliding window of size k as records are returned at the pipeline root. Whenever the current window is valid and has a higher benefit than the current best, we reset the current best appropriately. The sliding window ensures that no (ordered) set of records with a higher Benefit value is missed. Algorithm Opt-Skip is outlined formally in Figure 5. The current best window is maintained in a buffer BestB while the sliding window of k tuples is maintained in buffer B. Note that the Opt-Skip algorithm is guaranteed to give the optimal solution.
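A runnable rendering of the algorithm of Figure 5 in Python is sketched below (our own transcription; the per-record benefits are hypothetical and the validity test is passed in as a pluggable predicate, anticipating Section 3.5):

    # Opt-Skip sketch: maintain the best valid window of at most k root records online.
    def opt_skip(records, k, valid):
        """records: iterable of (record, benefit) pairs produced at the pipeline root, in order."""
        B, B_benefit = [], 0                     # sliding window of the most recent <= k records
        best, best_benefit = [], 0               # BestB: best valid window seen so far
        for r, b in records:
            B.append((r, b)); B_benefit += b
            if len(B) <= k:
                if valid(B):
                    best, best_benefit = list(B), B_benefit
            else:
                dropped = B.pop(0)               # advance window to (r_{i-k+1}, ..., r_i)
                B_benefit -= dropped[1]
                if B_benefit > best_benefit and valid(B):
                    best, best_benefit = list(B), B_benefit
        return [r for r, _ in best], best_benefit

    # Toy run with hypothetical per-record benefits; every window is treated as valid here.
    recs = [("r%d" % i, b) for i, b in enumerate([120, 30, 750, 40, 60], start=1)]
    print(opt_skip(recs, k=2, valid=lambda w: True))   # -> (['r3', 'r4'], 790)

In the actual operator, Valid(B) is evaluated using the Source() ids at the edges of the window, as discussed next; the sketch also omits the two extra buffers (k+2 in total) that Section 3.5 introduces to handle duplicates.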
3.5 Checking Validity

Notice that in Algorithm Opt-Skip, we update the BestB buffer only if B is valid. We now discuss our method for checking that a window of result records is valid. Intuitively, we require that the set of source records skipped in the restart plan and the saved window of results be such that the skipped records generate exactly these results. We refer to this property as the skipping invariant. We capture this condition formally as follows. Consider a window of result records B = (r_i, ..., r_{i+j}). The set of source tuples S skipped consists of all tuples with id > LB(B) and <= UB(B). Then the plan R(B) is equivalent to P if Ǔ_{t ∈ S} P(t) = B. For any window B of result tuples, we can observe that B is contained in Ǔ_{t ∈ S} P(t). However, the converse need not necessarily hold. Consider for example an Index Nested Loops Join between Tables A and B (Figure 6). Say the outer relation has keys 1 through N, and some record in the outer relation has more than k matches in the inner relation owing to duplicates. Figure 6(a) illustrates how this can lead to a violation of the skipping invariant (using k=3).

We handle this issue by allocating two extra buffers in addition to the current window of k tuples (buffer B in Algorithm Opt-Skip). We use these k+2 buffers to maintain a sliding window of size k+2. Thus, if B is the window (r_i, ..., r_{i+k-1}), then we also store r_{i-1} and r_{i+k} (if they exist; corner cases can be easily handled). We check that Ǔ_{t ∈ S} P(t) is contained in B by checking that (Source(r_{i-1}) != Source(r_i)) and (Source(r_{i+k-1}) != Source(r_{i+k})). Figure 6 illustrates this. Using a window size of k+2, we can detect that cases 6(b) and 6(c) are incorrect and that only 6(d) maintains the skipping invariant in the presence of duplicates. Thus Opt-Skip uses k+2 buffers.

[Figure 6. Handling duplicates using k+2 buffers: (a) a sliding window of size k=3 over Tables A and B with more than k duplicate matches; (b)-(d) a sliding window of size k+2=5, of which only (d) yields a valid BestB window.]

4. SOLUTION FOR GENERAL CASE

In the previous section, we introduced Stop-and-Resume execution for plans with single pipelines. The main idea behind our solution was to save results at the root of the pipeline and skip the corresponding portion of the source node of the pipeline. In this section, we extend our solution to more complex query execution plans that consist of multiple pipelines.

4.1 Execution Plans with Multiple Pipelines

A query execution plan involving blocking operators (such as sort and hash join) can be modeled as a partial order of pipelines, called its component pipelines, where each blocking operator is the root of some pipeline. For example, the execution plan in Figure 7 features two pipelines, P1 and P2. These pipelines correspond to the build side and probe side of a Hash Join, respectively. In pipeline P1, Table A is scanned and the records that pass the selection criteria of the Filter operator are used in the build phase of the Hash Join. The execution of P2 commences after hashing is finished on the build side. The index on Table B is scanned and records are probed into the hash table for matches.

[Figure 7. Execution Plan with Multiple Pipelines: a Hash Join whose build-side pipeline P1 is a Filter over a Table Scan of A and whose probe-side pipeline P2 is an Index Scan of B.]

4.2 Bounded Query Checkpointing for Multi-pipeline Plans

The space of restart plans defined in Section 3 for single pipeline plans (which we call single-pipeline restart plans in this section) naturally extends to multi-pipeline plans. We are now allowed to save the tuples produced at the root of more than one pipeline and use the skip-scan operator to skip reading portions of the corresponding source nodes. For instance, in the execution plan in Figure 7, we could save tuples produced by the filter operator in pipeline P1 and use the skip-scan operator for Table A. Another alternative is to save tuples produced by the probe side of the hash join operator and use the skip-scan operator on Table B. A multi-pipeline restart plan is obtained by replacing some subset of the component pipelines with corresponding single-pipeline restart plans. A multi-pipeline restart plan is valid if all its component restart plans are valid. It is not hard to see that validity is sufficient for equivalence to the original plan. Our goal, as with single pipeline plans, is to find a valid restart plan such that the total state saved, counted as the number of tuples, is bounded and the cost of the plan measured in terms of GetNext calls is minimized. Again, as with single pipeline plans, we introduce the notion of the benefit of a tuple returned at the root of some pipeline, which is identical to the single-pipeline case. Thus, we are left with the online problem of maintaining the valid restart plan that yields the maximum benefit.

One limitation of the above approach is that we restrict ourselves to pipelines where the source nodes support the skip-scan operation. In general there may be pipelines where the source node is, say, a sort node. We cannot currently exploit the skip-scan operator in sort pipelines. But the techniques described in Section 4.3 can still exploit the other pipelines in such execution plans.

Similar to the case of simple selection queries (see Figure 4), the Benefit values for tuples that result from a join pipeline can also vary non-uniformly. We show this empirically for a query over the TPC-H data. The query we choose is such that the plan generated is the one in Figure 7, where we are building a hash table on Orders.Orderkey and probing using Lineitem.Orderkey. We enforce a predicate of the form o_orderdate <= constant on Orders. Figure 8 plots the benefit of join result tuple r_i against i. We find a significant non-uniformity in the benefit values, similar to Figure 4.

[Figure 8. Distribution of GetNext calls: Benefit(r_i) vs. i for the join pipeline (y-axis: Total Number of GetNext Calls; x-axis: Number of Intermediate Results).]
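As a rough illustration of the bookkeeping this formulation implies (our own sketch; the field names, ids, and benefit numbers are hypothetical), a multi-pipeline checkpoint can be described by one saved window per chosen pipeline, with the tuple budget and the Benefit aggregated across pipelines:

    # Multi-pipeline checkpoint sketch: one saved window per chosen pipeline.
    from dataclasses import dataclass

    @dataclass
    class PipelineWindow:
        pipeline_id: int
        saved: list        # window of records saved at this pipeline's root
        lb: int            # source tupleid bounding the skipped region (exclusive)
        ub: int            # source tupleid bounding the skipped region (inclusive)
        benefit: int       # GetNext calls skipped by this window's single-pipeline restart plan

    def within_budget(windows, k):
        """The bounded checkpointing constraint: total saved tuples across pipelines <= k."""
        return sum(len(w.saved) for w in windows) <= k

    def total_benefit(windows):
        return sum(w.benefit for w in windows)

    # Hypothetical example for the plan of Figure 7: a window on the build side (P1)
    # and a window on the probe side (P2); the Benefit is aggregated across pipelines.
    p1 = PipelineWindow(1, saved=["a", "b", "c"], lb=40, ub=70, benefit=900)
    p2 = PipelineWindow(2, saved=["x", "y"], lb=10, ub=25, benefit=300)
    candidate = [p1, p2]
    if within_budget(candidate, k=5):
        print(total_benefit(candidate))          # 1200 GetNext calls avoided on restart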
4.3 Algorithms

Any pipeline in an execution plan can be in one of three states: it has either completed execution, is currently executing, or has not yet started. Naturally, the candidates for the best-k tuples must be picked from the output of either the currently executing pipeline or the pipelines that have completed their execution. The Opt-Skip algorithm outlined in Section 3 can potentially be applied to one or more pipelines. Notice that the Benefit value can be aggregated across the pipelines. Given a fixed number of intermediate results (k) to save, the challenge is in distributing the k tuples among different pipelines to maximize the benefit. Computing the optimal distribution of k tuples in the multi-pipeline case could, however, require excessive bookkeeping: the algorithm would need to keep track of the best-k values for different values of k for different pipelines. The amount of working space required can be as high as nk buffers, in contrast to the working space of k buffers required by the Opt-Skip algorithm for a single pipeline. In order to keep the overheads of our solution low, we examine different ways in which we can use the single pipeline solution (Section 3.3) as a building block in order to compute the best set of tuples to save for query execution plans with multiple pipelines. At any point, we need a valid restart plan. We always maintain the BestB buffer containing k tuples (see Section 3.4) for the current pipeline. Whenever a pipeline finishes execution or the query is terminated, we need to find the best distribution of the k tuples among the current and previous pipelines. We now outline three approaches that consider multiple pipelines in computing the best k tuples to save.

Current-Pipeline Approach: A simple scheme is to only consider the currently executing pipeline and apply the Opt-Skip algorithm from the previous section. But this could potentially be a bad choice. Consider a join query between the Lineitem and Orders tables where the query plan is similar to the one shown in Figure 7 (assume A = Lineitem, B = Orders). Let the currently executing pipeline be pipeline P2, involving the Index Scan on the Orders table. If the query is terminated at this point, note that any scheme that uses the skip-scan operator on pipeline P2 cannot avoid rescanning the entire Lineitem table during the restart. A better strategy might actually utilize the skip-scan operator at the input to the join (pipeline P1). In our experimental section, we show that exclusively using the current pipeline can sometimes turn out to be a bad decision.

Max-Pipeline Approach: This approach picks the single pipeline that can provide the best Benefit value given a budget of k intermediate tuples. One interesting property is that this algorithm requires a working space of only k buffers, which is the same as the case for single pipeline plans. The algorithm is a generalization of the algorithm Opt-Skip. Note that the Opt-Skip algorithm described in the previous section keeps track of the current best (BestB) and a sliding window B that slides forward to find a better set. We can compare the BestB from the pipelines that have already completed execution with the window B for the pipeline currently executing. At any point we are then guaranteed to find the single best pipeline in which to cache k tuples across all previous pipelines.

Merge-Pipeline Approach: This approach uses a heuristic to allocate the k tuples among different pipelines. It uses the Opt-Skip algorithm to compute the k tuples to save for each pipeline independently. Consider the point where the second pipeline has finished executing. We now have 2k tuples cached, corresponding to two contiguous regions in the source nodes of the two pipelines. Let the intermediate tuples saved be (r_1, ..., r_k) and (s_1, ..., s_k). We need to eliminate k tuples from this set of 2k tuples to satisfy the constraint of bounded query checkpointing. We employ a greedy heuristic that works as follows. Let rmin denote the tuple that has the smaller Benefit value among r_1 and r_k, and smin denote the tuple that has the smaller Benefit value among s_1 and s_k. We eliminate the tuple that has the minimum Benefit value among rmin and smin. Intuitively, the greedy heuristic iteratively removes the tuple that has the minimum Benefit value among the edges of the two ranges. This technique distributes the k tuples while still retaining contiguous ranges in both pipelines (a sketch of this elimination step is given after the approach descriptions). After every pipeline completes execution, this algorithm is invoked to get rid of k tuples. The working space required by this algorithm is 3k buffers.

Note that we currently use the GetNext model as a rough estimate to compare the benefit across different pipelines and pick the best one. For complex queries involving expensive operations such as UDFs, a weighted version of the GetNext model becomes necessary to account for different weights for different pipelines.
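The greedy elimination step of the Merge-Pipeline heuristic can be sketched as follows (our own rendering with hypothetical per-tuple Benefit values; the windows remain contiguous because tuples are only trimmed from their edges):

    # Merge-Pipeline sketch: trim two saved windows (k tuples each) down to a combined
    # budget of k by repeatedly dropping the edge tuple with the smallest Benefit value.
    def merge_windows(win_r, win_s, k):
        """win_r, win_s: lists of (record, benefit) pairs, each a contiguous window."""
        wins = [list(win_r), list(win_s)]
        while sum(len(w) for w in wins) > k:
            # candidate victims: the first and last tuple of each non-empty window
            edges = [(w[idx][1], i, idx) for i, w in enumerate(wins) if w
                     for idx in {0, len(w) - 1}]
            _, i, idx = min(edges)               # smallest-Benefit edge tuple
            wins[i].pop(idx)
        return wins[0], wins[1]

    r = [("r1", 5), ("r2", 80), ("r3", 70), ("r4", 9)]
    s = [("s1", 12), ("s2", 60), ("s3", 30), ("s4", 4)]
    print(merge_windows(r, s, k=4))
    # -> ([('r2', 80), ('r3', 70)], [('s2', 60), ('s3', 30)])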
Sub-tree Caching: There is, however, one special case for execution plans consisting of multiple pipelines. If for any pipeline the total number of intermediate results generated is less than the budget k, then one can not only skip the corresponding leaf but also skip evaluating all the pipelines that preceded the current pipeline. We refer to this case as sub-tree caching. We ensure that the benefit value for the set of tuples is appropriately set to the number of GetNext calls issued over the entire sub-tree.

In the following section, we present an experimental evaluation that quantifies the benefits of bounded checkpointing.

5. EXPERIMENTS

In this section, we present an experimental evaluation of bounded query checkpointing. Our prototype was built by modifying the query execution engine of Microsoft SQL Server 2005. We added the primitives required for Stop-and-Resume execution outlined in Section 3.2. The clustered index scan operator was extended to accept LB and UB values and implement the skip-scan operator as described in Section 3.2. Currently, in the prototype the saved intermediate results from the initial run are cached in memory and reused during the restart. The main goals of the experiments are as follows: to understand when using the skip-scan operator is most beneficial; to examine the utility of bounded query checkpointing for complex queries; and to compare the different algorithms presented for the case of multi-pipeline execution plans.

Database: We use both benchmark and real data sets to evaluate our prototype. We use the 1 GB TPC-H database. The original data generator does not introduce any skew in the data; we used a data generator that introduces a skewed (zipfian) distribution for each column independently in a relation. Every table had a clustered index on the primary key. For our workload, we used all the queries that reference the Lineitem table; these are typically the long running queries in the TPC-H suite. In order to understand how the prototype works on real data sets, we also present an evaluation using the personal edition of the SkyServer database [8]. This is a publicly available database of astronomy data.

Evaluation Metric: Queries that are terminated have to be rerun entirely in the absence of the techniques suggested in this paper. The main evaluation metric we use is how much of the work done by the query during the initial run is saved in the restart due to the algorithms presented in this paper. We use the GetNext model of work (see Section 2.3 for details). Let T denote the total number of GetNext calls in the execution of a query Q. Let T1 denote the number of GetNext calls invoked during the initial run before the query is stopped. Let T2 be the number of GetNext calls invoked during the restart to reach the same point in execution where the query was originally stopped. Note that this point in execution is well defined for our restart plans. We define the Percentage Work Saved as:

    Percentage Work Saved = (T1 - T2) / T * 100

A main advantage of using the GetNext model for the work done is that we can talk precisely about when a query is stopped and restarted (e.g., after 50% of T is done) and that the total number of GetNext calls for a query plan is fixed.
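As a hypothetical worked example of this metric (the numbers are ours, not measured): if a query issues T = 1,000,000 GetNext calls when run to completion, T1 = 600,000 calls are issued before it is stopped, and the restart plan needs only T2 = 150,000 calls to reach the same point, then the Percentage Work Saved is (600,000 - 150,000) / 1,000,000 * 100 = 45%; rerunning the query from scratch would instead redo all 600,000 calls' worth of work.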

We instrumented the query execution to write out the total number of GetNext calls seen thus far as well as the amount of work that can be saved if the query were terminated at that point (this is the Benefit value corresponding to the best set of intermediate results saved until that point). After the query execution is complete, we can analyze this log to figure out what would happen if the query were terminated at any particular point in the past (for the current k value).

5.1 Use of the GetNext Model

We observed that for the TPC-H queries used in our experiments the leaf nodes in the query execution plans were all clustered index scans. For such query plans, we observed that the Percentage Work Saved measured using the GetNext model was strongly correlated with the corresponding speed-up in execution time. Prior work on progress estimation indicates that the GetNext model is robust for such scan-based plans [3]. For instance, Figure 9 illustrates the difference between the Percentage Work Saved computed using execution times and using the GetNext model for the following join query (the execution times were measured on a single commodity machine):

    SELECT COUNT(*) FROM lineitem, orders
    WHERE l_orderkey = o_orderkey AND l_receiptdate > [date constant]

The query plan is a hash join with lineitem as the build side and represents a simple case of a multi-pipeline plan. The constraint k value (250) was not large enough to cache all intermediate tuples satisfying the filter on the lineitem table. Figure 9 shows the absolute error in estimating the Percentage Work Saved (using the GetNext model versus actual execution times) as a function of when the query is terminated. Note that the absolute error is uniformly low (around 1%) across different termination points of the query. We also observed similar results for different selectivity values of the predicate.

[Figure 9. GetNext Model vs. Execution Times: absolute error vs. point of termination (25%, 50%, 75%, 90%).]

We do note that for execution plans involving operators such as index seeks, a weighted version of the GetNext model would be more appropriate.

5.2 Effectiveness of the Skip-Scan Operator

In this section, we first study the skip-scan operator in the context of a simple query in order to understand where it is likely to yield the best dividends and what its limitations are. Note that the key aspect that influences the efficiency of the skip-scan operator (for a given selectivity value) is how the selected tuples are laid out with respect to the original ordering of the table (see Figure 3). In this section, we study the effectiveness of the skip-scan operator by varying these parameters. We use a simple join query between the lineitem and orders tables that has a range predicate on the l_receiptdate column of the lineitem table. We vary the selectivity from around 0.005% to 10% by suitably varying the predicate on the l_receiptdate column. We picked a k value that is large enough to save all the intermediate tuples only for the lowest selectivity values. For the above query, we used different versions of the lineitem table, one clustered on the l_orderkey attribute and another clustered on the l_shipdate attribute. We also studied the case in which the selected tuples are distributed with uniform gaps in the base relation (i.e., the predicate has no correlation with the clustering column).
Figure 10 plots the Percentage Work Saved for all three cases (clustered indexes on l_shipdate and l_orderkey, and the uniform case). We assume the query is stopped at 50%; the results are similar for other points of termination.

[Figure 10. Effect of Clustering: Percentage Work Saved vs. selectivity for the uniform, l_orderkey, and l_shipdate cases.]

The first point to note is that for the cases where the entire set of intermediate results can be saved (the lowest selectivity values), the Percentage Work Saved is maximal independent of the clustering of the tuples that pass the selection. In general, if the gaps between the selected tuples are uniformly distributed, then any set of k tuples would have the same benefit value, and moreover the benefit would decrease with increasing selectivity. In such cases, one can expect only relatively small benefits in the restart for small values of k. The clustered index on l_shipdate represents an interesting case. Even though the predicate is on the l_receiptdate attribute, the l_receiptdate is always greater than the corresponding l_shipdate: tuples with a higher l_shipdate value are guaranteed to have a higher l_receiptdate value. Thus, tuples that satisfy a range predicate on l_receiptdate are always clustered together, and the remaining portions can be skipped during the restart. This represents the best-case scenario for the skip-scan operator, as this implicit correlation ensures that the skip-scan operator remains effective even with increasing selectivity. The clustered index on l_orderkey represents an intermediate case (tuples with a higher l_orderkey value are likely to have a higher l_receiptdate, but this is not guaranteed). For this case the correlation (and as a result the benefits) are not as good as for the l_shipdate clustered index, but it is still better than the uniform case. Thus, the skip-scan operator yields the maximum benefits when either the selectivity is low or there is a strong correlation between the predicate column and the clustering column.

5.3 Evaluation on TPC-H Queries

In this section, we study the performance of our algorithms for bounded query checkpointing in the presence of complex decision support queries. One important factor is the monitoring overhead imposed by our techniques in the normal case (when the query is not terminated). For the TPC-H workload, we observed that for most of the queries the overheads were within 3% of the original query execution times. Figure 11 shows the Percentage Work Saved for the TPC-H workload, assuming the queries are terminated at the 50% point. We used a k value of 250 (if we assume a database page size of 8K and an average intermediate record size of 32 bytes, 250 tuples can fit in a single page).

[Figure 11. Percentage Work Saved for the TPC-H workload (k=250, termination at 50%); x-axis: TPC-H Query.]

The plots in the figure show that we can save a significant portion of the work done in the initial run by utilizing only a small amount of state. Among the 17 queries in the workload, we can save 40% or more of the work done in the initial run for nearly half the queries. Query 7 represents a case where we can avoid redoing all the initial work (the Percentage Work Saved is 100%); this is because the predicates are very selective and we can cache all the intermediate results generated thus far in a pipeline. An important point to note is that if the SAVE-ALL technique were restricted to the same budget, it would only be applicable for Query 7 in the workload, while the techniques proposed in this paper are more generally applicable. On the other hand, for Queries 1 and 3 we get practically no benefit. Query 3, for instance, selects most of the tuples in lineitem, so saving just 250 tuples does not help much. These represent cases where the selected tuples are distributed with uniform gaps (see Section 5.2) and, as a result, the skip-scan operator is not as effective. We observed that for most of the queries the Group-By operator was applied at the top of the query execution plan and at the 50% termination point the operator had not yet started execution. Thus for most of the queries, saving internal operator state would not have been effective. The only exceptions are Query 1 and Query 6, which are single pipeline queries. Query 1 selects 97% of the tuples and computes a set of aggregates. For this case, saving and reusing the partial sums corresponding to the aggregates would have resulted in saving almost all the initial work. For Query 6, even though we do not save partial sums, we still obtain a noticeable improvement. We further discuss partial operator state in Section 6.

5.4 Algorithm Choice for Multiple Pipelines

In Section 4.3, we presented three algorithms for the case of multiple pipelines (Current-Pipeline, Max-Pipeline and Merge-Pipeline). We had mentioned that it could potentially be useful to retain information about previous pipelines and that the Current-Pipeline algorithm may not be optimal. Among the 17 queries executed, we observed that for nearly half of the queries this observation held (k=250). Figure 12 illustrates how the Percentage Work Saved varies as a function of when the query is terminated for the Current-Pipeline algorithm.
For instance, Query 10, if stopped at around 60% completion, can save a sizable fraction of the work during the restart. But if the same query were stopped at 80% completion, there are practically no benefits. The key point to note is that there can be high variance in the benefits of the Current-Pipeline algorithm depending on the effectiveness of the skip-scan operator at different pipelines.

[Figure 12. Ignoring Previous Pipelines: Percentage Work Saved vs. point of termination (30%-90%) for the Current-Pipeline algorithm on several TPC-H queries.]

For Query 10 (a join of 4 relations), we also illustrate how the Max-Pipeline and the Merge-Pipeline algorithms compare to the Current-Pipeline algorithm in Figure 13. For this query, the best pipeline to skip is the one in which the source node is the Lineitem table. Both the Max-Pipeline and Merge-Pipeline algorithms are able to identify this; the Current-Pipeline algorithm, however, always goes with the current pipeline, and in this case the later pipelines (after the 70% point) are not good candidates for utilizing the skip-scan operator. In this case the savings obtained by the Max-Pipeline and the Merge-Pipeline algorithms were the same.

[Figure 13. Multi-Pipeline Algorithms: comparing Current-Pipeline, Max-Pipeline, and Merge-Pipeline (Percentage Work Saved vs. point of termination).]

In our experiments, we found that the Max-Pipeline algorithm performed as well as the Merge-Pipeline algorithm in many cases. But there were instances where utilizing the skip-scan operator at multiple pipelines made a significant difference. For instance, Figure 14 shows how the Merge-Pipeline algorithm outperforms the Max-Pipeline algorithm for a TPC-H query that involves self-joins of the Lineitem table: the Merge-Pipeline algorithm can utilize the skip functionality on multiple scans of the Lineitem table, whereas the Max-Pipeline algorithm can use the skip-scan operator on at most one scan node. Notice that the savings are equal until the first pipeline finishes execution (at around 50%). The Merge-Pipeline algorithm can effectively utilize multiple pipelines after this point in execution.

[Figure 14. Max-Pipeline vs. Merge-Pipeline: Percentage Work Saved vs. point of termination for the query with self-joins of Lineitem.]

5.5 Effect of k Value

In Figure 15, we study how the effectiveness of the algorithm changes with the number of intermediate results that are allowed to be saved. We show results for three k values (approximately 1000, 5000, and 50000) and assume the termination point is 50%. As the plots indicate, the benefits do not necessarily increase linearly with increasing k. There are a couple of reasons for this. Firstly, based on the intermediate result sizes and the k values, sub-tree caching (Section 4) can make a huge difference. For instance, for the k=50000 case, we can save 100% of the work for 5 of the queries (compared to the single query in Figure 11). Another reason is the significant non-uniformity in the amount of work done for each intermediate record in the TPC-H workload (as the plots in Figures 4 and 8 showed). Notice that if the work done in generating each intermediate result in a pipeline were uniform, then one would expect a linear return for an increasing value of k. These results (in conjunction with those in Section 5.3) show that our algorithms can be effective even when only a small amount of state is saved and reused. Of course, our algorithm yields increasing benefits when more state is allowed.

[Figure 15. Percentage Work Saved as a function of k for the TPC-H queries.]

5.6 Evaluation on SkyServer Database

In this section, we study the effectiveness of bounded checkpointing on real data, using the publicly available personal edition of the SkyServer database [8]. Figure 16 shows the Percentage Work Saved for all the long running queries from this workload (the termination point is 50%).

[Figure 16. Evaluation on the SkyServer database: Percentage Work Saved per SkyServer query (k=250).]

The results show that bounded checkpointing is useful for real workloads; for nearly half of the queries we can provide a savings of 40% or more. An important point to note is that queries 1 and 3 are single pipeline queries with no group-by or aggregations. Thus the numbers reported are optimal (see Section 3.3); we cannot do better for these queries under the constraint of bounded checkpointing.

5.7 Summary

The experimental results indicate that bounded query checkpointing can indeed be an effective alternative to terminating the query and losing all the work done in the initial run. Another important factor to consider is the overhead imposed by our proposed techniques in the normal case (if the query were not terminated). For the TPC-H workload, we compared the execution times of the queries with and without the overheads of the proposed techniques. For most of the queries, the overheads were within 3% of the query execution times.
We summarize the experimental results below: there are many cases where the skip-scan operator is useful (selective predicates, sub-tree caching, or correlation between the predicates and the clustering column).


More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 12, Part A Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset

More information

EXTERNAL SORTING. Sorting

EXTERNAL SORTING. Sorting EXTERNAL SORTING 1 Sorting A classic problem in computer science! Data requested in sorted order (sorted output) e.g., find students in increasing grade point average (gpa) order SELECT A, B, C FROM R

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large

More information

Overview of Implementing Relational Operators and Query Evaluation

Overview of Implementing Relational Operators and Query Evaluation Overview of Implementing Relational Operators and Query Evaluation Chapter 12 Motivation: Evaluating Queries The same query can be evaluated in different ways. The evaluation strategy (plan) can make orders

More information

Query Processing. Introduction to Databases CompSci 316 Fall 2017

Query Processing. Introduction to Databases CompSci 316 Fall 2017 Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2

More information

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk

More information

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into

More information

Midterm Review. March 27, 2017

Midterm Review. March 27, 2017 Midterm Review March 27, 2017 1 Overview Relational Algebra & Query Evaluation Relational Algebra Rewrites Index Design / Selection Physical Layouts 2 Relational Algebra & Query Evaluation 3 Relational

More information

Diagnosing Estimation Errors in Page Counts Using Execution Feedback

Diagnosing Estimation Errors in Page Counts Using Execution Feedback Diagnosing Estimation Errors in Page Counts Using Execution Feedback Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy Microsoft Research One Microsoft Way, Redmond WA USA. surajitc@microsoft.com

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

PS2 out today. Lab 2 out today. Lab 1 due today - how was it?

PS2 out today. Lab 2 out today. Lab 1 due today - how was it? 6.830 Lecture 7 9/25/2017 PS2 out today. Lab 2 out today. Lab 1 due today - how was it? Project Teams Due Wednesday Those of you who don't have groups -- send us email, or hand in a sheet with just your

More information

Hash-Based Indexing 165

Hash-Based Indexing 165 Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2008 Quiz II Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.830 Database Systems: Fall 2008 Quiz II There are 14 questions and 11 pages in this quiz booklet. To receive

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe

More information

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing CS 4604: Introduction to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing Outline introduction selection projection join set & aggregate operations Prakash 2018 VT CS 4604 2

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2014-2015 2 Index Op)miza)ons So far, only discussed implementing relational algebra operations to directly access heap Biles Indexes present an alternate access path for

More information

Dtb Database Systems. Announcement

Dtb Database Systems. Announcement Dtb Database Systems ( 資料庫系統 ) December 10, 2008 Lecture #11 1 Announcement Assignment #5 will be out on the course webpage today. 2 1 External Sorting Chapter 13 3 Why learn sorting again? O (n*n): bubble,

More information

Columnstore and B+ tree. Are Hybrid Physical. Designs Important?

Columnstore and B+ tree. Are Hybrid Physical. Designs Important? Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 B+ tree & Columnstore on same table = Hybrid design 4? C O L C O L B+ tree B+ tree ? C O L C O L B+ tree B+ tree

More information

External Sorting Implementing Relational Operators

External Sorting Implementing Relational Operators External Sorting Implementing Relational Operators 1 Readings [RG] Ch. 13 (sorting) 2 Where we are Working our way up from hardware Disks File abstraction that supports insert/delete/scan Indexing for

More information

CompSci 516 Data Intensive Computing Systems

CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 9 Join Algorithms and Query Optimizations Instructor: Sudeepa Roy CompSci 516: Data Intensive Computing Systems 1 Announcements Takeaway from Homework

More information

Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017

Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017 Triangle SQL Server User Group Adaptive Query Processing with Azure SQL DB and SQL Server 2017 Joe Sack, Principal Program Manager, Microsoft Joe.Sack@Microsoft.com Adaptability Adapt based on customer

More information

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection

More information

University of Waterloo Midterm Examination Solution

University of Waterloo Midterm Examination Solution University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Overview of Query Evaluation. Overview of Query Evaluation

Overview of Query Evaluation. Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation v Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

More information

Parser: SQL parse tree

Parser: SQL parse tree Jinze Liu Parser: SQL parse tree Good old lex & yacc Detect and reject syntax errors Validator: parse tree logical plan Detect and reject semantic errors Nonexistent tables/views/columns? Insufficient

More information

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing: Lab 2 -- Due Weds. Project meeting sign ups 6.830 Lecture 8 10/2/2017 Recap column stores & paper Join Processing: S = {S} tuples, S pages R = {R} tuples, R pages R < S M pages of memory Types of joins

More information

Introduction Alternative ways of evaluating a given query using

Introduction Alternative ways of evaluating a given query using Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction

More information

Optimization of Queries with User-Defined Predicates

Optimization of Queries with User-Defined Predicates Optimization of Queries with User-Defined Predicates SURAJIT CHAUDHURI Microsoft Research and KYUSEOK SHIM Bell Laboratories Relational databases provide the ability to store user-defined functions and

More information

Revisiting Pipelined Parallelism in Multi-Join Query Processing

Revisiting Pipelined Parallelism in Multi-Join Query Processing Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu Elke A. Rundensteiner Department of Computer Science, Worcester Polytechnic Institute Worcester, MA 01609-2280 (binliu rundenst)@cs.wpi.edu

More information

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch

Advanced Databases. Lecture 4 - Query Optimization. Masood Niazi Torshiz Islamic Azad university- Mashhad Branch Advanced Databases Lecture 4 - Query Optimization Masood Niazi Torshiz Islamic Azad university- Mashhad Branch www.mniazi.ir Query Optimization Introduction Transformation of Relational Expressions Catalog

More information

CS330. Query Processing

CS330. Query Processing CS330 Query Processing 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically implemented using a `pull interface: when an operator is `pulled for

More information

GSLPI: a Cost-based Query Progress Indicator

GSLPI: a Cost-based Query Progress Indicator GSLPI: a Cost-based Query Progress Indicator Jiexing Li #1, Rimma V. Nehme 2, Jeffrey Naughton #3 # Computer Sciences, University of Wisconsin Madison 1 jxli@cs.wisc.edu 3 naughton@cs.wisc.edu Microsoft

More information

EECS 647: Introduction to Database Systems

EECS 647: Introduction to Database Systems EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 External Sorting Today s Topic Implementing the join operation 4/8/2009 Luke Huan Univ. of Kansas 2 Review DBMS Architecture

More information

Chapter 11: Query Optimization

Chapter 11: Query Optimization Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

Query Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems

Query Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems Query Processing: A Systems View CPS 216 Advanced Database Systems Announcements (March 1) 2 Reading assignment due Wednesday Buffer management Homework #2 due this Thursday Course project proposal due

More information

Module 4: Tree-Structured Indexing

Module 4: Tree-Structured Indexing Module 4: Tree-Structured Indexing Module Outline 4.1 B + trees 4.2 Structure of B + trees 4.3 Operations on B + trees 4.4 Extensions 4.5 Generalized Access Path 4.6 ORACLE Clusters Web Forms Transaction

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012

Outline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012 Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten

More information

Index Merging. Abstract. 1. Introduction

Index Merging. Abstract. 1. Introduction Index Merging Surajit Chaudhuri and Vivek Narasayya 1 Microsoft Research, Redmond Email: {surajitc, viveknar}@microsoft.com http://research.microsoft.com/~{surajitc, viveknar} Abstract Indexes play a vital

More information

RELATIONAL OPERATORS #1

RELATIONAL OPERATORS #1 RELATIONAL OPERATORS #1 CS 564- Spring 2018 ACKs: Jeff Naughton, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Algorithms for relational operators: select project 2 ARCHITECTURE OF A DBMS query

More information

CSE 190D Spring 2017 Final Exam

CSE 190D Spring 2017 Final Exam CSE 190D Spring 2017 Final Exam Full Name : Student ID : Major : INSTRUCTIONS 1. You have up to 2 hours and 59 minutes to complete this exam. 2. You can have up to one letter/a4-sized sheet of notes, formulae,

More information

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa

Beyond EXPLAIN. Query Optimization From Theory To Code. Yuto Hayamizu Ryoji Kawamichi. 2016/5/20 PGCon Ottawa Beyond EXPLAIN Query Optimization From Theory To Code Yuto Hayamizu Ryoji Kawamichi 2016/5/20 PGCon 2016 @ Ottawa Historically Before Relational Querying was physical Need to understand physical organization

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Announcements (March 1) Query Processing: A Systems View. Physical (execution) plan. Announcements (March 3) Physical plan execution

Announcements (March 1) Query Processing: A Systems View. Physical (execution) plan. Announcements (March 3) Physical plan execution Announcements (March 1) 2 Query Processing: A Systems View CPS 216 Advanced Database Systems Reading assignment due Wednesday Buffer management Homework #2 due this Thursday Course project proposal due

More information

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System

Review. Relational Query Optimization. Query Optimization Overview (cont) Query Optimization Overview. Cost-based Query Sub-System Review Relational Query Optimization R & G Chapter 12/15 Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory

More information

Next-Generation Parallel Query

Next-Generation Parallel Query Next-Generation Parallel Query Robert Haas & Rafia Sabih 2013 EDB All rights reserved. 1 Overview v10 Improvements TPC-H Results TPC-H Analysis Thoughts for the Future 2017 EDB All rights reserved. 2 Parallel

More information

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages Overview of Query Processing Query Parser Query Processor Evaluation of Relational Operations Query Rewriter Query Optimizer Query Executor Yanlei Diao UMass Amherst Lock Manager Access Methods (Buffer

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

Overview of Query Evaluation

Overview of Query Evaluation Overview of Query Evaluation Chapter 12 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Overview of Query Evaluation Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC

CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein. Student ID: UCSC CMPS 181, Database Systems II, Final Exam, Spring 2016 Instructor: Shel Finkelstein Student Name: Student ID: UCSC Email: Final Points: Part Max Points Points I 15 II 29 III 31 IV 19 V 16 Total 110 Closed

More information

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12 SQL QUERY EVALUATION CS121: Relational Databases Fall 2017 Lecture 12 Query Evaluation 2 Last time: Began looking at database implementation details How data is stored and accessed by the database Using

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Database Management System

Database Management System Database Management System Lecture Join * Some materials adapted from R. Ramakrishnan, J. Gehrke and Shawn Bowers Today s Agenda Join Algorithm Database Management System Join Algorithms Database Management

More information

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is

More information

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!

More information

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky

Administriva. CS 133: Databases. General Themes. Goals for Today. Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Administriva Lab 2 Final version due next Wednesday CS 133: Databases Fall 2018 Lec 11 10/11 Query Evaluation Prof. Beth Trushkowsky Problem sets PSet 5 due today No PSet out this week optional practice

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation.

DBMS Y3/S5. 1. OVERVIEW The steps involved in processing a query are: 1. Parsing and translation. 2. Optimization. 3. Evaluation. Query Processing QUERY PROCESSING refers to the range of activities involved in extracting data from a database. The activities include translation of queries in high-level database languages into expressions

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Advanced Databases: Parallel Databases A.Poulovassilis

Advanced Databases: Parallel Databases A.Poulovassilis 1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger

More information

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11 DATABASE PERFORMANCE AND INDEXES CS121: Relational Databases Fall 2017 Lecture 11 Database Performance 2 Many situations where query performance needs to be improved e.g. as data size grows, query performance

More information