AN OVERVIEW OF ADAPTIVE QUERY PROCESSING SYSTEMS
Mengmeng Liu
Computer and Information Science, University of Pennsylvania


WPE-II exam, January 28

ABSTRACT

Traditional database query processors separate query optimization from query execution: query plans are chosen by query optimizers and sent to execution engines for processing, until all query results are completely computed. However, new applications, such as data streams, large-scale distributed systems, and data integration, require query processors to adapt to unpredictable data characteristics and dynamic environments. Query optimization and query execution need to be interleaved so that new plans can constantly be found to replace obsolete plans as new information is discovered about the data being processed. The main challenges of adaptive query processing lie in ensuring correct results, eliminating duplicates, and maintaining good performance. In this report we survey three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. We divide adaptive query processing into five stages: plan pre-optimization, plan monitoring, plan analysis, plan re-optimization, and plan migration. In each of these stages, different approaches are examined, analyzed, and compared when applicable. Empirical evaluations of the different systems are also discussed in this report.

Contents

1 Introduction
2 Overview of Adaptive Query Processing Systems
  2.1 Tukwila-99: Plan-partitioning Adaptivity for Data Integration
  2.2 Tukwila-04: Data-partitioning Adaptivity for Data Integration
  2.3 Cape-04: Data-partitioning Adaptivity for Data Streams
3 Adaptive Query Processing in Five Stages
  3.1 Plan Pre-optimization
  3.2 Plan Monitoring
  3.3 Plan Analysis
  3.4 Plan Re-optimization
  3.5 Plan Migration
    3.5.1 Cape-04's moving state strategy
    3.5.2 Cape-04's parallel track strategy
    3.5.3 Tukwila-04's stitch-up strategy
    3.5.4 Analysis
4 Evaluations
  4.1 Adaptive Query Processing vs. Static Query Processing
  4.2 Comparison of Plan Migration Strategies
  4.3 Summary
5 Conclusion

Chapter 1 Introduction

Traditional database query processors decouple query optimization from query execution. Query optimizers compute optimal plans and send them to query engines, which execute them until all query results are completely computed. Commercial database systems, such as Oracle, DB2, and Microsoft SQL Server, still use this model today. However, such a model is not well suited to new database applications, such as data streams, federated systems, large-scale distributed applications, and data integration systems. Hence, a new query processing model, adaptive query processing, has been proposed and studied [3, 2, 2, 2, 4] over the last decade. In adaptive query processing, query optimization is interleaved with query execution so that new plans can constantly be found to replace obsolete plans, as new properties of the data become available to the system. Two of the main reasons that adaptive query processing is needed in these new applications are given below.

Missing or inaccurate statistics. In federated and data integration applications, statistics on source data are often missing or inaccurate. This is mainly because data sources are heterogeneous, remote, sometimes unreliable, and connected through wrappers that may have little knowledge of data statistics. Query optimizers rely on data statistics (e.g., cardinality, value distributions) to estimate plan costs when choosing optimal plans. Hence, if data statistics are missing or inaccurate, query optimizers may misestimate plan costs and choose inferior plans for execution; they must fall back on heuristics to find plans, decreasing the likelihood that the selected plan is in fact optimal. With adaptive query processing, data statistics can be estimated better once the query is executing; in consequence, a better plan may be chosen to replace the old plan, improving the overall performance of that query.

In data stream applications, data may continue to arrive after the query processor starts execution. Under such circumstances, traditional query processors may not work at all. In this highly dynamic environment, data characteristics fluctuate and data arrival rates are unknown; hence, it is hard to estimate plan costs well. Adaptive query processors allow systems to incorporate run-time information to help find good plans after a query has started executing.

Even in standard static databases, traditional query optimizers may sometimes make bad estimations.

Indeed, query optimizers estimate the costs of query plans using a cost model that relies heavily on cardinality information, which in turn depends on estimates of the selectivities of the predicates in the query. Most query optimizers estimate the selectivities of individual predicates from summaries of data distributions, such as histograms. However, the selectivity of a conjunction of predicates sometimes cannot be estimated correctly because of correlations among the conjoined predicates; in fact, in the worst case, errors in estimates for conjunctive predicates may grow exponentially [9]. Standard query optimizers [7, 6, 6] typically assume that the attributes of any relation are independent. This is problematic if a query has a conjunction of predicates over correlated attributes. For example, it is quite likely that people's ages and salaries are correlated. Assume that the percentage of people who are older than 30 and the percentage of people who earn more than 30,000 dollars a year are both 50 percent, and that the percentage of people who satisfy both conditions is also 50 percent. A query like SELECT count(*) FROM P WHERE P.age > 30 AND P.salary > 30000 may cause the query optimizer to misestimate the output cardinality if it assumes that these two predicates are independent: it would estimate the output cardinality as 25 percent of the size of the whole source table, when it is in fact 50 percent. Under such circumstances, adaptive query processing can help. If the query optimizer chooses a poor plan, the statistics and the correlations can be monitored by the adaptive query processor; hence, cost estimates can be continually updated to help find a better plan.

Unpredictable system behavior. Another big challenge in new applications is unpredictable system behavior. In large-scale, shared-nothing parallel systems, unpredictable behaviors such as machine failures, communication errors, and other system faults may be prevalent. Adaptive query processing has the benefit of switching to a feasible plan when unpredictable system behavior occurs, in order to ensure the availability of query processing. Adaptive query processors can also monitor system-state trends at run time to predict system behavior before faults occur. This helps systems adapt early and avoid losing query results. In other applications, such as data streams, system resources may be exhausted in the middle of execution: for instance, the system may run out of memory, or the network may be clogged with messages. Adaptive query processing can make timely changes to alleviate these problems and can choose future plans that prevent such problems from recurring.

Adaptive query processing has been studied extensively since the 1990s. Initial efforts include query re-optimization at materialization points [3], CPU scheduling-based adaptation [8, 8], and methods based on redundant computation []. Recently there has been work addressing the moving state problem [2, 2, 4]. Apart from these plan-based methods, a tuple-based routing scheme, Eddy [2, 5], has been proposed to re-route tuples among plans in highly dynamic environments. There are also a few surveys that summarize various aspects of adaptive query processing [5, 3, 4].

In this paper, we examine three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. In Chapter 2 we briefly describe each system. In Chapter 3, we divide adaptive query processing into five consecutive stages: plan pre-optimization, plan monitoring, plan analysis, plan re-optimization, and plan migration.
We study the approaches taken by the systems in each of these five stages, and compare them when applicable. We examine experimental evaluations of these systems in Chapter 4, and conclude in Chapter 5.
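The numbers in the correlated-predicates example above can be checked with a short sketch; the table size here is a hypothetical value of our own choosing, not from the original example.

    # Selectivity under the independence assumption vs. the true selectivity
    # when two predicates are perfectly correlated (hypothetical table size).
    table_size = 1_000_000        # |P|
    sel_age = 0.50                # fraction with age > 30
    sel_salary = 0.50             # fraction with salary > 30,000
    sel_both_actual = 0.50        # perfectly correlated: the same 50% of people

    # A textbook optimizer multiplies the individual selectivities:
    est_cardinality = table_size * sel_age * sel_salary       # 250,000 tuples
    true_cardinality = table_size * sel_both_actual           # 500,000 tuples

    print(f"independence estimate: {est_cardinality:,.0f} tuples")
    print(f"actual result size:    {true_cardinality:,.0f} tuples")  # off by 2x

The factor-of-two error here is benign, but as noted above, errors compound exponentially as more correlated predicates are conjoined.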

Chapter 2 Overview of Adaptive Query Processing Systems

In this chapter, we introduce three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. For each system, we briefly discuss the motivating applications and challenges, the overall system architecture, and the main proposed techniques.

2.1 Tukwila-99: Plan-partitioning Adaptivity for Data Integration

Motivation

Traditional query processing models may not be applicable to most data integration applications for several reasons: the absence of statistics, unpredictable data arrival rates, and redundancy among sources. We discussed the first two reasons in the previous chapter. The third, redundancy among sources, may cause the query processor to waste time processing duplicate tuples from duplicate sources. The Tukwila-99 [] system aims to address these challenges by adopting an adaptive query processing model and by incorporating run-time information into the decision engine.

Architecture

Figure 2.1 shows the high-level system architecture of the Tukwila-99 [] system. The query optimizer and the execution engine are interleaved to answer queries. Users pose queries to the system, and the queries go through a query reformulation phase to be rewritten as queries over the data sources; user queries may be formulated over virtual mediated schemas, so they need to be reformulated over the source schemas. The job of the query optimizer is to transform the reformulated query into a physical query execution plan for the execution engine. The optimizer in Tukwila-99 is able to create partial plans when data statistics are incomplete, and it produces rules that define the system's adaptive behavior. The execution engine processes the query plans produced by the optimizer. It includes an event handler for dynamically interpreting the rules generated by the optimizer, and it supports several re-planning techniques when adaptation is triggered. Finally, the query execution engine communicates with the data sources through a set of wrappers.

Figure 2.1: System Architecture of Tukwila-99 ([])

Wrappers are responsible for transforming data from the format used by the sources into the format used by the Tukwila-99 system.

Main techniques

Tukwila-99's adaptive techniques are mostly plan-partitioning based. That is, re-optimization or re-scheduling of plans can only occur at the end of fragments, e.g., at the end of pipelined units or at blocking operators, where all the source data must be processed together. The intermediate results output by a fragment must be materialized before being processed by the re-optimized portion of the plan. The optimizer must decide how many fragments to create, balancing the potential performance penalty of materialization against the potential benefit of being able to adapt if the original plan is poor. Since there is very little information to go by, this decision is by necessity heuristics-driven. We will later discuss the other two systems' data-partitioning techniques, in which different portions of the data can be processed by different plans in parallel and are not necessarily materialized at intermediate points.

The Tukwila-99 system exploits a rule-based framework that is built into the core of the adaptation decision engine. Generally, the rules in Tukwila-99 are produced by the query optimizer and interpreted and executed by the query execution engine. These rules have the form WHEN event IF conditions THEN actions. They usually specify when and how to modify the run-time behavior of certain operators, and which conditions to check in order to detect opportunities for re-optimization. For example, a rule can be written as follows:

    WHEN closed(join)
    IF monitored-card(join) > 2 * estimated-card(join)
    THEN reoptimize

This rule means that when a join operator finishes execution, if the run-time monitored cardinality of the join is more than twice its estimated cardinality, then the system triggers the re-optimization procedure at this join operator; this join operator must be the end of a fragment. Such a rule can help the system alleviate the effects of inaccurate cardinality estimates. In Tukwila-99, these rules are written in procedural languages such as C/C++ to facilitate manipulation and event handling, although they resemble active rules in deductive databases [9].
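To make the event-condition-action structure of such rules concrete, here is a minimal sketch of how a rule of this form might be represented and fired. The class and field names are hypothetical; Tukwila-99's actual rules are compiled into the C/C++ engine, not interpreted like this.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Rule:
        event: str                          # e.g., "closed(join)"
        condition: Callable[[dict], bool]   # evaluated against monitored stats
        action: Callable[[], None]          # e.g., trigger re-optimization

    def on_event(event: str, stats: dict, rules: list[Rule]) -> None:
        # Fire every rule registered for this event whose condition holds.
        for rule in rules:
            if rule.event == event and rule.condition(stats):
                rule.action()

    # WHEN closed(join) IF monitored-card > 2 * estimated-card THEN reoptimize
    reoptimize_rule = Rule(
        event="closed(join)",
        condition=lambda s: s["monitored_card"] > 2 * s["estimated_card"],
        action=lambda: print("triggering re-optimization"),
    )
    on_event("closed(join)", {"monitored_card": 500, "estimated_card": 200},
             [reoptimize_rule])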

Examples of events include opening or closing an operator, failing to connect to a source, running out of memory, and having processed n tuples in an operator. Once an event has triggered a set of rules, the conditions of each rule are evaluated in parallel. A condition can be a comparison among a monitored state, an estimated state, or a threshold. After all conditions for a given event have been evaluated, the actions are executed. Tukwila-99's actions include setting the overflow method of a pipelined join, deactivating an operator, rescheduling the remaining operator tree, and re-optimizing the remaining plan. As discussed before, these actions are all plan-partitioning methods. There are a few restrictions on this rule-based system to avoid common mistakes. For example, all of a rule's actions must be executed before another event is processed, and two rules that might affect each other cannot be executed in parallel.

Tukwila-99 also features two novel adaptive operators that can be invoked when certain conditions hold. The first is a double pipelined hash join operator with memory-overflow mechanisms. It is symmetric and incremental, which has the benefit of avoiding blocking: it fetches an input tuple, probes it against the hash table on the other side, and outputs any matching results immediately. The only trade-off relative to a non-pipelined hash join is that it must buffer state, namely the hash tables of both joined relations. If the system runs out of memory, the pipelined hash join's overflow mechanism can flush portions of its state to disk. The other adaptive operator introduced in Tukwila-99 is the dynamic collect operator. In essence, it is a union operator that fetches only the necessary data sources, as data sources may be redundant. Furthermore, if sources become slow or unavailable at run time, the collect operator can switch to a back-up source.

Conclusion

Tukwila-99 introduces a number of plan-partitioning adaptive mechanisms. Adaptivity is designed into the core of the system to facilitate the interleaving of query optimization and query execution. Tukwila-99 proposes new query operators, the double pipelined join operator and the dynamic collect operator, to cope with insufficient memory and redundant source data. It also provides a rule-based platform that can incorporate different adaptive mechanisms.
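Before moving on, here is a minimal sketch of the symmetric, incremental probing logic behind a double pipelined hash join, simplified to a single equi-join key and omitting the memory-overflow machinery described above.

    from collections import defaultdict

    class DoublePipelinedHashJoin:
        """Symmetric hash join: buffers a hash table on each input and emits
        matches incrementally, so neither input blocks the other."""

        def __init__(self, key_left, key_right):
            self.key_left, self.key_right = key_left, key_right
            self.left_table = defaultdict(list)    # state for the left input
            self.right_table = defaultdict(list)   # state for the right input

        def insert_left(self, tup):
            k = self.key_left(tup)
            self.left_table[k].append(tup)              # build own hash table
            for match in self.right_table.get(k, []):   # probe the other side
                yield (tup, match)                      # emit immediately

        def insert_right(self, tup):
            k = self.key_right(tup)
            self.right_table[k].append(tup)
            for match in self.left_table.get(k, []):
                yield (match, tup)

    join = DoublePipelinedHashJoin(key_left=lambda a: a["x"],
                                   key_right=lambda b: b["x"])
    list(join.insert_left({"x": 1, "y": 2}))          # no match yet
    print(list(join.insert_right({"x": 1, "z": 3})))  # emits one joined pair

The buffered hash tables on both sides are exactly the "state" that the plan migration techniques of the next sections must move or share.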
2.2 Tukwila-04: Data-partitioning Adaptivity for Data Integration

Motivation

Tukwila-99 proposes a nice framework for adaptive query processing; Tukwila-04 extends this framework to allow for data-partitioning adaptivity. Data-partitioning adaptivity can route different parts of the data to different plans, without materializing intermediate results or restricting adaptive activity to fragment boundaries. However, data partitioning introduces several new challenges. For instance, the state of operators in the old plan may be required in the new, adapted plan; identifying this common state and maintaining effective book-keeping information to avoid re-creating the entire state is both important and challenging. Moreover, merely collecting the query results of the old plan and the new plan is not sufficient: the query results obtained by joining data across the two plans must also be included, in an efficient and effective way. In summary, data-partitioning adaptivity can be formalized according to the rules of the relational algebra, and plan migration is a means of accomplishing it in a way that is guaranteed to result in correct answers.

Figure 2.2: System Architecture of Tukwila-04 ([])

Architecture

The architecture of the Tukwila-04 system is shown in Figure 2.2. The figure focuses on the plan re-optimization and plan migration components; the query reformulation and wrapper interfaces are similar to those of Tukwila-99 and are not shown. Here, the query optimizer and the query execution engine are again interleaved for adaptation purposes, but some of their components are more refined than their counterparts in Tukwila-99 (shown in Figure 2.1). First, a separate thread monitors the run-time statistics and accordingly updates both the statistics in the optimizer's cost estimator and the global statistics; this thread maintains statistics collected across all phases. Second, the query re-optimization procedure triggered by the execution engine is carried out by the query optimizer, which chooses a new query plan. Finally, and most importantly, Tukwila-04 allows for data-partitioning adaptivity, in which different portions of the data can be sent to different phases of the plan. The adaptive process in Tukwila-04 continues until all the source data are completely processed.

Main techniques

The major contribution of the Tukwila-04 system is its plan migration technique: computing a stitch-up plan that complements the old plan and the new plan in generating complete query results. We describe how stitch-up plans are generated and computed in detail in Section 3.5. In addition to plan migration, Tukwila-04's platform supports many other mechanisms: re-using state from previous plan phases (avoiding the recomputation of certain query subexpressions), monitoring the progress of execution, and re-estimating plan costs. We discuss the monitoring process and query re-optimization in Sections 3.2 and 3.4 respectively. Here we briefly discuss Tukwila-04's state sharing techniques. The internal state (e.g., hash tables) of stateful operators (e.g., join, aggregation) can be shared among equivalent subexpressions. In Tukwila-04, state structures (e.g., sorted lists, hash tables) and iterator modules (e.g., build-then-probe, merge-driven) are decoupled, allowing state structures to be shared across operators in different plans. For example, an adaptation from a pipelined hash join to a nested-loop join can benefit from this decoupling.
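A sketch of the decoupling idea, with hypothetical interfaces: the hash-table state lives in its own object, so a different iterator module, say a build-then-probe module, can be attached to state originally built by a pipelined join in another plan.

    class HashState:
        """State structure: a hash table that outlives any one operator."""
        def __init__(self, key):
            self.key, self.buckets = key, {}
        def add(self, tup):
            self.buckets.setdefault(self.key(tup), []).append(tup)
        def probe(self, k):
            return self.buckets.get(k, [])

    class BuildThenProbeIterator:
        """Iterator module: consumes pre-built state instead of rebuilding."""
        def __init__(self, built_state: HashState, probe_key):
            self.state, self.probe_key = built_state, probe_key
        def run(self, probe_input):
            for tup in probe_input:
                for match in self.state.probe(self.probe_key(tup)):
                    yield (match, tup)

    # State built by one plan's pipelined join is handed to a new plan:
    state_A = HashState(key=lambda a: a["x"])
    state_A.add({"x": 1, "y": 9})
    print(list(BuildThenProbeIterator(state_A, lambda b: b["x"])
               .run([{"x": 1, "z": 5}])))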

Another important issue in state sharing is that operator state over the same subexpression but with a different hash structure cannot be shared directly. For example, in (A ⋈ B) ⋈ C, relation B may be hashed for a join with relation A on attribute x, whereas in the logically equivalent expression A ⋈ (B ⋈ C), relation B may join relation C on attribute y, where y differs from x. Hence the state of relation B must be re-hashed before it can be shared with the other plan.

Conclusion

Tukwila-04 extends Tukwila-99's platform to allow for data-partitioning adaptivity. It assumes that query plans can be continuously adapted in mid-execution and that the source data are partitioned across the different phases of the plans. This data-partitioning assumption poses new plan migration challenges: complete, non-redundant query results must be computed efficiently during plan adaptation. Tukwila-04 addresses this problem by computing stitch-up plans, and it discusses state sharing techniques that facilitate this computation. Tukwila-04 emphasizes plan re-optimization and plan migration techniques that complement Tukwila-99's, and the two sets of techniques can be incorporated into the same Tukwila framework.

2.3 Cape-04: Data-partitioning Adaptivity for Data Streams

Motivation

Cape-04 [2] aims to address a plan migration problem similar to Tukwila-04's; however, Cape-04 targets data stream applications rather than data integration applications. Coincidentally, the two papers were presented at the same conference, so they can be regarded as independent work. In data stream applications, data may continue to arrive after query execution begins, and queries may run forever since there is no bound on the size of the data. This situation poses new challenges for plan migration. First, a windowing model must be defined to bound the life span of input tuples in operators, since the input data may be unbounded. Second, correct query results must be defined with respect to that windowing model. Third, the data arriving after an adaptation must be computed correctly and processed by the system in order. As with Tukwila-04, approaches to plan migration should ensure correct results, eliminate duplicates, and maintain good performance.

Architecture

Cape-04 is built around plan migration. The Cape-04 paper [2] assumes that the old plan and the new plan are given; the main task is to design algorithms that migrate the state properly. As a result, no plan monitoring or plan re-optimization techniques are discussed in the paper. On the other hand, since Cape-04 targets data stream applications, its data source model is very different from Tukwila's. In Tukwila, data sources are connected to the system through wrappers, and a query reformulation component near the front end reformulates data integration queries. In contrast, Cape-04 needs no query reformulation for data stream sources, but its queries may be executed continuously over unbounded input data.

Main techniques

Cape-04 proposes two approaches to the plan migration problem: the moving state strategy and the parallel track strategy. The moving state strategy moves the state inside operators (e.g., the hash tables inside join operators) from the old plan to the new plan and feeds the new data (data arriving after the adaptation point) only to the new plan. In contrast, the parallel track strategy sends new data to both the old plan and the new plan, without moving any state.

Both strategies ensure correct results and eliminate duplicates; however, they incur different performance overheads under different conditions. The details of these two strategies are described in Section 3.5.

Another important issue is a clear definition of window semantics over stream data, and of correct results under that model. In Cape-04, every stateful operator is associated with a window size. For example, a join operator A ⋈ B with window size W means that every tuple a of stream A joins only those tuples b of B for which the timestamps T_a and T_b differ by at most W, where T_x is the timestamp of a tuple x; this timestamp is the arrival time recorded by the local machine. Under these semantics, the size of the state inside the join operator is bounded. Since tuples from stream A are strictly ordered by timestamp, a tuple b can be purged from B's state upon the arrival of a tuple a if and only if T_a - T_b > W. The intuition is that any tuple b satisfying this condition cannot possibly join with any tuple from stream A that arrives after a; hence, b can be discarded when a arrives. Given these window semantics and purging rules, the correct results can be defined. More complex cases, such as a combined tuple (an intermediate tuple, e.g., an AB tuple) being purged by another combined tuple, are also discussed in the paper.
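A minimal sketch of the purge rule just described, assuming tuples arrive with monotonically increasing local timestamps; the class and variable names are our own.

    from collections import deque

    W = 10.0   # window size, e.g., in seconds

    class WindowedState:
        """Join state for one stream, purged by the opposite stream's clock."""
        def __init__(self):
            self.tuples = deque()          # (timestamp, tuple), time-ordered

        def insert(self, ts, tup):
            self.tuples.append((ts, tup))

        def purge(self, ts_opposite):
            # A buffered tuple b can be dropped once T_a - T_b > W: no future
            # tuple of the opposite stream can join with it.
            while self.tuples and ts_opposite - self.tuples[0][0] > W:
                self.tuples.popleft()

    state_B = WindowedState()
    state_B.insert(1.0, {"z": 7})
    state_B.purge(ts_opposite=12.5)   # 12.5 - 1.0 > W, so the tuple is purged
    print(len(state_B.tuples))        # 0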

Conclusion

Cape-04 focuses on the plan migration problem in adaptive query processing. It assumes a data streaming model, in which data arrive at the system continuously, even after execution begins. This assumption makes the problem harder: the plan must be migrated correctly and efficiently under well-defined window semantics over stream data. Cape-04 proposes two plan migration techniques, the moving state strategy and the parallel track strategy, and develops a cost model to estimate the cost of plan migration. Several experiments in the paper evaluate the two strategies under different system configurations and stream workloads.

Chapter 3 Adaptive Query Processing in Five Stages

In this chapter, we propose a framework that divides adaptive query processing into five consecutive stages, each an integral part of the whole process. As shown in Figure 3.1, these five stages are plan pre-optimization (generating an initial plan for a query), plan monitoring (monitoring the plan status, system performance, and data characteristics), plan analysis (analyzing how well the current plan is functioning and deciding whether an adaptation is needed), plan re-optimization (finding a new plan that is better than the current plan), and plan migration (migrating the current plan to the new plan). The five stages form a loop and are executed continuously until the query results are computed. We call one pass through the stages a phase; a complete execution may span multiple phases. In the next five sections, we discuss each stage in detail. For each stage, we examine the techniques used in the systems described in the last chapter: Tukwila-99, Tukwila-04, and Cape-04. Table 3.1 lists the stages that each paper discusses. For stages addressed by only one system, such as plan pre-optimization, we describe that system's approach; for stages addressed by at least two systems, such as plan migration, we describe the approaches and analyze the differences among the systems.

An example. Throughout this chapter, we use an example, where applicable, to illustrate the main technical points. Suppose there are three source relations, A(x, y), B(x, z), and C(y, z), and consider the following query over them:

    select * from A, B, C where A.x = B.x and A.y = C.y and B.z = C.z

This query asks for the natural join of relations A, B, and C. There are three predicates in the where clause, but two join operators suffice to execute the query. Figure 3.2 shows three equivalent yet different logical query plans that can be used to execute this query. The three plans are obtained via algebraic transformation rules, e.g., the associativity and commutativity of joins.

Figure 3.1: Five stages of adaptive query processing

Figure 3.2: Three possible plans for the example query. Plan 1: (A ⋈ B) ⋈ C; Plan 2: A ⋈ (B ⋈ C); Plan 3: (A ⋈ C) ⋈ B

The left plan joins A and B before C; the middle plan joins B and C before A; and the right plan joins A and C before B. If we assume that all join operators are symmetric, these are exactly the three possible logical query plans for this query.

3.1 Plan Pre-optimization

Traditional query optimizers (e.g., Starburst [7], Volcano [6], and System-R [6]) generally use a dynamic programming algorithm to find an optimal plan in the plan search space. This optimal plan is the plan with minimal cost under a cost model combining performance factors such as CPU, I/O, memory, and bandwidth, which in turn depends largely on the accuracy of statistics over the source data (e.g., cardinalities and data distributions). Hence, if statistics are incomplete, a plan pre-optimizer must rely on heuristics to find the initial plan for the system. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan pre-optimization. We describe their approaches below.

Tukwila-99. Tukwila-99 allows only plan-partitioning adaptivity; re-optimization or re-scheduling can take place only at the end of fragments, and the data must be materialized before re-optimization. Furthermore, the initial plan can be a partial plan, as long as it is a complete fragment. For instance, suppose that in our example the cardinalities of relations A and B are known and small, but relation C is of unknown size. Tukwila-99's pre-optimizer would return A ⋈ B as a partial plan, because A and B are known to be the smallest relations and can be joined together. When the system reaches the end of the fragment, the whole intermediate result of A ⋈ B is materialized before the re-optimization of the remaining portion of the plan.
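A sketch of the fragment-selection heuristic in this example; the cardinalities below are hypothetical, and the real pre-optimizer of course searches over full plan spaces rather than picking a single pair.

    # Hypothetical catalog: None marks a relation with missing statistics.
    cardinalities = {"A": 1_000, "B": 10_000, "C": None}

    def choose_initial_fragment(cards: dict):
        """Heuristic: build the first pipelined fragment from the two known
        smallest relations; relations of unknown size wait for later
        re-optimization."""
        known = sorted((r for r, c in cards.items() if c is not None),
                       key=cards.get)
        if len(known) >= 2:
            return (known[0], known[1])   # e.g., materialize A join B first
        return None                       # fall back to a pure guess

    print(choose_initial_fragment(cardinalities))   # ('A', 'B')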

Table 3.1: The stages that each paper discusses

    System          Pre-optimization  Monitoring  Analysis  Re-optimization  Migration
    Tukwila-99 []   yes               yes         yes       yes              no
    Tukwila-04 [2]  yes               yes         yes       yes              yes
    Cape-04 [2]     no                no          no        no               yes

Tukwila-04. The main difference between the pre-optimizers of Tukwila-99 and Tukwila-04 is that in Tukwila-04 the initial plan must be complete even when statistics are missing. This is because Tukwila-04's data-partitioning adaptivity requires an initial plan that is responsible for executing the old data. To find a complete plan, Tukwila-04 extends a standard top-down optimizer (recursion with memoization) with a guess for each relation whose statistics are missing. These heuristics may return a poor plan, but they at least produce a complete plan from which adaptation can be invoked.

Discussion. Tukwila-99 allows a partial plan to be chosen at this stage, whereas Tukwila-04 requires the initial plan to be complete. When statistics are missing or incomplete, the plan pre-optimizer must guess the missing values; under such circumstances, the pre-optimizer is at best heuristics-based.

3.2 Plan Monitoring

Once a query plan has been chosen by the optimizer, adaptive query processing can only be taken advantage of if the plan is monitored during query execution. Information gathered at this stage guides adaptation. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan monitoring. We describe their approaches below.

Tukwila-99. Tukwila-99 monitors events that signal important changes in the execution state, such as open/close (starting or completing an operator), error (e.g., unable to contact a source), timeout (e.g., a data source has not responded in n msecs), out-of-memory (e.g., a join has insufficient memory), or threshold (n tuples processed by an operator). The execution system watches for these events, which may trigger adaptation. Tukwila-99 also monitors dynamic information in the system that can be compared against estimated values to check whether certain conditions are satisfied. For example, it monitors state (an operator's current state), cardinality (the number of tuples produced so far), time (the wait time since the last tuple), and memory (the memory used so far).

Tukwila-04. Tukwila-04 likewise monitors operator-level information to aid run-time decision making. Every operator maintains a counter recording how many tuples it has output (unlike in [7], this is observed to carry no measurable performance penalty). Tukwila-04 also monitors information exposed by the state structures of stateful operators (such as join and aggregation), including keys, ordering, size, and cardinality. Finally, I/O delay and tuple-availability delay (the wait time since the last tuple) are monitored to facilitate the re-scheduling of operators; operators such as the pipelined hash join react to such delays by scheduling work during idle cycles.
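A sketch of the kind of per-operator output counting described above; the wrapper is hypothetical, and the point it illustrates is that incrementing one counter per emitted tuple costs essentially nothing.

    class CountingOperator:
        """Wraps any iterator-style operator and counts emitted tuples."""
        def __init__(self, name, child):
            self.name, self.child, self.output_count = name, child, 0

        def __iter__(self):
            for tup in self.child:
                self.output_count += 1      # one increment per output tuple
                yield tup

    scan = CountingOperator("scan(A)", iter([{"x": 1}, {"x": 2}]))
    consumed = list(scan)
    print(scan.name, "produced", scan.output_count, "tuples")   # 2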

Discussion. Tukwila-99 and Tukwila-04 monitor similar run-time system information and operator-level statistics. Tukwila-04 observes that plan monitoring can be expensive, because continuous monitoring consumes CPU cycles without contributing to the computation of results. It is therefore important to monitor only the information necessary for adaptation, and to lower the granularity of monitoring as much as possible.

3.3 Plan Analysis

In this stage, the progress of the current plan is analyzed. Based on the analysis, decisions are made about when to adapt and how often to adapt. Since Cape-04 does not specifically discuss plan analysis, we describe Tukwila-99's and Tukwila-04's approaches below.

Tukwila-99. Tukwila-99's analysis of when to adapt is based on rules generated by the optimizer. These rules typically fire when the monitored state deviates from the expected state, e.g., the monitored cardinality is twice the estimated cardinality, or when unexpected run-time behavior occurs, e.g., the wait time for a tuple exceeds a threshold. Tukwila-99 decides how often to adapt based on milestones, e.g., an operator having processed n tuples. Note that Tukwila-99 can only re-optimize at the end of fragments; hence, only conditions evaluated at fragment boundaries can invoke the next re-optimization stage.

Tukwila-04. In Tukwila-04, a global, cost-based evaluation of plan progress is performed by a low-priority background thread that re-optimizes the query. Plan analysis is thus performed simultaneously with the plan re-optimization stage, which we discuss in Section 3.4. The decisions on when and how often to adapt are similar to Tukwila-99's; that is, they are guided by rules generated by the optimizer. The main difference is that adaptation can be invoked whenever the state is stable (e.g., for a pipelined hash join, the state is stable whenever a tuple finishes probing), not only at the end of fragments.

Discussion. Tukwila-04's plan analysis differs from Tukwila-99's in being more flexible about when adaptation can be invoked. Conditions for when and how often to adapt are generally guided by rules pre-specified in the query optimizer. On the other hand, the granularity of adaptation largely depends on the granularity of plan analysis; hence, the choice of milestones or pre-defined intervals has a large impact on how often adaptation occurs.

3.4 Plan Re-optimization

When the system determines that the current plan is not functioning properly, a re-optimization process is invoked to find the next plan. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan re-optimization. We describe their approaches below.

Tukwila-99. Tukwila-99's adaptive techniques are mostly plan-partitioning based: re-optimization or re-scheduling can change only the portion of a plan that has not yet been executed. The re-optimizer in Tukwila-99 is therefore limited, in that the portion of the plan that has already processed data can never be changed. It also requires the materialization of intermediate results before re-optimization, which adds overhead to the overall performance. For example, suppose the initial plan is Plan 1 in Figure 3.2 and an adaptation is invoked after the execution of the fragment A ⋈ B. Tukwila-99 can only adapt the remaining portion of the plan, e.g., change the join implementation of the operator that joins A ⋈ B with C; it can never adapt Plan 1 into Plan 2 or Plan 3 of Figure 3.2. In addition, Tukwila-99 requires the intermediate state A ⋈ B to be materialized before adaptation, which is not strictly necessary in this example.

Tukwila-04. In Tukwila-04, data-partitioning adaptivity allows different portions of the data to be processed by different plans. In addition, in every phase of plan re-optimization, the optimizer refines its cost estimates using the most recently monitored state. For the same example in Figure 3.2, suppose that during the execution of Plan 1 the monitoring information suggests that relation A is much larger than expected, so Tukwila-04 invokes its re-optimizer; however, the size of relation C may still be unknown at that point. To obtain good cost estimates for all candidate plans, the re-optimizer may still need heuristics, and Tukwila-04 proposes several that exploit the monitored run-time information. For example, the selectivity of a logical operator in the plan is shared by all logically equivalent subexpressions: whatever physical join implementation is used, a logical operator's selectivity monitored at run time can be re-used. As another example, suppose the system wants to estimate the cardinality of the intermediate relation B ⋈ C, where the size of C is still unknown, while the cardinalities of A ⋈ B ⋈ C and of A are known from execution. In this case, Tukwila-04 applies a heuristic that assumes the join between A and B ⋈ C is a key-foreign-key join with B ⋈ C as the foreign-key relation; the cardinality of B ⋈ C can then be estimated as equal to the cardinality of A ⋈ B ⋈ C. This assumption may not hold in other examples, but it gives the optimizer a conservative estimate for unknown relations. In general, Tukwila-04 is able to adapt to any candidate plan because of its support for data partitioning.

Discussion. Tukwila-99's plan re-optimizer is limited in that it can only re-optimize the remaining portion of a plan at the end of a fragment. Tukwila-04's re-optimizer is more general: it may adapt to any candidate plan, and it is given the run-time monitored statistics. It is also worth noting that a plan that is optimal in isolation may be more expensive to migrate to than a less optimal plan. To address this problem, Tukwila-04 (and Cape-04 as well) proposes a cost model to estimate the cost of migration together with the cost of the new plan; the re-optimizer must take both costs into account when searching for an optimal plan.
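The last observation can be made concrete with a small sketch; the plan names match our running example, but the cost numbers are hypothetical.

    # Candidate plans with re-estimated run costs and the cost of migrating
    # the current state to each (zero for staying on the current plan).
    candidates = [
        {"plan": "(A join B) join C", "run_cost": 90.0, "migration_cost": 0.0},
        {"plan": "A join (B join C)", "run_cost": 55.0, "migration_cost": 40.0},
        {"plan": "(A join C) join B", "run_cost": 60.0, "migration_cost": 10.0},
    ]

    best = min(candidates, key=lambda p: p["run_cost"] + p["migration_cost"])
    print(best["plan"])   # '(A join C) join B'

Note that the plan with the lowest run cost (55.0) loses once its migration cost is counted, which is exactly why the re-optimizer must consider both.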

Figure 3.3: A motivating example of plan migration (old plan: (A ⋈ B) ⋈ C; new plan: A ⋈ (B ⋈ C); the source data are split at the adaptation point)

3.5 Plan Migration

Plan migration is the final stage in the loop of adaptive query processing. Note that only data-partitioning methods require this stage, because at the adaptation point not all data have been processed: different portions of the data must be processed by different phases of the plans. Plan migration is concerned with the mid-execution transition of state from one query plan (the old plan) to a semantically equivalent yet more efficient query plan (the new plan). Each tuple from the data sources is processed by exactly one plan; hence, connecting and sharing state across plans is extremely important, especially when query plans contain stateful operators such as joins. Approaches to plan migration need to address several important issues: How can state be shared among different plans? How can we ensure that no tuples are lost in the process? How can we avoid duplicate results generated by different plans? How can we take advantage of the old plan when migrating to the new plan?

Let us go back to our example in Figure 3.2. Suppose the query pre-optimizer chooses the old plan, (A ⋈ B) ⋈ C, as the initial plan, with both join operators implemented as pipelined hash joins. We denote by A_o the data of relation A processed by the old plan before the adaptation, and similarly B_o and C_o. Suppose the system now decides to adapt to the new plan A ⋈ (B ⋈ C), again with both join operators implemented as pipelined hash joins. The data that have not been processed by the old plan are sent to the new plan for execution; we denote these new data by A_n for relation A, and similarly B_n and C_n. In this example, adaptation occurs once, so the whole source data is the union of the old data and the new data, i.e., A = A_o + A_n. For brevity, in the discussion below we use + to represent the union of two relations, and we may omit the join symbol, writing AB for A ⋈ B. Figure 3.3 shows the old plan being adapted to the new plan: the old data A_o, B_o, C_o are sent to the old plan, and the new data A_n, B_n, C_n are sent to the new plan.

Figure 3.4: Cape-04's moving state strategy: status after state movement (tuples generated after adaptation: A_n (B_o C_o) + (A_o + A_n)(B_o C_n + B_n C_o + B_n C_n))

Without any state sharing or state movement, the old plan outputs (A_o B_o) C_o and the new plan outputs A_n (B_n C_n). However, combining these two results is not sufficient to compute the complete result of the query, which should be A B C. The reason is given by the equation below:

    A B C = (A_o + A_n)(B_o + B_n)(C_o + C_n)
          = A_o B_o C_o + A_n B_n C_n
            + (A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_o + A_n B_o C_n + A_n B_n C_o)   (3.1)

The delta terms in parentheses are what the plan migration algorithms must compute correctly; they are generally computed by joining data across the different plans. In this section, we use the above example to illustrate three approaches to state migration: Tukwila-04's stitch-up strategy, Cape-04's moving state strategy, and Cape-04's parallel track strategy. We demonstrate how these approaches ensure complete results and eliminate duplicates during plan migration. We also quantitatively analyze the three strategies with respect to different performance metrics and finally compare them against each other.
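Equation 3.1 can be checked mechanically: partition each relation into old/new halves and enumerate the eight combinations. The old plan covers (old, old, old), the new plan covers (new, new, new), and the six remaining "delta" combinations are what migration must supply.

    from itertools import product

    combos = list(product(["o", "n"], repeat=3))   # partitions of (A, B, C)
    old_plan = {("o", "o", "o")}                   # A_o B_o C_o
    new_plan = {("n", "n", "n")}                   # A_n B_n C_n
    deltas = set(combos) - old_plan - new_plan

    print(len(combos), len(deltas))    # 8 combinations, 6 delta terms
    # Complete and duplicate-free iff the three sets partition all combos:
    assert old_plan | new_plan | deltas == set(combos)
    assert not (old_plan & new_plan) and not (deltas & (old_plan | new_plan))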

3.5.1 Cape-04's moving state strategy

Cape-04 proposes two strategies for the plan migration problem. The first is called the moving state strategy. The basic idea is to move the appropriate operator state from the old plan to the new plan, to facilitate joining old data with new data there. Figure 3.4 shows the status of both the old and the new plan, when executing our example query as in Figure 3.3, after the old state has been moved to the appropriate operators of the new plan. In this example, the states A_o, B_o, and C_o are moved from the old operators into the new join operators. Note that not all of the state is transferred: the intermediate state A_o B_o is not moved to the new plan, because it is not useful there. Next, the processor of the new plan checks for intermediate state that was not computed before but is necessary for computation, e.g., B_o C_o, and re-computes that state. After state matching and state re-computation have been performed, the new tuples can be sent to the new plan for processing. The operations of this strategy for the example are summarized below.

1. Move the matched states A_o, B_o, and C_o from the old plan to the new plan.
2. Re-compute the state B_o C_o at the new plan.
3. Send the new tuples from A_n, B_n, and C_n to the new plan. Each new tuple probes the current state on the other side of its join operator, and on a match, the joined result is output and propagated to the next operator.

Let us check whether this strategy ensures complete results and guarantees no duplicates. Strictly, we need to check that all the terms in Equation 3.1 are generated by this strategy exactly once and that no extra tuples are generated. We omit a formal proof and give an intuitive explanation. In the new plan shown in Figure 3.4, the lower operator, which joins tuples from relations B and C, eventually produces the new results B_o C_n + B_n C_o + B_n C_n; denote these by (BC)_n. The new results generated by the new plan are then A_n (B_o C_o) + (A_o + A_n)(BC)_n, which is

    A_n (B_o C_o) + (A_o + A_n)(BC)_n
      = A_n B_o C_o + (A_o + A_n)(B_o C_n + B_n C_o + B_n C_n)
      = A_n B_o C_o + A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_n + A_n B_n C_o + A_n B_n C_n.   (3.2)

This is exactly the set of terms in Equation 3.1 excluding A_o B_o C_o, which the old plan computed before the adaptation. This strategy thus computes all necessary tuples at the new plan.

3.5.2 Cape-04's parallel track strategy

The second strategy discussed in the Cape-04 system is called the parallel track strategy. The basic idea is to perform most of the computation at the old plan. This is enabled by sending new data to both the old plan and the new plan in parallel. Figure 3.5 shows the status of both the old and the new plan at the adaptation point. This strategy performs the following operations for our example.

1. Send the new tuples from A_n, B_n, and C_n to both the old plan and the new plan.
2. Execute the following in parallel. At both the old plan and the new plan, each new tuple probes the current state on the other side of its join operator; if there is a match, the joined output is propagated to the next operator. There is one difference in the join algorithm of the old plan: at its top join operator, where AB is joined with C, any output that joins only new tuples, i.e., A_n B_n C_n, must be excluded (as indicated in Figure 3.5; a small sketch of this check follows).
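A sketch of the duplicate-avoidance check in step 2: each tuple carries flags recording whether its base tuples are old or new, and a result built purely from new tuples is suppressed at the old plan, since the new plan already produces it. The tuple representation is our own.

    def emit_at_old_plan_top_join(ab_tuple, c_tuple):
        """ab_tuple/c_tuple carry 'tags': one 'o' (old) or 'n' (new) per base
        relation. Suppress results whose bases are all new, i.e. A_n B_n C_n."""
        tags = ab_tuple["tags"] + c_tuple["tags"]
        if all(t == "n" for t in tags):
            return None                  # excluded: computed by the new plan
        return {"tags": tags, "value": (ab_tuple["value"], c_tuple["value"])}

    print(emit_at_old_plan_top_join({"tags": ["o", "n"], "value": "ab"},
                                    {"tags": ["n"], "value": "c"}))  # emitted
    print(emit_at_old_plan_top_join({"tags": ["n", "n"], "value": "ab"},
                                    {"tags": ["n"], "value": "c"}))  # None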

Figure 3.5: Cape-04's parallel track strategy: status at the adaptation point (tuples generated after adaptation: (C_o + C_n)(AB)_n + C_n (A_o B_o), excluding A_n B_n C_n)

Here we discuss why Cape-04's parallel track strategy ensures complete results and also eliminates duplicates; again, we give an intuitive explanation. First, the tuples of A_n B_n C_n are computed at the new plan in parallel while the old plan executes; hence, to avoid duplication, those tuples must not also be produced by the old plan. Second, at the old plan, as shown in Figure 3.5, the lower operator, which joins tuples from relations A and B, produces the new results A_o B_n + A_n B_o + A_n B_n; denote these by (AB)_n. The new results generated by the old plan are then (C_o + C_n)(AB)_n + C_n (A_o B_o), excluding A_n B_n C_n; that is,

    (C_o + C_n)(AB)_n + C_n (A_o B_o) - A_n B_n C_n
      = (C_o + C_n)(A_o B_n + A_n B_o + A_n B_n) + A_o B_o C_n - A_n B_n C_n
      = A_o B_n C_o + A_n B_o C_o + A_n B_n C_o + A_o B_n C_n + A_n B_o C_n + A_o B_o C_n.   (3.3)

This is exactly the set of terms in Equation 3.1 excluding A_o B_o C_o and A_n B_n C_n. Correctness follows because after the adaptation the new plan computes A_n B_n C_n, and before the adaptation the old plan had computed A_o B_o C_o. Equation 3.3 also explains why the topmost join in the old plan must exclude A_n B_n C_n. In this strategy, most of the computation is performed at the old plan. (Exclusion, or subtraction, must be used carefully: the larger expression must contain the smaller one, so that the smaller one can be subtracted from it.)

Cape-04 also discusses the data stream case. For static databases, the parallel execution continues until all the tuples of A, B, and C have been processed by both plans. For data stream applications, however, two tuples that join in an operator must not have timestamps more than the window size W apart (as discussed in Section 2.3). For example, when a new tuple from A arrives whose timestamp is larger than W plus the timestamp of the newest tuple in B_o, all the tuples in B_o become ineligible to join with future tuples from A; hence, B_o can be purged. When all the old state A_o, B_o, C_o, and A_o B_o has been purged, the old plan stops execution, because at that point the old plan can no longer generate any new results.

Figure 3.6: Tukwila-04's stitch-up strategy: status at the point when the new plan finishes execution

3.5.3 Tukwila-04's stitch-up strategy

In contrast to Cape-04's two strategies, Tukwila-04 proposes a different strategy, the stitch-up strategy, which performs most of the computation in a separate stitch-up plan, shown on the right of Figure 3.6. Generally, the stitch-up plan takes the best available plan as a basis and generates a similar logical operator tree in which some previous state is re-used and incorporated (notice that the union operator in the figure incorporates the intermediate state B ⋈ C without sacrificing equivalence). It also requires sharing state from both the old plan and the new plan, so that all the terms in Equation 3.1 can be properly computed. For our example, this strategy works as follows.

1. Generate a stitch-up plan, using the best available plan as a basis and modifying it to re-use previous state.
2. Perform the computation over the new tuples from A_n, B_n, and C_n at the new plan.
3. Move the states A_o, B_o, and C_o from the old plan to the stitch-up plan.
4. Move the states A_n, B_n, C_n, and the intermediate state B ⋈ C from the new plan to the stitch-up plan.

5. Perform the computation over the moved state at the stitch-up plan. Note that the lower join must exclude B_o C_o from its output, and the higher join must exclude A_x B_x C_x from its output, where x is either all o or all n.

(The stitch-up plan is executed either in parallel with the new plan, in which case new tuples must be sent to both plans, or after the new plan completes, in which case new tuples are sent only to the new plan. The former is required for streaming applications, while the latter suits static applications; in the following discussion we assume the latter case.)

Here we briefly discuss why Tukwila-04's stitch-up strategy ensures complete results without generating duplicates. In this approach, the new plan generates A_n B_n C_n as its output. The stitch-up plan, on the other hand, computes all the tuples of (A_o + A_n)(B_o + B_n)(C_o + C_n) except for A_o B_o C_o and A_n B_n C_n. Hence, the combined results of the new plan and the stitch-up plan, together with the old plan's pre-adaptation output, contain complete and unique terms.

Tukwila-04 also discusses a number of heuristics to improve the computation in the stitch-up plan. First, every join operator maintains an exclusion list specifying which patterns are to be excluded. Second, the exclusion can be done at the structure level rather than at the tuple level; for example, the join operator that excludes the pattern B_o C_o simply prevents the entire B_o state from probing the C_o state. Third, intermediate state pre-computed by the other plans can be shared, e.g., there is no need to re-compute B ⋈ C at the stitch-up plan. The more state can be shared, the lower the computation cost.

3.5.4 Analysis

Based on the three strategies discussed above, we analytically compare their performance on our example query in Table 3.2. We compare the strategies on each of the three plans (the old plan, the new plan, and the stitch-up plan) in terms of communication (the number of tuples sent to or received from the other plans), computation (the number of tuples probed in joins), and output cardinality (the number of tuples output by the plan). We denote the number of tuples in relation A simply by A, and we write AB for the cardinality of A ⋈ B; hence, AB is not necessarily equal to the product of A and B. As an example of estimating the number of tuples probed in joins, joining relations A and B requires AB probes. It can then be inferred that, for the computation cost of the new plan in the parallel track strategy, joining B_n and C_n requires B_n C_n probes, and joining A_n with B_n C_n requires A_n B_n C_n probes; in total, it requires B_n C_n + A_n B_n C_n probes. Most of the numbers in Table 3.2 can be inferred by such computations. It is worth noting that in the table,

    Delta(A, B, C) = A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_o + A_n B_o C_n + A_n B_n C_o.

Based on the quantitative analysis shown in Table 3.2, we summarize our observations in Table 3.3 and briefly list the reasons for them. We use MS, PT, and SU to denote the moving state, parallel track, and stitch-up strategies respectively.

Table 3.2: Quantitative analysis of the three plan migration strategies for the example

Moving state:
  Comm.:  the old plan sends A_o + B_o + C_o tuples; the new plan receives A_o + B_o + C_o tuples.
  Comp.:  the new plan probes ABC - A_o B_o C_o + B_o C_o tuples.
  Output: the new plan outputs Delta(A, B, C) + A_n B_n C_n tuples.

Parallel track:
  Comm.:  the old plan receives A_n + B_n + C_n tuples.
  Comp.:  the old plan probes AB - A_o B_o + ABC - A_o B_o C_o tuples; the new plan probes B_n C_n + A_n B_n C_n tuples.
  Output: the old plan outputs Delta(A, B, C) tuples; the new plan outputs A_n B_n C_n tuples.

Stitch-up:
  Comm.:  the old plan sends A_o + B_o + C_o tuples; the new plan sends A_n + B_n + C_n and the intermediate state BC; the stitch-up plan receives A + B + C + BC tuples.
  Comp.:  the new plan probes B_n C_n + A_n B_n C_n tuples; the stitch-up plan probes BC - B_o C_o + ABC - A_o B_o C_o - A_n B_n C_n tuples.
  Output: the new plan outputs A_n B_n C_n tuples; the stitch-up plan outputs Delta(A, B, C) tuples.

Table 3.3: A comparison of the three strategies based on the analysis in Table 3.2

  Communication cost:   MS: migrating the old state; PT: burden on the old plan; SU: burden on the new plan and the stitch-up plan.
  Computation cost:     MS: best w.r.t. the old plan; PT: best w.r.t. the new plan; SU: best w.r.t. the old and the new plans.
  Output cardinality:   MS: all at the new plan; PT: delta tuples at the old plan, new tuples at the new plan; SU: new tuples at the new plan, delta tuples at the stitch-up plan.
  Steady output:        MS: the new plan waits for the state transfer to finish; PT: steady output from the old plan; SU: steady output from the new plan, but the stitch-up plan waits for the new plan to finish.
  Multiple adaptations: MS: most computation done at the newest plan; PT: most computation done at the oldest plan; SU: most computation done at the final stitch-up plan.

Communication cost. MS only migrates the old state from the old plan to the new plan. If the old state is small, this is the best strategy available, since it avoids any extra sending and receiving of new state. PT receives the new state at the old plan, placing extra bandwidth burden there. SU receives both the old state and the new state at the stitch-up plan; hence, it not only places more bandwidth burden on the stitch-up plan, but also requires more bandwidth at the new plan if the new state is large.

Computation cost. Note that every strategy receives the A_n + B_n + C_n tuples at the new plan by default; hence, we do not list them in the table. MS performs the least computation with respect to the old plan, PT (or SU) performs the least computation with respect to the new plan, and SU performs the least computation with respect to the old and new plans combined. This analysis can help a system determine which strategy to choose under different conditions. For example, if there is insufficient computing power to support the old plan, it is better to choose MS or SU over PT.

Output cardinality. All strategies output the same total number of tuples across all plans. However, MS outputs all tuples at the new plan; PT outputs Delta(A, B, C) at the old plan and A_n B_n C_n at the new plan; and SU outputs A_n B_n C_n at the new plan and Delta(A, B, C) at the stitch-up plan.

Steady output. Here we examine whether, for data stream applications, the strategies can output results steadily around the adaptation point, since output steadiness may affect user satisfaction. In MS, the new plan can only begin computing once the old state has been transferred; hence, there may be a period of silence. In PT, the old plan and the new plan output results immediately because of their parallel execution. In SU, the new plan outputs results immediately; however, the stitch-up plan must wait for the new plan to finish before fetching its state, so some result tuples may not be produced steadily.

Multiple adaptations. The adaptive process may well run over multiple phases, so we examine each strategy's behavior under repeated adaptation. MS always migrates state from the old plan to the new plan; hence, it always performs most of the computation at the newest plan. PT, on the other hand, always performs most of the computation at the oldest plan. SU performs its delta computation at a single final stitch-up plan: if there are multiple adaptations, only one stitch-up plan is used to mix the state of the different phases. This stitch-up plan is based on the operator tree of the best plan available, depending on how much state can be re-used to facilitate computation.
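Summarizing Table 3.3 operationally, here is a sketch of a strategy chooser driven by monitored conditions. The thresholds and flags are hypothetical; this is an illustrative reading of the table, not a procedure from any of the three papers.

    def choose_migration_strategy(old_state_small: bool,
                                  old_plan_cpu_scarce: bool,
                                  need_steady_output: bool) -> str:
        """Heuristic reading of Table 3.3 (illustrative only)."""
        if need_steady_output and not old_plan_cpu_scarce:
            return "parallel track"  # old plan keeps emitting during migration
        if old_state_small:
            return "moving state"    # cheap to ship state; no duplicate work
        return "stitch-up"           # offload delta computation to a third plan

    print(choose_migration_strategy(old_state_small=True,
                                    old_plan_cpu_scarce=True,
                                    need_steady_output=True))  # 'moving state'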

Chapter 4 Evaluations

In this chapter, we review two sets of empirical studies based on the three systems we have discussed. The first set examines the differences between adaptive query processing and static query processing under similar workloads and configurations. The second set focuses on the performance comparison among the plan migration techniques under specific parameters. Since the experiments use different workloads and system configurations, we excerpt a few representative studies and analyze each individually. We conclude that the observations in these experimental studies validate our analysis in Chapter 3.

4.1 Adaptive Query Processing vs. Static Query Processing

In this section, we present two figures that show the benefits of adaptive query processing over static query processing, in Tukwila-99 (Figure 4.1) and Tukwila-04 (Figure 4.2) respectively.

Tukwila-99. The Tukwila-99 experiment is performed on a scaled version of the TPC-D dataset; seven queries are computed over four of the dataset's base tables, excluding the Lineitem table. The optimizer is given correct source cardinalities, but no histograms are available; hence, the optimizer must compute its intermediate result cardinalities from estimates of the join selectivities. All joins are implemented as pipelined hash joins. Figure 4.1 shows the execution-time benefit of adaptive query processing over static query processing. The pipelined strategy executes the query statically. The materialized strategy simply materializes the output at each join; in many cases this is even worse than the pipelined strategy. The materialized-and-replanned strategy materializes the intermediate results and re-plans at the end of each fragment whenever the actual cardinality differs from its estimate by at least a factor of two. Among these three strategies, only the last is an adaptive query processing strategy. From the figure, we can see that the materialized-and-replanned strategy is the fastest on all the chosen plans, with a total speed-up of 1.42 over the pipelined strategy and 1.69 over the materialized strategy []. This is likely because most join operations in the figure are given insufficient memory, and poor selectivity estimates force them to overflow.

Figure 4.1: Comparison of the static pipelined, materialized, and materialized-plus-replanned strategies []
Figure 4.2: Comparison of static optimization, adaptive query processing, and plan partitioning [2]
Figure 4.3: Comparison of Cape-4's two migration strategies: migration time w.r.t. window size W [2]
Figure 4.4: Cape-4's output rate over time given insufficient processing power [2]

Tukwila-4
The Tukwila-4 experiment is performed on both a uniform dataset (TPC-H) and a skewed dataset (TPC-D), and the queries are mostly TPC-H queries with slight variations. Four queries are selected for computation: 3A (the standard TPC-H query 3 with its date-based selection predicates removed), 10 (standard), 10A (the standard TPC-H query 10 with its date-based selection predicates removed), and 5 (standard). This setup generates a workload with several levels of optimization complexity: a join of 3 relations (query 3A), two joins of 4 relations (queries 10 and 10A), and a join of 5 relations (query 5). The system is configured to run all experiments completely in memory, with an initial buffer size of 2 MB that grows as needed. This setup isolates computation costs from disk I/O costs and reduces the performance penalty caused by inaccurate estimates.

In Figure 4.2, three approaches are compared. They are the static query processing approach,
