AN OVERVIEW OF ADAPTIVE QUERY PROCESSING SYSTEMS
Mengmeng Liu
Computer and Information Science, University of Pennsylvania


WPE-II exam, January 28

ABSTRACT

Traditional database query processors separate query optimization from query execution: query plans are chosen by query optimizers and sent to execution engines for processing, until all query results are completely computed. However, new applications, such as data streams, large-scale distributed systems, and data integration, require query processors to adapt to unpredictable data characteristics and dynamic environments. Query optimization and query execution need to be interleaved so that new plans can constantly be found to replace obsolete plans as new information is discovered about the data being processed. The main challenges of adaptive query processing lie in ensuring correct results, eliminating duplicates, and maintaining good performance. In this report we survey three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. We divide adaptive query processing into five stages: plan pre-optimization, plan monitoring, plan analysis, plan re-optimization, and plan migration. In each of these stages, different approaches are examined, analyzed, and compared when applicable. Empirical evaluations of the different systems are also discussed in this report.

Contents

1 Introduction
2 Overview of Adaptive Query Processing Systems
  2.1 Tukwila-99: Plan-partitioning Adaptivity for Data Integration
  2.2 Tukwila-04: Data-partitioning Adaptivity for Data Integration
  2.3 Cape-04: Data-partitioning Adaptivity for Data Streams
3 Adaptive Query Processing in Five Stages
  3.1 Plan Pre-optimization
  3.2 Plan Monitoring
  3.3 Plan Analysis
  3.4 Plan Re-optimization
  3.5 Plan Migration
    3.5.1 Cape-04's moving state strategy
    3.5.2 Cape-04's parallel track strategy
    3.5.3 Tukwila-04's stitch-up strategy
    3.5.4 Analysis
4 Evaluations
  4.1 Adaptive Query Processing vs. Static Query Processing
  4.2 Comparison of Plan Migration Strategies
  4.3 Summary
5 Conclusion

Chapter 1 Introduction

Traditional database query processors decouple query optimization from query execution. Query optimizers compute optimal plans and send them to query engines, which execute them until all query results are completely computed. Commercial database systems, such as Oracle, DB2, and Microsoft SQL Server, still use this model today. However, such a model is not well suited to new database applications, such as data streams, federated systems, large-scale distributed applications, and data integration systems. Hence, a new query processing model, adaptive query processing, has been proposed and studied [3, 2, 2, 2, 4] over the last decade. In adaptive query processing, query optimization is interleaved with query execution so that new plans can constantly be found to replace obsolete plans, as new properties of the data become available to the system. Two of the main reasons that adaptive query processing is needed in these new applications are given below.

Missing or inaccurate statistics. In federated and data integration applications, statistics on source data are often missing or inaccurate. This is mainly because data sources are heterogeneous, remote, sometimes unreliable, and connected through wrappers that may have little knowledge of data statistics. Query optimizers rely on data statistics (e.g., cardinality, value distributions) to estimate plan costs when choosing optimal plans. Hence, if data statistics are missing or inaccurate, query optimizers may misestimate plan costs and choose inferior plans for execution; they must fall back on heuristics to find plans, decreasing the likelihood that the selected plan is in fact optimal. With adaptive query processing, data statistics can be estimated better once the query is executing; in consequence, a better plan may be chosen to replace the old plan, improving the overall performance of that query.

In data stream applications, data may continue to arrive after the query processor starts execution. Under such circumstances, traditional query processors may not work at all. In this highly dynamic environment, data characteristics fluctuate and data arrival rates are unknown; hence, it is hard to estimate plan costs well. Adaptive query processors allow systems to incorporate run-time information to help find good plans after a query has started executing.

Even in standard static databases, traditional query optimizers may sometimes make bad estimations.

Indeed, query optimizers estimate the costs of query plans using a cost model that relies heavily on cardinality information, which in turn depends on estimates of the selectivities of the predicates in the query. Most query optimizers estimate the selectivities of individual predicates from summaries of data distributions, such as histograms. However, the selectivity of a conjunction of predicates sometimes cannot be estimated correctly because of correlations among the conjoined predicates; in fact, in the worst case, errors in estimates for conjunctive predicates may grow exponentially [9]. Standard query optimizers [7, 6, 6] typically assume that the attributes of any relation are independent. This is problematic if a query has a conjunction of predicates over correlated attributes. For example, it is quite likely that people's ages and salaries are correlated. Assume that the percentage of people who are older than 30 and the percentage of people who earn more than 30,000 dollars a year are both 50 percent, and that the percentage of people who satisfy both conditions is also 50 percent. A query like SELECT count(*) FROM P WHERE P.age > 30 AND P.salary > 30000 may cause the query optimizer to misestimate the output cardinality if it assumes that these two predicates are independent: it would estimate the output cardinality as 25 percent of the size of the whole source table, when it is in fact 50 percent. Under such circumstances, adaptive query processing can help. If the query optimizer chooses a poor plan, the statistics and the correlations can be monitored by the adaptive query processor; hence, cost estimates can be continually updated to help find a better plan.

Unpredictable system behavior. Another big challenge in new applications is unpredictable system behavior. In large-scale, shared-nothing parallel systems, unpredictable behaviors such as machine failures, communication errors, and other system faults may be prevalent. Adaptive query processing has the benefit of switching to a feasible plan when unpredictable system behavior occurs, in order to ensure the availability of query processing. Adaptive query processors can also monitor system-state trends at run time to predict system behavior before faults occur. This helps systems adapt early and avoid losing query results. In other applications, such as data streams, system resources may be exhausted in the middle of execution: for instance, the system may run out of memory, or the network may be clogged with messages. Adaptive query processing can make timely changes to alleviate these problems and can choose future plans that prevent such problems from recurring.

Adaptive query processing has been studied extensively since the 1990s. Initial efforts include query re-optimization at materialization points [3], CPU scheduling-based adaptation [8, 8], and methods based on redundant computation []. Recently there has been work addressing the moving state problem [2, 2, 4]. Apart from these plan-based methods, a tuple-based routing scheme, Eddy [2, 5], has been proposed to re-route tuples among plans in highly dynamic environments. There are also a few surveys that summarize various aspects of adaptive query processing [5, 3, 4].

In this paper, we examine three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. In Chapter 2 we briefly describe each system. In Chapter 3, we divide adaptive query processing into five consecutive stages: plan pre-optimization, plan monitoring, plan analysis, plan re-optimization, and plan migration.
We study the approaches taken by the systems in each of these five stages, and compare them when applicable. We examine experimental evaluations of these systems in Chapter 4, and conclude in Chapter 5.
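The numbers in the correlated-predicates example above can be checked with a short sketch; the table size here is a hypothetical value of our own choosing, not from the original example.

    # Selectivity under the independence assumption vs. the true selectivity
    # when two predicates are perfectly correlated (hypothetical table size).
    table_size = 1_000_000        # |P|
    sel_age = 0.50                # fraction with age > 30
    sel_salary = 0.50             # fraction with salary > 30,000
    sel_both_actual = 0.50        # perfectly correlated: the same 50% of people

    # A textbook optimizer multiplies the individual selectivities:
    est_cardinality = table_size * sel_age * sel_salary       # 250,000 tuples
    true_cardinality = table_size * sel_both_actual           # 500,000 tuples

    print(f"independence estimate: {est_cardinality:,.0f} tuples")
    print(f"actual result size:    {true_cardinality:,.0f} tuples")  # off by 2x

The factor-of-two error here is benign, but as noted above, errors compound exponentially as more correlated predicates are conjoined.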

Chapter 2 Overview of Adaptive Query Processing Systems

In this chapter, we introduce three adaptive query processing systems: Tukwila-99 [], Tukwila-04 [2], and Cape-04 [2]. For each system, we briefly discuss the motivating applications and challenges, the overall system architecture, and the main proposed techniques.

2.1 Tukwila-99: Plan-partitioning Adaptivity for Data Integration

Motivation

Traditional query processing models may not be applicable to most data integration applications for several reasons: the absence of statistics, unpredictable data arrival rates, and redundancy among sources. We discussed the first two reasons in the previous chapter. The third, redundancy among sources, may cause the query processor to waste time processing duplicate tuples from duplicate sources. The Tukwila-99 [] system aims to address these challenges by adopting an adaptive query processing model and by incorporating run-time information into the decision engine.

Architecture

Figure 2.1 shows the high-level system architecture of the Tukwila-99 [] system. The query optimizer and the execution engine are interleaved to answer queries. Users pose queries to the system, and the queries go through a query reformulation phase to be rewritten as queries over the data sources; user queries may be formulated over virtual mediated schemas, so they need to be reformulated over the source schemas. The job of the query optimizer is to transform the reformulated query into a physical query execution plan for the execution engine. The optimizer in Tukwila-99 is able to create partial plans when data statistics are incomplete, and it produces rules that define the system's adaptive behavior. The execution engine processes the query plans produced by the optimizer. It includes an event handler for dynamically interpreting the rules generated by the optimizer, and it supports several re-planning techniques when adaptation is triggered. Finally, the query execution engine communicates with the data sources through a set of wrappers.

Figure 2.1: System Architecture of Tukwila-99 ([])

Wrappers are responsible for transforming data from the format used by the sources into the format used by the Tukwila-99 system.

Main techniques

Tukwila-99's adaptive techniques are mostly plan-partitioning based. That is, re-optimization or re-scheduling of plans can only occur at the end of fragments, e.g., at the end of pipelined units or at blocking operators, where all the source data must be processed together. The intermediate results output by a fragment must be materialized before being processed by the re-optimized portion of the plan. The optimizer must decide how many fragments to create, balancing the potential performance penalty of materialization against the potential benefit of being able to adapt if the original plan is poor. Since there is very little information to go by, this decision is by necessity heuristics-driven. We will later discuss the other two systems' data-partitioning techniques, in which different portions of the data can be processed by different plans in parallel and are not necessarily materialized at intermediate points.

The Tukwila-99 system exploits a rule-based framework that is built into the core of the adaptation decision engine. Generally, the rules in Tukwila-99 are produced by the query optimizer and interpreted and executed by the query execution engine. These rules have the form WHEN event IF conditions THEN actions. They usually specify when and how to modify the run-time behavior of certain operators, and which conditions to check in order to detect opportunities for re-optimization. For example, a rule can be written as follows:

    WHEN closed(join)
    IF monitored-card(join) > 2 * estimated-card(join)
    THEN reoptimize

This rule means that when a join operator finishes execution, if the run-time monitored cardinality of the join is more than twice its estimated cardinality, then the system triggers the re-optimization procedure at this join operator; this join operator must be the end of a fragment. Such a rule can help the system alleviate the effects of inaccurate cardinality estimates. In Tukwila-99, these rules are written in procedural languages such as C/C++ to facilitate manipulation and event handling, although they resemble active rules in deductive databases [9].
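To make the event-condition-action structure of such rules concrete, here is a minimal sketch of how a rule of this form might be represented and fired. The class and field names are hypothetical; Tukwila-99's actual rules are compiled into the C/C++ engine, not interpreted like this.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Rule:
        event: str                          # e.g., "closed(join)"
        condition: Callable[[dict], bool]   # evaluated against monitored stats
        action: Callable[[], None]          # e.g., trigger re-optimization

    def on_event(event: str, stats: dict, rules: list[Rule]) -> None:
        # Fire every rule registered for this event whose condition holds.
        for rule in rules:
            if rule.event == event and rule.condition(stats):
                rule.action()

    # WHEN closed(join) IF monitored-card > 2 * estimated-card THEN reoptimize
    reoptimize_rule = Rule(
        event="closed(join)",
        condition=lambda s: s["monitored_card"] > 2 * s["estimated_card"],
        action=lambda: print("triggering re-optimization"),
    )
    on_event("closed(join)", {"monitored_card": 500, "estimated_card": 200},
             [reoptimize_rule])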

Examples of events include opening or closing an operator, failing to connect to a source, running out of memory, and having processed n tuples in an operator. Once an event has triggered a set of rules, the conditions of each rule are evaluated in parallel. A condition can be a comparison among a monitored state, an estimated state, or a threshold. After all conditions for a given event have been evaluated, the actions are executed. Tukwila-99's actions include setting the overflow method of a pipelined join, deactivating an operator, rescheduling the remaining operator tree, and re-optimizing the remaining plan. As discussed before, these actions are all plan-partitioning methods. There are a few restrictions on this rule-based system to avoid common mistakes. For example, all of a rule's actions must be executed before another event is processed, and two rules that might affect each other cannot be executed in parallel.

Tukwila-99 also features two novel adaptive operators that can be invoked when certain conditions hold. The first is a double pipelined hash join operator with memory-overflow mechanisms. It is symmetric and incremental, which has the benefit of avoiding blocking: it fetches an input tuple, probes it against the hash table on the other side, and outputs any matching results immediately. The only trade-off relative to a non-pipelined hash join is that it must buffer state, namely the hash tables of both joined relations. If the system runs out of memory, the pipelined hash join's overflow mechanism can flush portions of its state to disk. The other adaptive operator introduced in Tukwila-99 is the dynamic collect operator. In essence, it is a union operator that fetches only the necessary data sources, as data sources may be redundant. Furthermore, if sources become slow or unavailable at run time, the collect operator can switch to a back-up source.

Conclusion

Tukwila-99 introduces a number of plan-partitioning adaptive mechanisms. Adaptivity is designed into the core of the system to facilitate the interleaving of query optimization and query execution. Tukwila-99 proposes new query operators, the double pipelined join operator and the dynamic collect operator, to cope with insufficient memory and redundant source data. It also provides a rule-based platform that can incorporate different adaptive mechanisms.
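Before moving on, here is a minimal sketch of the symmetric, incremental probing logic behind a double pipelined hash join, simplified to a single equi-join key and omitting the memory-overflow machinery described above.

    from collections import defaultdict

    class DoublePipelinedHashJoin:
        """Symmetric hash join: buffers a hash table on each input and emits
        matches incrementally, so neither input blocks the other."""

        def __init__(self, key_left, key_right):
            self.key_left, self.key_right = key_left, key_right
            self.left_table = defaultdict(list)    # state for the left input
            self.right_table = defaultdict(list)   # state for the right input

        def insert_left(self, tup):
            k = self.key_left(tup)
            self.left_table[k].append(tup)              # build own hash table
            for match in self.right_table.get(k, []):   # probe the other side
                yield (tup, match)                      # emit immediately

        def insert_right(self, tup):
            k = self.key_right(tup)
            self.right_table[k].append(tup)
            for match in self.left_table.get(k, []):
                yield (match, tup)

    join = DoublePipelinedHashJoin(key_left=lambda a: a["x"],
                                   key_right=lambda b: b["x"])
    list(join.insert_left({"x": 1, "y": 2}))          # no match yet
    print(list(join.insert_right({"x": 1, "z": 3})))  # emits one joined pair

The buffered hash tables on both sides are exactly the "state" that the plan migration techniques of the next sections must move or share.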
2.2 Tukwila-04: Data-partitioning Adaptivity for Data Integration

Motivation

Tukwila-99 proposes a nice framework for adaptive query processing; Tukwila-04 extends this framework to allow for data-partitioning adaptivity. Data-partitioning adaptivity can route different parts of the data to different plans, without materializing intermediate results or restricting adaptive activity to fragment boundaries. However, data partitioning introduces several new challenges. For instance, the state of operators in the old plan may be required in the new, adapted plan; identifying this common state and maintaining effective book-keeping information to avoid re-creating the entire state is both important and challenging. Moreover, merely collecting the query results of the old plan and the new plan is not sufficient: the query results obtained by joining data across the two plans must also be included, in an efficient and effective way. In summary, data-partitioning adaptivity can be formalized according to the rules of the relational algebra, and plan migration is a means of accomplishing it in a way that is guaranteed to result in correct answers.

Figure 2.2: System Architecture of Tukwila-04 ([])

Architecture

The architecture of the Tukwila-04 system is shown in Figure 2.2. The figure focuses on the plan re-optimization and plan migration components; the query reformulation and wrapper interfaces are similar to those of Tukwila-99 and are not shown. Here, the query optimizer and the query execution engine are again interleaved for adaptation purposes, but some of their components are more refined than their counterparts in Tukwila-99 (shown in Figure 2.1). First, a separate thread monitors the run-time statistics and accordingly updates both the statistics in the optimizer's cost estimator and the global statistics; this thread maintains statistics collected across all phases. Second, the query re-optimization procedure triggered by the execution engine is carried out by the query optimizer, which chooses a new query plan. Finally, and most importantly, Tukwila-04 allows for data-partitioning adaptivity, in which different portions of the data can be sent to different phases of the plan. The adaptive process in Tukwila-04 continues until all the source data are completely processed.

Main techniques

The major contribution of the Tukwila-04 system is its plan migration technique: computing a stitch-up plan that complements the old plan and the new plan in generating complete query results. We describe how stitch-up plans are generated and computed in detail in Section 3.5. In addition to plan migration, Tukwila-04's platform supports many other mechanisms: re-using state from previous plan phases (avoiding the recomputation of certain query subexpressions), monitoring the progress of execution, and re-estimating plan costs. We discuss the monitoring process and query re-optimization in Sections 3.2 and 3.4 respectively. Here we briefly discuss Tukwila-04's state sharing techniques. The internal state (e.g., hash tables) of stateful operators (e.g., join, aggregation) can be shared among equivalent subexpressions. In Tukwila-04, state structures (e.g., sorted lists, hash tables) and iterator modules (e.g., build-then-probe, merge-driven) are decoupled, allowing state structures to be shared across operators in different plans. For example, an adaptation from a pipelined hash join to a nested-loop join can benefit from this decoupling.
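A sketch of the decoupling idea, with hypothetical interfaces: the hash-table state lives in its own object, so a different iterator module, say a build-then-probe module, can be attached to state originally built by a pipelined join in another plan.

    class HashState:
        """State structure: a hash table that outlives any one operator."""
        def __init__(self, key):
            self.key, self.buckets = key, {}
        def add(self, tup):
            self.buckets.setdefault(self.key(tup), []).append(tup)
        def probe(self, k):
            return self.buckets.get(k, [])

    class BuildThenProbeIterator:
        """Iterator module: consumes pre-built state instead of rebuilding."""
        def __init__(self, built_state: HashState, probe_key):
            self.state, self.probe_key = built_state, probe_key
        def run(self, probe_input):
            for tup in probe_input:
                for match in self.state.probe(self.probe_key(tup)):
                    yield (match, tup)

    # State built by one plan's pipelined join is handed to a new plan:
    state_A = HashState(key=lambda a: a["x"])
    state_A.add({"x": 1, "y": 9})
    print(list(BuildThenProbeIterator(state_A, lambda b: b["x"])
               .run([{"x": 1, "z": 5}])))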

Another important issue in state sharing is that operator state over the same subexpression but with a different hash structure cannot be shared directly. For example, in (A ⋈ B) ⋈ C, relation B may be hashed for a join with relation A on attribute x, whereas in the logically equivalent expression A ⋈ (B ⋈ C), relation B may join relation C on attribute y, where y differs from x. Hence the state of relation B must be re-hashed before it can be shared with the other plan.

Conclusion

Tukwila-04 extends Tukwila-99's platform to allow for data-partitioning adaptivity. It assumes that query plans can be continuously adapted in mid-execution and that the source data are partitioned across the different phases of the plans. This data-partitioning assumption poses new plan migration challenges: complete, non-redundant query results must be computed efficiently during plan adaptation. Tukwila-04 addresses this problem by computing stitch-up plans, and it discusses state sharing techniques that facilitate this computation. Tukwila-04 emphasizes plan re-optimization and plan migration techniques that complement Tukwila-99's, and the two sets of techniques can be incorporated into the same Tukwila framework.

2.3 Cape-04: Data-partitioning Adaptivity for Data Streams

Motivation

Cape-04 [2] aims to address a plan migration problem similar to Tukwila-04's; however, Cape-04 targets data stream applications rather than data integration applications. Coincidentally, the two papers were presented at the same conference, so they can be regarded as independent work. In data stream applications, data may continue to arrive after query execution begins, and queries may run forever since there is no bound on the size of the data. This situation poses new challenges for plan migration. First, a windowing model must be defined to bound the life span of input tuples in operators, since the input data may be unbounded. Second, correct query results must be defined with respect to that windowing model. Third, the data arriving after an adaptation must be computed correctly and processed by the system in order. As with Tukwila-04, approaches to plan migration should ensure correct results, eliminate duplicates, and maintain good performance.

Architecture

Cape-04 is built around plan migration. The Cape-04 paper [2] assumes that the old plan and the new plan are given; the main task is to design algorithms that migrate the state properly. As a result, no plan monitoring or plan re-optimization techniques are discussed in the paper. On the other hand, since Cape-04 targets data stream applications, its data source model is very different from Tukwila's. In Tukwila, data sources are connected to the system through wrappers, and a query reformulation component near the front end reformulates data integration queries. In contrast, Cape-04 needs no query reformulation for data stream sources, but its queries may be executed continuously over unbounded input data.

Main techniques

Cape-04 proposes two approaches to the plan migration problem: the moving state strategy and the parallel track strategy. The moving state strategy moves the state inside operators (e.g., the hash tables inside join operators) from the old plan to the new plan and feeds the new data (data arriving after the adaptation point) only to the new plan. In contrast, the parallel track strategy sends new data to both the old plan and the new plan, without moving any state.

Both strategies ensure correct results and eliminate duplicates; however, they incur different performance overheads under different conditions. The details of these two strategies are described in Section 3.5.

Another important issue is a clear definition of window semantics over stream data, and of correct results under that model. In Cape-04, every stateful operator is associated with a window size. For example, a join operator A ⋈ B with window size W means that every tuple a of stream A joins only those tuples b of B for which the timestamps T_a and T_b differ by at most W, where T_x is the timestamp of a tuple x; this timestamp is the arrival time recorded by the local machine. Under these semantics, the size of the state inside the join operator is bounded. Since tuples from stream A are strictly ordered by timestamp, a tuple b can be purged from B's state upon the arrival of a tuple a if and only if T_a - T_b > W. The intuition is that any tuple b satisfying this condition cannot possibly join with any tuple from stream A that arrives after a; hence, b can be discarded when a arrives. Given these window semantics and purging rules, the correct results can be defined. More complex cases, such as a combined tuple (an intermediate tuple, e.g., an AB tuple) being purged by another combined tuple, are also discussed in the paper.
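A minimal sketch of the purge rule just described, assuming tuples arrive with monotonically increasing local timestamps; the class and variable names are our own.

    from collections import deque

    W = 10.0   # window size, e.g., in seconds

    class WindowedState:
        """Join state for one stream, purged by the opposite stream's clock."""
        def __init__(self):
            self.tuples = deque()          # (timestamp, tuple), time-ordered

        def insert(self, ts, tup):
            self.tuples.append((ts, tup))

        def purge(self, ts_opposite):
            # A buffered tuple b can be dropped once T_a - T_b > W: no future
            # tuple of the opposite stream can join with it.
            while self.tuples and ts_opposite - self.tuples[0][0] > W:
                self.tuples.popleft()

    state_B = WindowedState()
    state_B.insert(1.0, {"z": 7})
    state_B.purge(ts_opposite=12.5)   # 12.5 - 1.0 > W, so the tuple is purged
    print(len(state_B.tuples))        # 0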

Conclusion

Cape-04 focuses on the plan migration problem in adaptive query processing. It assumes a data streaming model, in which data arrive at the system continuously, even after execution begins. This assumption makes the problem harder: the plan must be migrated correctly and efficiently under well-defined window semantics over stream data. Cape-04 proposes two plan migration techniques, the moving state strategy and the parallel track strategy, and develops a cost model to estimate the cost of plan migration. Several experiments in the paper evaluate the two strategies under different system configurations and stream workloads.

Chapter 3 Adaptive Query Processing in Five Stages

In this chapter, we propose a framework that divides adaptive query processing into five consecutive stages, each an integral part of the whole process. As shown in Figure 3.1, these five stages are plan pre-optimization (generating an initial plan for a query), plan monitoring (monitoring the plan status, system performance, and data characteristics), plan analysis (analyzing how well the current plan is functioning and deciding whether an adaptation is needed), plan re-optimization (finding a new plan that is better than the current plan), and plan migration (migrating the current plan to the new plan). The five stages form a loop and are executed continuously until the query results are computed. We call one pass through the stages a phase; a complete execution may span multiple phases. In the next five sections, we discuss each stage in detail. For each stage, we examine the techniques used in the systems described in the last chapter: Tukwila-99, Tukwila-04, and Cape-04. Table 3.1 lists the stages that each paper discusses. For stages addressed by only one system, such as plan pre-optimization, we describe that system's approach; for stages addressed by at least two systems, such as plan migration, we describe the approaches and analyze the differences among the systems.

An example. Throughout this chapter, we use an example, where applicable, to illustrate the main technical points. Suppose there are three source relations, A(x, y), B(x, z), and C(y, z), and consider the following query over them:

    select * from A, B, C where A.x = B.x and A.y = C.y and B.z = C.z

This query asks for the natural join of relations A, B, and C. There are three predicates in the where clause, but two join operators suffice to execute the query. Figure 3.2 shows three equivalent yet different logical query plans that can be used to execute this query. The three plans are obtained via algebraic transformation rules, e.g., the associativity and commutativity of joins.

Figure 3.1: Five stages of adaptive query processing

Figure 3.2: Three possible plans for the example query. Plan 1: (A ⋈ B) ⋈ C; Plan 2: A ⋈ (B ⋈ C); Plan 3: (A ⋈ C) ⋈ B

The left plan joins A and B before C; the middle plan joins B and C before A; and the right plan joins A and C before B. If we assume that all join operators are symmetric, these are exactly the three possible logical query plans for this query.

3.1 Plan Pre-optimization

Traditional query optimizers (e.g., Starburst [7], Volcano [6], and System-R [6]) generally use a dynamic programming algorithm to find an optimal plan in the plan search space. This optimal plan is the plan with minimal cost under a cost model combining performance factors such as CPU, I/O, memory, and bandwidth, which in turn depends largely on the accuracy of statistics over the source data (e.g., cardinalities and data distributions). Hence, if statistics are incomplete, a plan pre-optimizer must rely on heuristics to find the initial plan for the system. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan pre-optimization. We describe their approaches below.

Tukwila-99. Tukwila-99 allows only plan-partitioning adaptivity; re-optimization or re-scheduling can take place only at the end of fragments, and the data must be materialized before re-optimization. Furthermore, the initial plan can be a partial plan, as long as it is a complete fragment. For instance, suppose that in our example the cardinalities of relations A and B are known and small, but relation C is of unknown size. Tukwila-99's pre-optimizer would return A ⋈ B as a partial plan, because A and B are known to be the smallest relations and can be joined together. When the system reaches the end of the fragment, the whole intermediate result of A ⋈ B is materialized before the re-optimization of the remaining portion of the plan.
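A sketch of the fragment-selection heuristic in this example; the cardinalities below are hypothetical, and the real pre-optimizer of course searches over full plan spaces rather than picking a single pair.

    # Hypothetical catalog: None marks a relation with missing statistics.
    cardinalities = {"A": 1_000, "B": 10_000, "C": None}

    def choose_initial_fragment(cards: dict):
        """Heuristic: build the first pipelined fragment from the two known
        smallest relations; relations of unknown size wait for later
        re-optimization."""
        known = sorted((r for r, c in cards.items() if c is not None),
                       key=cards.get)
        if len(known) >= 2:
            return (known[0], known[1])   # e.g., materialize A join B first
        return None                       # fall back to a pure guess

    print(choose_initial_fragment(cardinalities))   # ('A', 'B')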

Table 3.1: The stages that each paper discusses

    System          Pre-optimization  Monitoring  Analysis  Re-optimization  Migration
    Tukwila-99 []   yes               yes         yes       yes              no
    Tukwila-04 [2]  yes               yes         yes       yes              yes
    Cape-04 [2]     no                no          no        no               yes

Tukwila-04. The main difference between the pre-optimizers of Tukwila-99 and Tukwila-04 is that in Tukwila-04 the initial plan must be complete even when statistics are missing. This is because Tukwila-04's data-partitioning adaptivity requires an initial plan that is responsible for executing the old data. To find a complete plan, Tukwila-04 extends a standard top-down optimizer (recursion with memoization) with a guess for each relation whose statistics are missing. These heuristics may return a poor plan, but they at least produce a complete plan from which adaptation can be invoked.

Discussion. Tukwila-99 allows a partial plan to be chosen at this stage, whereas Tukwila-04 requires the initial plan to be complete. When statistics are missing or incomplete, the plan pre-optimizer must guess the missing values; under such circumstances, the pre-optimizer is at best heuristics-based.

3.2 Plan Monitoring

Once a query plan has been chosen by the optimizer, adaptive query processing can only be taken advantage of if the plan is monitored during query execution. Information gathered at this stage guides adaptation. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan monitoring. We describe their approaches below.

Tukwila-99. Tukwila-99 monitors events that signal important changes in the execution state, such as open/close (starting or completing an operator), error (e.g., unable to contact a source), timeout (e.g., a data source has not responded in n msecs), out-of-memory (e.g., a join has insufficient memory), or threshold (n tuples processed by an operator). The execution system watches for these events, which may trigger adaptation. Tukwila-99 also monitors dynamic information in the system that can be compared against estimated values to check whether certain conditions are satisfied. For example, it monitors state (an operator's current state), cardinality (the number of tuples produced so far), time (the wait time since the last tuple), and memory (the memory used so far).

Tukwila-04. Tukwila-04 likewise monitors operator-level information to aid run-time decision making. Every operator maintains a counter recording how many tuples it has output (unlike in [7], this is observed to carry no measurable performance penalty). Tukwila-04 also monitors information exposed by the state structures of stateful operators (such as join and aggregation), including keys, ordering, size, and cardinality. Finally, I/O delay and tuple-availability delay (the wait time since the last tuple) are monitored to facilitate the re-scheduling of operators; operators such as the pipelined hash join react to such delays by scheduling work during idle cycles.
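A sketch of the kind of per-operator output counting described above; the wrapper is hypothetical, and the point it illustrates is that incrementing one counter per emitted tuple costs essentially nothing.

    class CountingOperator:
        """Wraps any iterator-style operator and counts emitted tuples."""
        def __init__(self, name, child):
            self.name, self.child, self.output_count = name, child, 0

        def __iter__(self):
            for tup in self.child:
                self.output_count += 1      # one increment per output tuple
                yield tup

    scan = CountingOperator("scan(A)", iter([{"x": 1}, {"x": 2}]))
    consumed = list(scan)
    print(scan.name, "produced", scan.output_count, "tuples")   # 2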

Discussion. Tukwila-99 and Tukwila-04 monitor similar run-time system information and operator-level statistics. Tukwila-04 observes that plan monitoring can be expensive, because continuous monitoring consumes CPU cycles without contributing to the computation of results. It is therefore important to monitor only the information necessary for adaptation, and to lower the granularity of monitoring as much as possible.

3.3 Plan Analysis

In this stage, the progress of the current plan is analyzed. Based on the analysis, decisions are made about when to adapt and how often to adapt. Since Cape-04 does not specifically discuss plan analysis, we describe Tukwila-99's and Tukwila-04's approaches below.

Tukwila-99. Tukwila-99's analysis of when to adapt is based on rules generated by the optimizer. These rules typically fire when the monitored state deviates from the expected state, e.g., the monitored cardinality is twice the estimated cardinality, or when unexpected run-time behavior occurs, e.g., the wait time for a tuple exceeds a threshold. Tukwila-99 decides how often to adapt based on milestones, e.g., an operator having processed n tuples. Note that Tukwila-99 can only re-optimize at the end of fragments; hence, only conditions evaluated at fragment boundaries can invoke the next re-optimization stage.

Tukwila-04. In Tukwila-04, a global, cost-based evaluation of plan progress is performed by a low-priority background thread that re-optimizes the query. Plan analysis is thus performed simultaneously with the plan re-optimization stage, which we discuss in Section 3.4. The decisions on when and how often to adapt are similar to Tukwila-99's; that is, they are guided by rules generated by the optimizer. The main difference is that adaptation can be invoked whenever the state is stable (e.g., for a pipelined hash join, the state is stable whenever a tuple finishes probing), not only at the end of fragments.

Discussion. Tukwila-04's plan analysis differs from Tukwila-99's in being more flexible about when adaptation can be invoked. Conditions for when and how often to adapt are generally guided by rules pre-specified in the query optimizer. On the other hand, the granularity of adaptation largely depends on the granularity of plan analysis; hence, the choice of milestones or pre-defined intervals has a large impact on how often adaptation occurs.

3.4 Plan Re-optimization

When the system determines that the current plan is not functioning properly, a re-optimization process is invoked to find the next plan. Among the three systems, Tukwila-99 and Tukwila-04 discuss plan re-optimization. We describe their approaches below.

Tukwila-99. Tukwila-99's adaptive techniques are mostly plan-partitioning based: re-optimization or re-scheduling can change only the portion of a plan that has not yet been executed. The re-optimizer in Tukwila-99 is therefore limited, in that the portion of the plan that has already processed data can never be changed. It also requires the materialization of intermediate results before re-optimization, which adds overhead to the overall performance. For example, suppose the initial plan is Plan 1 in Figure 3.2 and an adaptation is invoked after the execution of the fragment A ⋈ B. Tukwila-99 can only adapt the remaining portion of the plan, e.g., change the join implementation of the operator that joins A ⋈ B with C; it can never adapt Plan 1 into Plan 2 or Plan 3 of Figure 3.2. In addition, Tukwila-99 requires the intermediate state A ⋈ B to be materialized before adaptation, which is not strictly necessary in this example.

Tukwila-04. In Tukwila-04, data-partitioning adaptivity allows different portions of the data to be processed by different plans. In addition, in every phase of plan re-optimization, the optimizer refines its cost estimates using the most recently monitored state. For the same example in Figure 3.2, suppose that during the execution of Plan 1 the monitoring information suggests that relation A is much larger than expected, so Tukwila-04 invokes its re-optimizer; however, the size of relation C may still be unknown at that point. To obtain good cost estimates for all candidate plans, the re-optimizer may still need heuristics, and Tukwila-04 proposes several that exploit the monitored run-time information. For example, the selectivity of a logical operator in the plan is shared by all logically equivalent subexpressions: whatever physical join implementation is used, a logical operator's selectivity monitored at run time can be re-used. As another example, suppose the system wants to estimate the cardinality of the intermediate relation B ⋈ C, where the size of C is still unknown, while the cardinalities of A ⋈ B ⋈ C and of A are known from execution. In this case, Tukwila-04 applies a heuristic that assumes the join between A and B ⋈ C is a key-foreign-key join with B ⋈ C as the foreign-key relation; the cardinality of B ⋈ C can then be estimated as equal to the cardinality of A ⋈ B ⋈ C. This assumption may not hold in other examples, but it gives the optimizer a conservative estimate for unknown relations. In general, Tukwila-04 is able to adapt to any candidate plan because of its support for data partitioning.

Discussion. Tukwila-99's plan re-optimizer is limited in that it can only re-optimize the remaining portion of a plan at the end of a fragment. Tukwila-04's re-optimizer is more general: it may adapt to any candidate plan, and it is given the run-time monitored statistics. It is also worth noting that a plan that is optimal in isolation may be more expensive to migrate to than a less optimal plan. To address this problem, Tukwila-04 (and Cape-04 as well) proposes a cost model to estimate the cost of migration together with the cost of the new plan; the re-optimizer must take both costs into account when searching for an optimal plan.
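The last observation can be made concrete with a small sketch; the plan names match our running example, but the cost numbers are hypothetical.

    # Candidate plans with re-estimated run costs and the cost of migrating
    # the current state to each (zero for staying on the current plan).
    candidates = [
        {"plan": "(A join B) join C", "run_cost": 90.0, "migration_cost": 0.0},
        {"plan": "A join (B join C)", "run_cost": 55.0, "migration_cost": 40.0},
        {"plan": "(A join C) join B", "run_cost": 60.0, "migration_cost": 10.0},
    ]

    best = min(candidates, key=lambda p: p["run_cost"] + p["migration_cost"])
    print(best["plan"])   # '(A join C) join B'

Note that the plan with the lowest run cost (55.0) loses once its migration cost is counted, which is exactly why the re-optimizer must consider both.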

Figure 3.3: A motivating example of plan migration (old plan: (A ⋈ B) ⋈ C; new plan: A ⋈ (B ⋈ C); the source data are split at the adaptation point)

3.5 Plan Migration

Plan migration is the final stage in the loop of adaptive query processing. Note that only data-partitioning methods require this stage, because at the adaptation point not all data have been processed: different portions of the data must be processed by different phases of the plans. Plan migration is concerned with the mid-execution transition of state from one query plan (the old plan) to a semantically equivalent yet more efficient query plan (the new plan). Each tuple from the data sources is processed by exactly one plan; hence, connecting and sharing state across plans is extremely important, especially when query plans contain stateful operators such as joins. Approaches to plan migration need to address several important issues: How can state be shared among different plans? How can we ensure that no tuples are lost in the process? How can we avoid duplicate results generated by different plans? How can we take advantage of the old plan when migrating to the new plan?

Let us go back to our example in Figure 3.2. Suppose the query pre-optimizer chooses the old plan, (A ⋈ B) ⋈ C, as the initial plan, with both join operators implemented as pipelined hash joins. We denote by A_o the data of relation A processed by the old plan before the adaptation, and similarly B_o and C_o. Suppose the system now decides to adapt to the new plan A ⋈ (B ⋈ C), again with both join operators implemented as pipelined hash joins. The data that have not been processed by the old plan are sent to the new plan for execution; we denote these new data by A_n for relation A, and similarly B_n and C_n. In this example, adaptation occurs once, so the whole source data is the union of the old data and the new data, i.e., A = A_o + A_n. For brevity, in the discussion below we use + to represent the union of two relations, and we may omit the join symbol, writing AB for A ⋈ B. Figure 3.3 shows the old plan being adapted to the new plan: the old data A_o, B_o, C_o are sent to the old plan, and the new data A_n, B_n, C_n are sent to the new plan.

Figure 3.4: Cape-04's moving state strategy: status after state movement (tuples generated after adaptation: A_n (B_o C_o) + (A_o + A_n)(B_o C_n + B_n C_o + B_n C_n))

Without any state sharing or state movement, the old plan outputs (A_o B_o) C_o and the new plan outputs A_n (B_n C_n). However, combining these two results is not sufficient to compute the complete result of the query, which should be A B C. The reason is given by the equation below:

    A B C = (A_o + A_n)(B_o + B_n)(C_o + C_n)
          = A_o B_o C_o + A_n B_n C_n
            + (A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_o + A_n B_o C_n + A_n B_n C_o)   (3.1)

The delta terms in parentheses are what the plan migration algorithms must compute correctly; they are generally computed by joining data across the different plans. In this section, we use the above example to illustrate three approaches to state migration: Tukwila-04's stitch-up strategy, Cape-04's moving state strategy, and Cape-04's parallel track strategy. We demonstrate how these approaches ensure complete results and eliminate duplicates during plan migration. We also quantitatively analyze the three strategies with respect to different performance metrics and finally compare them against each other.
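Equation 3.1 can be checked mechanically: partition each relation into old/new halves and enumerate the eight combinations. The old plan covers (old, old, old), the new plan covers (new, new, new), and the six remaining "delta" combinations are what migration must supply.

    from itertools import product

    combos = list(product(["o", "n"], repeat=3))   # partitions of (A, B, C)
    old_plan = {("o", "o", "o")}                   # A_o B_o C_o
    new_plan = {("n", "n", "n")}                   # A_n B_n C_n
    deltas = set(combos) - old_plan - new_plan

    print(len(combos), len(deltas))    # 8 combinations, 6 delta terms
    # Complete and duplicate-free iff the three sets partition all combos:
    assert old_plan | new_plan | deltas == set(combos)
    assert not (old_plan & new_plan) and not (deltas & (old_plan | new_plan))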

3.5.1 Cape-04's moving state strategy

Cape-04 proposes two strategies for the plan migration problem. The first is called the moving state strategy. The basic idea is to move the appropriate operator state from the old plan to the new plan, to facilitate joining old data with new data there. Figure 3.4 shows the status of both the old and the new plan, when executing our example query as in Figure 3.3, after the old state has been moved to the appropriate operators of the new plan. In this example, the states A_o, B_o, and C_o are moved from the old operators into the new join operators. Note that not all of the state is transferred: the intermediate state A_o B_o is not moved to the new plan, because it is not useful there. Next, the processor of the new plan checks for intermediate state that was not computed before but is necessary for computation, e.g., B_o C_o, and re-computes that state. After state matching and state re-computation have been performed, the new tuples can be sent to the new plan for processing. The operations of this strategy for the example are summarized below.

1. Move the matched states A_o, B_o, and C_o from the old plan to the new plan.
2. Re-compute the state B_o C_o at the new plan.
3. Send the new tuples from A_n, B_n, and C_n to the new plan. Each new tuple probes the current state on the other side of its join operator, and on a match, the joined result is output and propagated to the next operator.

Let us check whether this strategy ensures complete results and guarantees no duplicates. Strictly, we need to check that all the terms in Equation 3.1 are generated by this strategy exactly once and that no extra tuples are generated. We omit a formal proof and give an intuitive explanation. In the new plan shown in Figure 3.4, the lower operator, which joins tuples from relations B and C, eventually produces the new results B_o C_n + B_n C_o + B_n C_n; denote these by (BC)_n. The new results generated by the new plan are then A_n (B_o C_o) + (A_o + A_n)(BC)_n, which is

    A_n (B_o C_o) + (A_o + A_n)(BC)_n
      = A_n B_o C_o + (A_o + A_n)(B_o C_n + B_n C_o + B_n C_n)
      = A_n B_o C_o + A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_n + A_n B_n C_o + A_n B_n C_n.   (3.2)

This is exactly the set of terms in Equation 3.1 excluding A_o B_o C_o, which the old plan computed before the adaptation. This strategy thus computes all necessary tuples at the new plan.

3.5.2 Cape-04's parallel track strategy

The second strategy discussed in the Cape-04 system is called the parallel track strategy. The basic idea is to perform most of the computation at the old plan. This is enabled by sending new data to both the old plan and the new plan in parallel. Figure 3.5 shows the status of both the old and the new plan at the adaptation point. This strategy performs the following operations for our example.

1. Send the new tuples from A_n, B_n, and C_n to both the old plan and the new plan.
2. Execute the following in parallel. At both the old plan and the new plan, each new tuple probes the current state on the other side of its join operator; if there is a match, the joined output is propagated to the next operator. There is one difference in the join algorithm of the old plan: at its top join operator, where AB is joined with C, any output that joins only new tuples, i.e., A_n B_n C_n, must be excluded (as indicated in Figure 3.5; a small sketch of this check follows).
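A sketch of the duplicate-avoidance check in step 2: each tuple carries flags recording whether its base tuples are old or new, and a result built purely from new tuples is suppressed at the old plan, since the new plan already produces it. The tuple representation is our own.

    def emit_at_old_plan_top_join(ab_tuple, c_tuple):
        """ab_tuple/c_tuple carry 'tags': one 'o' (old) or 'n' (new) per base
        relation. Suppress results whose bases are all new, i.e. A_n B_n C_n."""
        tags = ab_tuple["tags"] + c_tuple["tags"]
        if all(t == "n" for t in tags):
            return None                  # excluded: computed by the new plan
        return {"tags": tags, "value": (ab_tuple["value"], c_tuple["value"])}

    print(emit_at_old_plan_top_join({"tags": ["o", "n"], "value": "ab"},
                                    {"tags": ["n"], "value": "c"}))  # emitted
    print(emit_at_old_plan_top_join({"tags": ["n", "n"], "value": "ab"},
                                    {"tags": ["n"], "value": "c"}))  # None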

Figure 3.5: Cape-04's parallel track strategy: status at the adaptation point (tuples generated after adaptation: (C_o + C_n)(AB)_n + C_n (A_o B_o), excluding A_n B_n C_n)

Here we discuss why Cape-04's parallel track strategy ensures complete results and also eliminates duplicates; again, we give an intuitive explanation. First, the tuples of A_n B_n C_n are computed at the new plan in parallel while the old plan executes; hence, to avoid duplication, those tuples must not also be produced by the old plan. Second, at the old plan, as shown in Figure 3.5, the lower operator, which joins tuples from relations A and B, produces the new results A_o B_n + A_n B_o + A_n B_n; denote these by (AB)_n. The new results generated by the old plan are then (C_o + C_n)(AB)_n + C_n (A_o B_o), excluding A_n B_n C_n; that is,

    (C_o + C_n)(AB)_n + C_n (A_o B_o) - A_n B_n C_n
      = (C_o + C_n)(A_o B_n + A_n B_o + A_n B_n) + A_o B_o C_n - A_n B_n C_n
      = A_o B_n C_o + A_n B_o C_o + A_n B_n C_o + A_o B_n C_n + A_n B_o C_n + A_o B_o C_n.   (3.3)

This is exactly the set of terms in Equation 3.1 excluding A_o B_o C_o and A_n B_n C_n. Correctness follows because after the adaptation the new plan computes A_n B_n C_n, and before the adaptation the old plan had computed A_o B_o C_o. Equation 3.3 also explains why the topmost join in the old plan must exclude A_n B_n C_n. In this strategy, most of the computation is performed at the old plan. (Exclusion, or subtraction, must be used carefully: the larger expression must contain the smaller one, so that the smaller one can be subtracted from it.)

Cape-04 also discusses the data stream case. For static databases, the parallel execution continues until all the tuples of A, B, and C have been processed by both plans. For data stream applications, however, two tuples that join in an operator must not have timestamps more than the window size W apart (as discussed in Section 2.3). For example, when a new tuple from A arrives whose timestamp is larger than W plus the timestamp of the newest tuple in B_o, all the tuples in B_o become ineligible to join with future tuples from A; hence, B_o can be purged. When all the old state A_o, B_o, C_o, and A_o B_o has been purged, the old plan stops execution, because at that point the old plan can no longer generate any new results.

Figure 3.6: Tukwila-04's stitch-up strategy: status at the point when the new plan finishes execution

3.5.3 Tukwila-04's stitch-up strategy

In contrast to Cape-04's two strategies, Tukwila-04 proposes a different strategy, the stitch-up strategy, which performs most of the computation in a separate stitch-up plan, shown on the right of Figure 3.6. Generally, the stitch-up plan takes the best available plan as a basis and generates a similar logical operator tree in which some previous state is re-used and incorporated (notice that the union operator in the figure incorporates the intermediate state B ⋈ C without sacrificing equivalence). It also requires sharing state from both the old plan and the new plan, so that all the terms in Equation 3.1 can be properly computed. For our example, this strategy works as follows.

1. Generate a stitch-up plan, using the best available plan as a basis and modifying it to re-use previous state.
2. Perform the computation over the new tuples from A_n, B_n, and C_n at the new plan.
3. Move the states A_o, B_o, and C_o from the old plan to the stitch-up plan.
4. Move the states A_n, B_n, C_n, and the intermediate state B ⋈ C from the new plan to the stitch-up plan.

5. Perform the computation over the moved state at the stitch-up plan. Note that the lower join must exclude B_o C_o from its output, and the higher join must exclude A_x B_x C_x from its output, where x is either all o or all n.

(The stitch-up plan is executed either in parallel with the new plan, in which case new tuples must be sent to both plans, or after the new plan completes, in which case new tuples are sent only to the new plan. The former is required for streaming applications, while the latter suits static applications; in the following discussion we assume the latter case.)

Here we briefly discuss why Tukwila-04's stitch-up strategy ensures complete results without generating duplicates. In this approach, the new plan generates A_n B_n C_n as its output. The stitch-up plan, on the other hand, computes all the tuples of (A_o + A_n)(B_o + B_n)(C_o + C_n) except for A_o B_o C_o and A_n B_n C_n. Hence, the combined results of the new plan and the stitch-up plan, together with the old plan's pre-adaptation output, contain complete and unique terms.

Tukwila-04 also discusses a number of heuristics to improve the computation in the stitch-up plan. First, every join operator maintains an exclusion list specifying which patterns are to be excluded. Second, the exclusion can be done at the structure level rather than at the tuple level; for example, the join operator that excludes the pattern B_o C_o simply prevents the entire B_o state from probing the C_o state. Third, intermediate state pre-computed by the other plans can be shared, e.g., there is no need to re-compute B ⋈ C at the stitch-up plan. The more state can be shared, the lower the computation cost.

3.5.4 Analysis

Based on the three strategies discussed above, we analytically compare their performance on our example query in Table 3.2. We compare the strategies on each of the three plans (the old plan, the new plan, and the stitch-up plan) in terms of communication (the number of tuples sent to or received from the other plans), computation (the number of tuples probed in joins), and output cardinality (the number of tuples output by the plan). We denote the number of tuples in relation A simply by A, and we write AB for the cardinality of A ⋈ B; hence, AB is not necessarily equal to the product of A and B. As an example of estimating the number of tuples probed in joins, joining relations A and B requires AB probes. It can then be inferred that, for the computation cost of the new plan in the parallel track strategy, joining B_n and C_n requires B_n C_n probes, and joining A_n with B_n C_n requires A_n B_n C_n probes; in total, it requires B_n C_n + A_n B_n C_n probes. Most of the numbers in Table 3.2 can be inferred by such computations. It is worth noting that in the table,

    Delta(A, B, C) = A_o B_o C_n + A_o B_n C_o + A_o B_n C_n + A_n B_o C_o + A_n B_o C_n + A_n B_n C_o.

Based on the quantitative analysis shown in Table 3.2, we summarize our observations in Table 3.3 and briefly list the reasons for them. We use MS, PT, and SU to denote the moving state, parallel track, and stitch-up strategies respectively.

Table 3.2: Quantitative analysis of the three plan migration strategies for the example

Moving state:
  Comm.:  the old plan sends A_o + B_o + C_o tuples; the new plan receives A_o + B_o + C_o tuples.
  Comp.:  the new plan probes ABC - A_o B_o C_o + B_o C_o tuples.
  Output: the new plan outputs Delta(A, B, C) + A_n B_n C_n tuples.

Parallel track:
  Comm.:  the old plan receives A_n + B_n + C_n tuples.
  Comp.:  the old plan probes AB - A_o B_o + ABC - A_o B_o C_o tuples; the new plan probes B_n C_n + A_n B_n C_n tuples.
  Output: the old plan outputs Delta(A, B, C) tuples; the new plan outputs A_n B_n C_n tuples.

Stitch-up:
  Comm.:  the old plan sends A_o + B_o + C_o tuples; the new plan sends A_n + B_n + C_n and the intermediate state BC; the stitch-up plan receives A + B + C + BC tuples.
  Comp.:  the new plan probes B_n C_n + A_n B_n C_n tuples; the stitch-up plan probes BC - B_o C_o + ABC - A_o B_o C_o - A_n B_n C_n tuples.
  Output: the new plan outputs A_n B_n C_n tuples; the stitch-up plan outputs Delta(A, B, C) tuples.

Table 3.3: A comparison of the three strategies based on the analysis in Table 3.2

  Communication cost:   MS: migrating the old state; PT: burden on the old plan; SU: burden on the new plan and the stitch-up plan.
  Computation cost:     MS: best w.r.t. the old plan; PT: best w.r.t. the new plan; SU: best w.r.t. the old and the new plans.
  Output cardinality:   MS: all at the new plan; PT: delta tuples at the old plan, new tuples at the new plan; SU: new tuples at the new plan, delta tuples at the stitch-up plan.
  Steady output:        MS: the new plan waits for the state transfer to finish; PT: steady output from the old plan; SU: steady output from the new plan, but the stitch-up plan waits for the new plan to finish.
  Multiple adaptations: MS: most computation done at the newest plan; PT: most computation done at the oldest plan; SU: most computation done at the final stitch-up plan.

Communication cost. MS only migrates the old state from the old plan to the new plan. If the old state is small, this is the best strategy available, since it avoids any extra sending and receiving of new state. PT receives the new state at the old plan, placing extra bandwidth burden there. SU receives both the old state and the new state at the stitch-up plan; hence, it not only places more bandwidth burden on the stitch-up plan, but also requires more bandwidth at the new plan if the new state is large.

Computation cost. Note that every strategy receives the A_n + B_n + C_n tuples at the new plan by default; hence, we do not list them in the table. MS performs the least computation with respect to the old plan, PT (or SU) performs the least computation with respect to the new plan, and SU performs the least computation with respect to the old and new plans combined. This analysis can help a system determine which strategy to choose under different conditions. For example, if there is insufficient computing power to support the old plan, it is better to choose MS or SU over PT.

Output cardinality. All strategies output the same total number of tuples across all plans. However, MS outputs all tuples at the new plan; PT outputs Delta(A, B, C) at the old plan and A_n B_n C_n at the new plan; and SU outputs A_n B_n C_n at the new plan and Delta(A, B, C) at the stitch-up plan.

Steady output. Here we examine whether, for data stream applications, the strategies can output results steadily around the adaptation point, since output steadiness may affect user satisfaction. In MS, the new plan can only begin computing once the old state has been transferred; hence, there may be a period of silence. In PT, the old plan and the new plan output results immediately because of their parallel execution. In SU, the new plan outputs results immediately; however, the stitch-up plan must wait for the new plan to finish before fetching its state, so some result tuples may not be produced steadily.

Multiple adaptations. The adaptive process may well run over multiple phases, so we examine each strategy's behavior under repeated adaptation. MS always migrates state from the old plan to the new plan; hence, it always performs most of the computation at the newest plan. PT, on the other hand, always performs most of the computation at the oldest plan. SU performs its delta computation at a single final stitch-up plan: if there are multiple adaptations, only one stitch-up plan is used to mix the state of the different phases. This stitch-up plan is based on the operator tree of the best plan available, depending on how much state can be re-used to facilitate computation.
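Summarizing Table 3.3 operationally, here is a sketch of a strategy chooser driven by monitored conditions. The thresholds and flags are hypothetical; this is an illustrative reading of the table, not a procedure from any of the three papers.

    def choose_migration_strategy(old_state_small: bool,
                                  old_plan_cpu_scarce: bool,
                                  need_steady_output: bool) -> str:
        """Heuristic reading of Table 3.3 (illustrative only)."""
        if need_steady_output and not old_plan_cpu_scarce:
            return "parallel track"  # old plan keeps emitting during migration
        if old_state_small:
            return "moving state"    # cheap to ship state; no duplicate work
        return "stitch-up"           # offload delta computation to a third plan

    print(choose_migration_strategy(old_state_small=True,
                                    old_plan_cpu_scarce=True,
                                    need_steady_output=True))  # 'moving state'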

Chapter 4 Evaluations

In this chapter, we review two sets of empirical studies based on the three systems we have discussed. The first set examines the differences between adaptive query processing and static query processing under similar workloads and configurations. The second set focuses on the performance comparison among the plan migration techniques under specific parameters. Since the experiments use different workloads and system configurations, we excerpt a few representative studies and analyze each individually. We conclude that the observations in these experimental studies validate our analysis in Chapter 3.

4.1 Adaptive Query Processing vs. Static Query Processing

In this section, we present two figures that show the benefits of adaptive query processing over static query processing, in Tukwila-99 (Figure 4.1) and Tukwila-04 (Figure 4.2) respectively.

Tukwila-99. The Tukwila-99 experiment is performed on a scaled version of the TPC-D dataset; seven queries are computed over four of the dataset's base tables, excluding the Lineitem table. The optimizer is given correct source cardinalities, but no histograms are available; hence, the optimizer must compute its intermediate result cardinalities from estimates of the join selectivities. All joins are implemented as pipelined hash joins. Figure 4.1 shows the execution-time benefit of adaptive query processing over static query processing. The pipelined strategy executes the query statically. The materialized strategy simply materializes the output at each join; in many cases this is even worse than the pipelined strategy. The materialized-and-replanned strategy materializes the intermediate results and re-plans at the end of each fragment whenever the actual cardinality differs from its estimate by at least a factor of two. Among these three strategies, only the last is an adaptive query processing strategy. From the figure, we can see that the materialized-and-replanned strategy is the fastest on all the chosen plans, with a total speed-up of 1.42 over the pipelined strategy and 1.69 over the materialized strategy []. This is likely because most join operations in the figure are given insufficient memory, and poor selectivity estimates force them to overflow.

Figure 4.1: Comparison of the static pipelined, materialized, and materialized-plus-replanned strategies []
Figure 4.2: Comparison of static optimization, adaptive query processing, and plan partitioning [2]
Figure 4.3: Comparison of Cape-4's two migration strategies: migration time w.r.t. window size W [2]
Figure 4.4: Cape-4's output rate over time given insufficient processing power [2]

Tukwila-4
The Tukwila-4 experiment is performed on both a uniform dataset (TPC-H) and a skewed dataset (TPC-D), and the queries are mostly TPC-H queries with slight variations. Four queries are selected for computation: 3A (the standard TPC-H query 3 with its date-based selection predicates removed), 10 (standard), 10A (the standard TPC-H query 10 with its date-based selection predicates removed), and 5 (standard). This setup generates a workload with several levels of optimization complexity: a join of 3 relations (query 3A), two joins of 4 relations (queries 10 and 10A), and a join of 5 relations (query 5). The system is configured to run all experiments completely in memory, with an initial buffer size of 2 MB that grows as needed. This setup isolates computation costs from disk I/O costs and reduces the performance penalty caused by inaccurate estimates.

In Figure 4.2, three approaches are compared. They are the static query processing approach,
