Slicing Long Running Queries

Nicolas Bruno, Microsoft Research, nicolasb@microsoft.com
Vivek Narasayya, Microsoft Research, viveknar@microsoft.com
Ravi Ramamurthy, Microsoft Research, ravirama@microsoft.com

Proceedings of the VLDB Endowment, Vol. 3, No. 1 (2010). Presented at the 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore.

ABSTRACT
The ability to decompose a complex, long-running query into simpler queries that produce the same result is useful for many scenarios, such as admission control, resource management, fault tolerance, and load balancing. In this paper we propose query slicing as a novel mechanism to do such decomposition. We study different ways to extend a traditional query optimizer to enable query slicing and experimentally evaluate the benefits of each approach.

1. INTRODUCTION
New application scenarios have significantly increased the complexity of queries that are submitted to a database server. In this context, it is common for queries to run for a long time and consume significant server resources. These long-running queries, in turn, introduce new challenges to administer and tune the underlying database system, as illustrated by the following examples:

Admission control: Many systems rely on strict admission control policies to prevent long-running queries from monopolizing system resources. In such systems, a query is accepted only if its estimated cost is below a threshold. Examples include traditional database systems [11] as well as emerging cloud data services [14]. Although such limits appear restrictive, they are necessary to ensure the overall scalability and performance of the shared infrastructure for all users. No matter what threshold is used for admission control, however, there will still be valid queries that are too expensive to run completely. In such systems, application developers need to manually transform a query that is not admitted into simpler queries that individually pass the admission test.

Resource management: In addition to admission control, an important component of resource management is scheduling, which maintains and manages a queue of pending tasks [9, 10]. Designing robust resource management policies in the presence of multiple long-running queries remains a challenging task. For instance, techniques that abort a long-running query in favor of another with higher priority face the challenge of restarting the aborted query from scratch, potentially wasting considerable work. Pause/restart techniques [3, 4] partially deal with these issues, but do not handle all scenarios gracefully. The ability to decompose a complex query into simpler fragments can be an important step in addressing resource management challenges.

Fault tolerance: Conceptually similar to the case of manually aborted queries, a long-running query that fails before completion has to be restarted from scratch [3, 4, 16]. If a query is decomposed into simpler components, these can be restarted at a finer granularity, thus minimizing the amount of wasted computation.
Load balancing: Parallel systems attempt to distribute computation across nodes in such a way that each node performs roughly the same amount of work. This task becomes more challenging when the units of distribution are long-running and complex. The ability to decompose a long-running query into many pieces of similar cost can thus contribute to adapting load balancing techniques to new scenarios.

In this paper we propose query slicing as a novel mechanism to complement existing work in the context of managing long-running queries. The idea is to enable a query optimizer to decompose a complex query into slices that are executed to produce the original result. Specifically, in this paper we study the following version of the query slicing problem. For an input query q and a given cost threshold, we attempt to decompose q into a set of queries {qi} such that (i) all {qi} together produce the original result, and (ii) the cost of each individual query is bounded by the cost threshold (we formally define the problem in Section 2). Consider the following query q:

  q = SELECT R.a, S.b FROM R JOIN S ON R.x = S.y WHERE R.c < 10 AND S.d > 20

Suppose that the cost threshold for a slice is smaller than the original cost of the query. In this case, if R is a large table and R.c < 10 returns a small fraction of R, q can be rewritten as two queries q1 and q2 as shown in Figure 1. Although the combined cost of q1 and q2 is larger than that of q due to an intermediate table creation, each of q1 and q2 might individually satisfy the cost threshold. While this extension to traditional query optimization seems natural, there are significant challenges in implementing such functionality. For example, even for such a simple query, there can be several other alternatives to consider. Suppose that there is an index on S.d. In that case, we can partition S into two fragments by adding predicates on S.d and rewrite q into q3 and q4 as shown in Figure 1. In this case, query q can be decomposed into two slices q3 and q4 that can be efficiently executed using index-based plans. As a final example, suppose that both R.c < 10 and S.d > 20 are not very selective (i.e., they return most of R and S respectively). If there are indexes on R.x and S.y, another alternative to evaluate q is given by q5 and q6 in Figure 1.

  q1 = INSERT INTO T                 q3 = SELECT R.a, S.b                  q5 = SELECT R.a, S.b
       SELECT R.a, R.x                    FROM R JOIN S ON R.x = S.y            FROM R JOIN S ON R.x = S.y
       FROM R                             WHERE R.c < 10 AND S.d > 20           WHERE R.c < 10 AND S.d > 20
       WHERE R.c < 10                       AND S.d < 100                         AND R.x < 500

  q2 = SELECT T.a, S.b               q4 = SELECT R.a, S.b                  q6 = SELECT R.a, S.b
       FROM T JOIN S ON T.x = S.y         FROM R JOIN S ON R.x = S.y            FROM R JOIN S ON R.x = S.y
       WHERE S.d > 20                     WHERE R.c < 10 AND S.d >= 100         WHERE R.c < 10 AND S.d > 20
                                                                                  AND R.x >= 500

  Figure 1: Different ways to decompose an input query into two slices.

In this case, q5 and q6 implement a partitioned-join strategy, and their results together are the same as those of q. Even when there are no indexes on S.y, if table S is small, q5 and q6 would each join a fragment of R with the whole S, producing results efficiently.

The examples above illustrate that there can be multiple ways to decompose a query into components that satisfy the cost threshold and together produce the same original result. In this paper we introduce a comprehensive approach to decompose such long-running queries into multiple slices, such that each slice satisfies a cost threshold and the global execution is as efficient as possible.

The rest of the paper is structured as follows. In Section 2 we formalize our problem statement and the minor extensions to an execution engine that are required for our techniques. In Section 3 we present a family of optimization strategies that trade off optimization time and quality of the resulting solutions. In Section 4 we report an experimental evaluation of our approaches. Finally, in Section 5 we review related work.

2. QUERY SLICING
In this paper we consider SQL queries and use the optimizer's cost model as the estimator for query costs. We then state the query slicing problem as follows. Let cost(q) be the optimizer's estimated cost for query q, and Δ the cost threshold for any query slice (note that if cost(q) ≤ Δ, the original plan is optimal). Slicing q for Δ produces a partially ordered set of queries {q1, ..., qn} such that:
1. Executing all qi (while respecting the partial order) produces a table containing the same result as q (we assume that no updates occur across executions of the qi).
2. For every i, cost(qi) ≤ Δ.
3. The sum of cost(qi) over all i is minimal.

We require that the final result be written into a table, which can then be read by the user at any time. Otherwise, any query slice that involves memory-intensive operators like hash joins could be opened by the client and processed very slowly, using significant server resources. This requirement does not affect our algorithms, which can be easily adapted to stream results of such query slices to the client without the last materialization.
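As a concrete illustration of conditions 1-3 above, the following Python sketch checks whether a candidate slicing is admissible for a given threshold. This is a minimal sketch and not the paper's implementation; the per-slice cost estimates and the dependency sets (the temporary tables a slice reads) are hypothetical inputs that would come from the optimizer.

    # A minimal sketch: checking that a candidate slicing satisfies the
    # admissibility conditions of Section 2, assuming each slice carries a
    # hypothetical cost estimate and the set of slices it depends on.

    from graphlib import TopologicalSorter

    def is_admissible(slices, delta):
        """slices: dict name -> (estimated_cost, set_of_prerequisite_slice_names)."""
        # Condition 2: every individual slice must fit in the cost threshold.
        if any(cost > delta for cost, _ in slices.values()):
            return False
        # The partial order induced by temporary tables must be acyclic, so
        # that the slices can actually be executed one after the other.
        order = TopologicalSorter({name: deps for name, (_, deps) in slices.items()})
        try:
            list(order.static_order())
        except Exception:
            return False
        return True

    def total_cost(slices):
        # Condition 3 asks to minimize this quantity among admissible slicings.
        return sum(cost for cost, _ in slices.values())

    # Example with made-up numbers: q1 materializes a temporary table read by q2.
    example = {"q1": (40.0, set()), "q2": (70.0, {"q1"})}
    print(is_admissible(example, delta=80.0), total_cost(example))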
In Figure 1 we showed different ways to slice queries, which include writing intermediate results into temporary tables and horizontally partitioning the input tables. We next formalize these alternatives using the notion of extended execution plans.

2.1 Extended Execution Plans
Extended execution plans enable reasoning with collections of query slices very similarly to what is done with a traditional query, thus leveraging existing work in query optimization. In addition to the traditional relational operators, extended execution plans can contain partitioned spools. Partitioned spools are a useful formalism to reason with query slices, are expressive enough to handle scenarios including those in Section 1, and can be implemented in existing systems with minimal or even no changes at all. We next describe the different variants of the partitioned spool operator.

The Spool Operator: The Spool operator (used in almost all DBMS engines) writes an intermediate result into a temporary table. It takes a single relational input R and a temporary table name, and bulk-loads the temporary table with the result of evaluating R. Spool operators can be placed on top of any execution sub-plan, and the resulting temporary table can be subsequently read in an extended execution plan. If so, we connect the Spool operator and the consumer scan with a dotted line. Queries q1 and q2 in Section 1 can be implemented using a Spool operator as shown in Figure 2(a). The scan operator above the Spool operator reads from the temporary table, called T (omitted when it is clear in context).

The Input-Partitioned Spool Operator: The input-partitioned spool operator, or iSpool for short, extends the Spool operator by introducing iteration. The relational input R of an iSpool operator is parameterized by a predicate of the form $l < c ≤ $h, where c is a column defined in R. It additionally defines an expression of the form (c, {r1, r2, ..., rk}), where ri = (li, hi] are ranges that form a partition of c's domain. To process an iSpool operator iSpool_(c,{ri})(R) we instantiate R for each range li < c ≤ hi, denoted by R[li, hi], and evaluate Spool(R[li, hi]). Note that the Spool operator appends the results of each iteration to the same temporary table. In an extended execution plan, we mark with double lines the edges that vary for each instantiated range. An extended execution plan has double lines for all operators in the path connecting the iSpool operator to the base tables over which the range column is defined (there might be multiple such tables due to join predicates), unless the path includes another Spool operator. Queries q3 and q4 in Section 1 are implemented using iSpool operators in Figure 2(b).

The Output-Partitioned Spool Operator: The iSpool operator iterates over multiple relations and produces a single temporary output table. Conversely, the output-partitioned spool operator, or oSpool for short, takes a single input relation and partitions it into multiple temporary output tables. As with iSpool, an oSpool operator takes a parameter (c, {r1, r2, ..., rk}), where c is a column of the oSpool's relational input and {ri} forms a partition of c's domain. To process an oSpool operator oSpool^(c,{ri})(R) we maintain as many temporary tables as there are ranges in the operator (oSpool arguments are written as superscripts and iSpool arguments as subscripts). We then read R completely and append each tuple to the appropriate table depending on the value of column c. The oSpool operator is similar to partitioning operators used in parallel databases, and we discuss this relationship in Section 5. We note that oSpool operators can be easily implemented by a small code fragment that leverages the querying and bulk-loading capabilities of existing query engines. Figure 2(c) shows an extended execution plan that implements queries q5 and q6 with a partitioned join on R.x and S.y using an iSpool operator. The plan joins together tuples from R and S that satisfy each range of R.x (respectively S.y, due to the join predicate). The valid tuples from S for each range are obtained by an index on S.y.
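The oSpool behavior described above can be mimicked with a few lines of client-side code, as the text notes. The sketch below is only illustrative and uses in-memory row lists instead of real bulk loading; the column name and range list are hypothetical.

    # Illustrative sketch of oSpool semantics (not an engine implementation):
    # read the input once and append each row to the temporary table that owns
    # the range its partitioning column falls into. Ranges are half-open
    # intervals (l, h], mirroring the (l_i, h_i] convention used in the text.

    def ospool(rows, column, ranges):
        """rows: iterable of dicts; ranges: list of (low, high) pairs that
        partition the column's domain. Returns one list per range."""
        temp_tables = [[] for _ in ranges]
        for row in rows:
            value = row[column]
            for table, (low, high) in zip(temp_tables, ranges):
                if low < value <= high:
                    table.append(row)
                    break
        return temp_tables

    # Hypothetical example mirroring Figure 2(d): split R (already filtered on
    # R.c < 10) into two temporary tables on R.x around the value 500.
    R = [{"x": 120, "a": 1}, {"x": 640, "a": 2}, {"x": 500, "a": 3}]
    T0, T1 = ospool(R, "x", [(float("-inf"), 500), (500, float("inf"))])
    print(len(T0), len(T1))  # 2 1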

  [Figure 2: Extended execution plans to reason with query slices. Panels (a)-(d) show plans for the queries of Figure 1 using Spool, iSpool, oSpool, and ioSpool operators; the operator-tree drawings are omitted here.]

Suppose, however, that there is no index on R.x, so processing σ_{R.c<10 ∧ li<R.x≤hi}(R) for each range requires scanning the whole R. We can improve this plan by introducing an oSpool operator, which reads σ_{R.c<10}(R) once and writes two temporary tables T0 and T1 depending on the R.x values (see Figure 2(d)). These temporary tables contain the tuples from R satisfying both R.c < 10 and li < R.x ≤ hi (i.e., the tuples needed for each iteration of the iSpool).

The Input/Output-Partitioned Spool Operator: Finally, the input/output-partitioned spool, or ioSpool for short, efficiently combines iSpool and oSpool while scanning the input data once. An ioSpool takes two expressions (c_in, {ri}) and (c_out, {sj}), and a relational input R parameterized by a range predicate on c_in. To process an ioSpool^(c_out,{sj})_(c_in,{ri})(R) we evaluate, for each range ri = (li, hi], the expression oSpool^(c_out,{sj})(σ_{c_in ∈ ri}(R)), where the oSpool operator shares the temporary tables across iterations. Suppose, in Figure 2(d), that evaluating σ_{R.c<10}(R) is too expensive. By changing the oSpool in the figure to an ioSpool^(R.x,{ri})_(R.c,{rj}), and assuming that an index on R.c is available, we obtain an admissible execution plan. In this paper we focus on range partitions for simplicity, but our approach can be extended to consider hash partitioning as well.

2.2 Valid Extended Execution Plans
To evaluate an extended execution plan P, we first obtain query slices by breaking up P on all dotted-line edges. Each query slice depends on base or intermediate tables, which induce a partial order among slices. We then execute query slices respecting this partial order. A valid extended execution plan satisfies some restrictions on the placement of spool operators. We say that an oSpool (or ioSpool) operator O with output parameter (c, {ri}) closes another iSpool (or ioSpool) operator I with input parameter (c', {sj}) if O is a descendant of I, c = c', and {ri} = {sj}. For an extended execution plan to be valid, every time there is an iSpool (or ioSpool) operator I with input parameter (c, {ri}) and we follow the path from I to the base table(s) that define c (modulo column equivalence), the first spool operator in the path (if any) has to close I.

2.3 Cost Model for Extended Execution Plans
The cost model in a traditional optimizer needs to be extended to reason with spool variants, query slices, and cost thresholds. The local cost of an operator ρ, LC(ρ), is given by the traditional cost formulas of query optimizers (spool variants are seen as table insertions and thus costed appropriately). Additionally, we need to extend the cost model by defining, for each execution subplan, a tuple (SC, DC), where SC is the shallow cost of the subplan (which models the cost of a query slice and should fit in the cost threshold), and DC is the deep cost of the subplan (which should be minimized). Consider a scan or a seek operator ρ over a table in an execution plan. If ρ's table is a base table, we define SC(ρ) = LC(ρ) and DC(ρ) = LC(ρ). If ρ's table is a temporary result from executing subplan P', the scan operator resets the shallow cost SC of its subplan to its local cost, and the overall cost is kept in DC. That is, SC(ρ) = LC(ρ) and DC(ρ) = LC(ρ) + DC(P'). Consider an execution plan P with root operator ρ and subtrees ρ1, ..., ρn. If ρ does not partition its input (i.e., ρ is not an iSpool or ioSpool operator), SC(P) = LC(ρ) + Σi SC(ρi), and DC(P) = LC(ρ) + Σi DC(ρi).
Suppose now that ρ = iSpool_(c,{ri}) with an input parametric plan ρ' (the case for an ioSpool is defined analogously). Then, the shallow cost for ρ is the maximum, over all ranges ri, of executing the parametric plan ρ'[ri] and writing the partial result to the temporary table. The deep cost for ρ is the sum of the local and deep costs for the first range r1, and the local and shallow costs of subsequent ranges. The reason is that we only incur a deep cost once (to materialize intermediate results down in the execution plan), but subsequent iterations of the iSpool operator would read from the temporary tables, therefore incurring only the shallow cost (if ρ' has no spool operators, DC(ρ'[ri]) = SC(ρ'[ri])). More formally:

  SC(ρ) = max_{1 ≤ i ≤ n} ( LC(ρ'[ri]) + SC(ρ'[ri]) )
  DC(ρ) = Σ_{i=1}^{n} LC(ρ'[ri]) + DC(ρ'[r1]) + Σ_{i=2}^{n} SC(ρ'[ri])

We now reformulate the query slicing problem. Let q be a query and Δ be the cost threshold for any query slice. Slicing q for Δ produces an extended execution plan P so that (i) SC(p) ≤ Δ for every subplan p of P, and (ii) DC(P) is minimal.
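To make the (SC, DC) definitions concrete, the following sketch evaluates shallow and deep costs over a toy plan tree. The node classes and the local costs are invented for illustration and are not the optimizer's actual data structures; the iSpool helper follows the two formulas above.

    # Toy sketch of the (SC, DC) cost model of Section 2.3. Local costs (LC)
    # are made-up numbers; scans of spooled results reset the shallow cost, and
    # iSpool combines per-range costs exactly as in the formulas above.

    class Node:
        def __init__(self, lc, children=()):
            self.lc, self.children = lc, list(children)

    class Base(Node): pass       # scan/seek over a base table
    class TempScan(Node): pass   # scan over a spooled temporary result
    class Op(Node): pass         # any non-partitioning operator (join, filter, Spool, ...)

    def costs(node):
        """Returns (SC, DC) for a plan node."""
        if isinstance(node, Base):
            return node.lc, node.lc
        if isinstance(node, TempScan):
            # children[0] is the subplan that produced the temporary table.
            _, dc_child = costs(node.children[0])
            return node.lc, node.lc + dc_child
        sc = node.lc + sum(costs(c)[0] for c in node.children)
        dc = node.lc + sum(costs(c)[1] for c in node.children)
        return sc, dc

    def ispool_costs(range_plans, lc_per_range):
        """range_plans: the instantiated child plan, one instance per range r_i."""
        pairs = [costs(p) for p in range_plans]
        sc = max(lc + s for lc, (s, _) in zip(lc_per_range, pairs))
        dc = sum(lc_per_range) + pairs[0][1] + sum(s for s, _ in pairs[1:])
        return sc, dc

    # Hypothetical example: two ranges over the same parametric subplan shape.
    r1 = Op(2, [TempScan(3, [Base(20)])])
    r2 = Op(2, [TempScan(3, [Base(20)])])
    print(ispool_costs([r1, r2], lc_per_range=[4, 4]))  # (9, 38)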

3. FINDING OPTIMAL QUERY SLICES
In this section we introduce several optimization strategies to solve the query slicing problem. Our approach results in a spectrum of alternatives that balance optimization cost and quality of the resulting plans. We focus on SPJ queries, and extend the class of queries that we can handle in Appendix B. To explain our algorithms, we first show, in Figure 3, a simplified top-down version of a dynamic programming algorithm that obtains the best execution plan for an SPJ query. (The top-down approach with on-demand interesting orders is very similar to the traditional bottom-up dynamic programming approach of System-R [13], but avoids explicitly enumerating all interesting orders upfront, or otherwise generating unneeded alternatives.) We assume that a global Memo associative array is available, which takes a subset of tables R and a sort order S, and returns the best plan for such a combination (R, S). The optimization of a query starts by calling optimize(R, null), or optimize(R, c) if an order by column c is required due to an order-by clause.

  updateMemo (R:tables, S:order, P:plan)
  01 if (P ≠ null and (Memo[R,S] = null or cost(P) < cost(Memo[R,S])))
  02   Memo[R,S] = P

  optimize (R:tables, S:order) returns best plan for R satisfying S
  01 if (Memo[R,S] was not yet calculated)
  02   if (S ≠ null)
  03     CP = Sort_S(optimize(R, null))
  04     updateMemo(R, S, CP)
  05   if (|R| = 1)
  06     CP = best single-table plan under order S
  07     updateMemo(R, S, CP)
  08   else for each valid partition (R1, R2) of R
  09     for each join algorithm JA
  10       S1, S2 = required orders of R1, R2 for JA
  11       CP1 = optimize(R1, S1)
  12       CP2 = optimize(R2, S2)
  13       if (CP1 ≠ null and CP2 ≠ null)
  14         CP = JA(CP1, CP2)
  15         updateMemo(R, S, CP)
  16 return Memo[R, S]

  Figure 3: Top-down dynamic programming join reordering.

Line 1 implements memoization and calculates the best plan in lines 2-15 once for each distinct (R, S) (otherwise, it simply returns the cached version). To compute the best plan, lines 2-4 try to implement a candidate plan CP using an enforcer plan if an order is requested (i.e., S ≠ null). In that case, line 3 recursively calculates the best plan for the same tables without requesting any order, and inserts a top-most sort operator which would enforce the required order. Line 4 calls updateMemo with the resulting plan, which updates the best plan found so far for (R, S). For any value of S, lines 5-15 calculate the best plan satisfying the required sort order. Lines 5-7 handle the case of a single table in R, obtain the best single-table plan satisfying order S, and update the memo with such a candidate plan. For the general case of |R| > 1, line 8 obtains all valid partitions of R into R1 and R2 (e.g., if only considering left-deep trees, the partitions must satisfy |R2| = 1). For each such partition and join algorithm JA, line 10 calculates the required orders of the join inputs (e.g., a merge join operator requires both inputs to be sorted on the respective join columns). Lines 11-12 recursively obtain the best plans for R1 and R2, and lines 14-15 assemble the join plan and update the memo. After all partitions and join alternatives have been evaluated, line 16 returns the actual content of Memo[R,S], which contains the best plan for the input set of tables and required order.

3.1 Handling Spool Operators
We next describe a simple extension to the algorithm of Figure 3 that considers Spool operators (Section 2.1). To that end, every time we create a candidate plan and call updateMemo in lines 4, 7, and 15, we additionally consider spooling such intermediate results by adding after line 15 (and also after 4 and 7):

  15.1 updateMemo(R, S, Scan(Spool(CP)))

We also need to consider only valid execution plans (i.e., those that satisfy the cost threshold Δ). Thus, we modify the predicate cost(P) < cost(Memo[R,S]) in line 1 of updateMemo as follows:

  DC(P) < DC(Memo[R,S]) and ∀p ∈ P: SC(p) ≤ Δ

In other words, we reject plans that contain a subplan with shallow cost exceeding the threshold, and keep the one with the smallest deep cost. These changes are necessary, but unfortunately not sufficient to obtain the optimal slicing strategy. Suppose, as a very simple example, that we call optimize({R}, null) and that there is a single-table predicate R.a < 10 on table R.
The algorithm would then generate the following two plans:

  P1 = Filter_{R.a<10}(Scan(R))
  P2 = Scan(Spool(Filter_{R.a<10}(Scan(R))))

Assume that SC(P1) = DC(P1) = 100. Because the Spool operator only materializes the tuples that satisfy R.a < 10, and only the columns that are relevant upwards in the tree, the cost of reading the temporary table would be smaller than that of scanning the original table. That is, it could be that SC(P2) = 20 and DC(P2) = 150. In this case, it is not clear which one among P1 and P2 we should keep in Memo[{R}, null]. Suppose that we keep P1. In that case, if Δ = 110 and the local cost of joining R with any of the remaining query tables is over 10 units, we would get an infeasible solution, because we cannot join P1 without violating the cost threshold. Had we kept P2, we could have obtained a solution. However, if we keep P2 and Δ is higher, we could return a suboptimal solution that uses P2 rather than the more efficient P1. The main problem is that the traditional principle of optimality does not hold in our scenario. That is, a subplan that is suboptimal in terms of deep cost might be part of the optimal execution plan due to having a smaller shallow cost. To correctly handle spool operators, we need to generalize the Memo data structure so that it keeps all candidate plans that might become part of the optimal solution. Specifically, Memo[R,S] must contain not just the plan P with the smallest value of DC(P), but instead all plans in the two-dimensional skyline [1] of (SC, DC). Therefore, we extend the Memo data structure so that it returns a set of plans for each input (R, S) pair, and modify updateMemo to:

  01 if (P ≠ null and ∀p ∈ P: SC(p) ≤ Δ)
  02   Memo[R,S] = skyline(Memo[R,S] ∪ {P})

The last change we need to make to the algorithm in Figure 3 has to do with the search space itself. Since Memo[R,S] (and hence optimize) returns a set of plans rather than a single plan, we need to consider all different ways to combine such intermediate results into larger execution plans. For instance, lines 11 and 12 return sets of plans in CP1 and CP2. Therefore, we change lines 13-14 to:

  13 for each (pCP1, pCP2) ∈ CP1 × CP2
  14   CP = JA(pCP1, pCP2)

and we make similar changes in lines 3-4. The resulting algorithm (denoted optimize-S in Figure 4) finds the optimal query slicing for a given threshold Δ when using Spool operators.

  updateMemo (R:tables, S:order, P:plan)
  01 if (P ≠ null and ∀p ∈ P: SC(p) ≤ Δ)
  02   Memo[R,S] = skyline(Memo[R,S] ∪ {P})

  optimize-S (R:tables, S:order) returns skyline of plans for R satisfying S
  01 if (Memo[R,S] was not yet calculated)
  02   if (S ≠ null)
  03     for each (CP ∈ optimize-S(R, null))
  04       updateMemo(R, S, Sort_S(CP))
  05       updateMemo(R, S, Scan(Spool(Sort_S(CP))))
  06   if (|R| = 1)
  07     CP = best single-table plan under order S
  08     updateMemo(R, S, CP)
  09     updateMemo(R, S, Scan(Spool(CP)))
  10   else for each valid partition (R1, R2) of R
  11     for each join algorithm JA
  12       S1, S2 = required orders of R1, R2 for JA
  13       CP1 = optimize-S(R1, S1)
  14       CP2 = optimize-S(R2, S2)
  15       for each (pCP1, pCP2) ∈ CP1 × CP2
  16         CP = JA(pCP1, pCP2)
  17         updateMemo(R, S, CP)
  18         updateMemo(R, S, Scan(Spool(CP)))
  19 return Memo[R, S]

  Figure 4: Handling Spool operators for query slicing.
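A two-dimensional (SC, DC) skyline is easy to state in code. The sketch below keeps only non-dominated plans, which is the pruning rule used by updateMemo in Figure 4; the plan objects and cost values are placeholders rather than real optimizer structures.

    # Sketch of the two-dimensional skyline over (SC, DC) pairs used to prune
    # Memo entries: a plan is kept unless some other plan is no worse on both
    # shallow and deep cost and strictly better on at least one of them.

    def dominates(a, b):
        """a, b: (SC, DC) tuples."""
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    def skyline(plans):
        """plans: list of (SC, DC, plan_object); returns the non-dominated ones."""
        result = []
        for p in plans:
            if not any(dominates((q[0], q[1]), (p[0], p[1])) for q in plans if q is not p):
                result.append(p)
        return result

    # Hypothetical Memo entry for the P1/P2 example above: neither plan
    # dominates the other, so both survive and remain available upwards.
    memo_entry = [(100, 100, "P1"), (20, 150, "P2"), (120, 160, "P3")]
    print([name for _, _, name in skyline(memo_entry)])  # ['P1', 'P2']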

3.2 Local Partitioned Spools
A drawback of optimize-S is that it might fail to find any feasible solution for some values of Δ. Suppose, as a trivial example, that just scanning a base table already exceeds Δ. In this case, no matter where we place Spool operators, there would always be a subplan p for which SC(p) > Δ, and thus optimize-S would not return any valid solution. In general, for a query q with k joins, it can be shown that optimize-S will not find a solution for Δ < cost(q)/(2k), where cost(q) is the cost of the query obtained by calling optimize(q, null) (i.e., without constraints).

To address the above shortcoming, we extend optimize-S to include local partitioned spools. The idea is to also consider surrounding each operator with partitioned spools, and thus avoid having a single operator that is too big to fit in the threshold. Figure 5 shows an example of partitioned spools surrounding a join. The join operator, which might be too large to fit the threshold, is modified into a partitioned join, which would fit by adjusting the column ranges appropriately. We next discuss the two main challenges to incorporate these alternatives into the search strategy, namely, how to instantiate a local partitioned spool with the proper column ranges, and how to enumerate the larger space of plans.

  [Figure 5: Local partitioned spools (a join whose inputs are partitioned by oSpool operators and recombined by a top-most iSpool; operator-tree drawing omitted).]

Obtaining column ranges. Suppose we are given a parametric plan like the one at the right of Figure 5, and we have to find the right ranges to instantiate in the iSpool and oSpool operators. Since the parametric plan contains oSpool operators right below the join, the costs of the sub-plans below such oSpool operators are independent of the actual ranges for the spool column (in that sense, the choice of column ranges is local). We can then leverage the cost model of the optimizer and search for partitions that minimize the overall execution cost. As in [12], we assume that the fewer the partitions (and therefore the larger the work done per partition), the better the overall cost (however, see the discussion in Appendix B.2). Therefore, we always choose the largest possible ranges that result in a query slice instance that fits Δ. Figure 6 shows a simple procedure based on binary search that incrementally finds ranges that make each iSpool iteration fit in Δ. Note that the actual technique to find ranges is orthogonal to the enumeration strategy itself, and thus we can replace the algorithm in Figure 6 by more sophisticated alternatives such as interpolation search or the optimal-splitter technique of [12].

  findPartitions (P:parametric plan) returns R:set of ranges
  01 R = ∅, L = -∞, H = +∞
  02 while (L < H)
  03   rmin = L, rmax = H
  04   fmin = SC(P[L, rmin]), fmax = SC(P[L, rmax])
  05   while (rmax - rmin > ε)
  06     rmid = (rmin + rmax) / 2
  07     fmid = SC(P[L, rmid])
  08     if (fmid > Δ) rmax = rmid
  09     else rmin = rmid
  10   R = R ∪ {(L, rmid]}
  11   L = rmid
  12 return R

  Figure 6: Finding ranges for partitioned spools.
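The same binary-search idea as Figure 6 can be written compactly against any monotone cost estimate. In the sketch below, shallow_cost is a stand-in for SC(P[L, r]) and is purely hypothetical; the guard for a single value exceeding the threshold anticipates the skew discussion under "Additional Details" later in this section.

    # Sketch of the range-finding procedure of Figure 6: repeatedly grow the
    # current range (L, r] as far as possible while its estimated shallow cost
    # stays under the threshold. shallow_cost(l, r) stands in for SC(P[l, r])
    # and is assumed to be monotonically non-decreasing in r.

    def find_partitions(shallow_cost, lo, hi, delta, eps=1e-3):
        ranges, left = [], lo
        while left < hi:
            if shallow_cost(left, hi) <= delta:   # the remaining range already fits
                ranges.append((left, hi))
                break
            rmin, rmax = left, hi
            while rmax - rmin > eps:
                rmid = (rmin + rmax) / 2
                if shallow_cost(left, rmid) > delta:
                    rmax = rmid
                else:
                    rmin = rmid
            if rmin - left <= eps:
                # A single point already exceeds delta (the skew case discussed
                # under "Additional Details"); a secondary column would be needed.
                raise ValueError("cannot partition further under delta")
            ranges.append((left, rmin))
            left = rmin
        return ranges

    # Hypothetical cost model: cost proportional to the width of the range.
    cost = lambda l, r: 0.1 * (r - l)
    print(find_partitions(cost, 0.0, 1000.0, delta=30.0))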
Enumerating local partitioned spools. The original algorithm optimize-S considers in the search space all relevant plans with the template shown in Figure 5. Also, optimize-S considers putting a Spool operator on top of every plan it considers. Thus, given a join operator, it will consider execution plans for its children that are spooled at the top (those plans would be part of the Memo skyline, because the shallow cost of scanning a temporary table is minimal and cannot be dominated). Consider line 18 in Figure 4:

  18 updateMemo(R, S, Scan(Spool(CP)))

Plan CP is defined as JA(pCP1, pCP2) in line 16 for some join algorithm JA and subplans pCP1 and pCP2. Whenever both pCP1 and pCP2 are themselves scans over temporary tables produced by a Spool operator, the resulting plan Scan(Spool(CP)) matches the template that we consider for local partitioned spools. We consider local partitioned plans by adding the following logic to optimize-S:

  18.1 if (tempScan(pCP1) and tempScan(pCP2))
  18.2   c = join column from pCP1
  18.3   lpCP = Scan(iSpool_c(changeSpools(CP, c)))
  18.4   findPartitions(lpCP)
  18.5   updateMemo(R, S, lpCP)

Here, tempScan(p) determines whether p scans a temporary table produced by a Spool operator. Therefore, in addition to regular Spool operators, the logic in lines 18.1-18.5 considers all possible local partitioned spools in the search space. It does so by picking every suitable plan pattern CP and calling changeSpools(CP, c), which replaces the top-most Spool or iSpool operators with oSpool or ioSpool operators in the path from the root of CP to the leaf node that contains column c (modulo column equivalences). It then adds a new iSpool operator at the root, and calls findPartitions to instantiate a suitable partitioning strategy for the local partitioned spool. Thus, we can always find query slices for arbitrary values of Δ. We call the resulting algorithm optimize-LPS.

Additional Details: We next discuss some details that we omitted earlier for simplicity. The first complication arises due to data skew. Suppose that the value R.x = 10 in table R is repeated so many times that the local cost of performing a partitioned join with value R.x = S.y = 10 already exceeds Δ. Since we cannot further subdivide R.x = 10, optimize-LPS would return no solution. Similar to techniques used in parallel database systems, we can extend the partitioning algorithm so that it also considers secondary partitioning columns in case of extreme skew. For instance, we can subdivide R.x = 10 into R.x = 10 and R.id ∈ {(-∞, 100], (100, ∞)}, where R.id is another column in R (preferably a key). Each secondary partition of R.x = 10 has to join with the partition S.y = 10 in S. If both R and S are subdivided for the same value, the cross product of joins is performed.

The second detail is that the logic in lines 18.1-18.5 above assumes a single join predicate between pCP1 and pCP2. In general, if the join graph contains cycles or the search space includes bushy trees, there might be more than a single join predicate. In such a case, we execute lines 18.2-18.5 for each join predicate.

Finally, a subtle detail is related to the cost model for oSpool operators. The cost of an oSpool operator is not the same as that of a regular Spool operator. An oSpool operator needs to additionally evaluate range predicates to determine the temporary table to which the current input tuple should be appended. The number of range predicates depends on the number of temporary tables, but this number is not known in advance, as it is only determined after calling findPartitions.

Algorithm optimize-LPS assumes that a single partition would be required when constructing the skyline bottom-up, and then modifies Spool operators into the required oSpool operators in line 18.3 using changeSpools. A corner case happens when the optimal pCP1 (respectively pCP2) barely fits Δ, but the modified subplan which uses oSpool does not when adding the required range predicates. This would result in missing a valid alternative plan pCP1' that, while dominated by pCP1, has the possibility to perform the required range predicates within Δ. To address this limitation, we relax the dominance condition in the skyline computation of updateMemo, so that whenever p1 dominates p2, both p1 and p2 are scans over temporary tables, and p2's shallow cost of its spool child is smaller than that of p1, we do not prune p2 from the skyline.

3.3 General Partitioned Spools
Although local partitioned spools always return feasible solutions, there are scenarios (e.g., when leveraging existing indexes) for which optimize-LPS returns suboptimal plans. Consider the local partitioned spool for a three-way join on tables R, S and T (see Figure 7(a)). Assume that a covering index on R.x is available (i.e., an index that contains all required columns from R). We can replace the oSpool^(R.x,{...}) operator by an access path that directly retrieves the tuples in R satisfying each range predicate over R.x (see Figure 7(b)). If the remaining single-table predicates on R are not very selective, this alternative can be more efficient than materializing intermediate results. Now suppose that R is originally accessed in Figure 7(a) using an index over column R.a (say there is a single-table predicate on such a column). An alternative similar to the plan in Figure 7(b) is shown in Figure 7(c). This plan partitions table R, not on the join column R.x, but instead on R.a (and thus it is not a partitioned join). Then, for each range in R.a, the join is performed with the whole right-side relation (which in the figure is spooled into a temporary table). Figure 7(d) shows another alternative that uses a deep partitioning of column R.a. Each partition of R is joined with both S and T before the partial result is written into the common temporary table. Depending on cardinality values and index availability, each alternative in Figure 7 might be optimal. The plans in Figure 7(b-d), however, are not found by optimize-LPS, since there is no oSpool that immediately closes the top-most iSpool operator.

  [Figure 7: General partitioned spools. Panels (a)-(d) show alternative placements of partitioned spool operators and index seeks for the three-way join; operator-tree drawings omitted.]

We next discuss how to extend optimize-LPS to exploit arbitrary placements of all spool variants. Since optimize-LPS places spools tightly surrounding join operators, there is no need to handle parametric selection predicates on execution subplans (the implicit parametrization is done locally by the corresponding oSpool operators, and the choice of column ranges is local). When considering the full space of plans, however, we need to explicitly create and propagate parametric plans. Function paramCols in Figure 11 returns the set of columns that a given plan is parameterized upon. For plans that do not have a spool operator at the root, paramCols always returns a single column (since we consider single-column partitioned spools), or null if the plan is not parameterized. In contrast, if the plan p does have a spool operator at the root, the set of parameterized columns are all those in join predicates between a table in p and a table not in p.
These columns would eventually be used by changeSpools to instantiate oSpool operators.

  [Figure 8: Parametric plan dominance (a hash-join plan and an index-join plan over a parametric range predicate; operator-tree drawing omitted).]

A distinguishing feature of parametric plans is that we do not know their costs until we instantiate the parameters. For that reason, we cannot prune away a parametric plan p unless we are sure that p will be dominated by other plans for all possible range instances. Figure 8 shows a simple example where a hash-join and an index-join alternative might dominate each other depending on the number of tuples satisfying the predicate on the outer table. Specifically, the dominance condition on the skyline operator needs to be extended so that (i) plans parameterized on different columns do not dominate each other, and (ii) parametric plan p1 dominates parametric plan p2 (parameterized on the same column) whenever p1 dominates p2 for every parameter instance. The main algorithm for dealing with arbitrary spool variants, which we call optimize-PS, is discussed in detail in Appendix A.1. Specifically, we show how to generalize the dominance condition on the skyline operator, how to generate parametric plans for interesting columns, and how to generate join combinations.

3.4 LPS with Single Table Optimization
The generic algorithm optimize-PS discussed above traverses the full space of extended execution plans and considers all spool variants. However, due to the large number of parametric plans that are generated (and thus generally not pruned), optimize-PS is usually much more expensive than the restricted variants discussed in Sections 3.1 and 3.2. At the same time, the plans produced by optimize-PS are of better quality because of the extended search space that is considered. We next introduce optimize-LPS*, a technique that generalizes optimize-LPS (and uses slightly more resources), but gives results closer to those of optimize-PS. As motivation, consider again the examples in Figure 7(b-c). A common property of these extended plans is that whenever an iSpool is placed on top of an operator p, either it is closed by an oSpool operator immediately below p, or else the partitioning column (modulo join equivalences) is defined over a single-table subplan of p. This is important because parametric plans are therefore only defined for single-table expressions, and therefore do not propagate arbitrarily upwards in the enumeration strategy. Since we can check for dominance of such parametric plans easily, complex skyline computations (or heuristic approximations) are not needed.

Figure 7(d) shows a plan that does not fall in the category explained above, because it uses a deep partitioning of column R.a. However, note that such a plan necessarily executes multiple joins between partitions of R and the whole of S and T. If the joins are hash- or merge-based, S and T would be read multiple times. If the joins are index-based, it means that some intermediate result is small, and we could materialize such a result earlier with a relatively small penalty. Therefore, the plan in Figure 7(d) requires rather specific circumstances to be significantly better than the alternatives. This analysis motivates optimize-LPS*, which extends optimize-LPS by allowing single-table parametric plans that can take advantage of index strategies. We can obtain optimize-LPS* by restricting the classes of joins that we consider in optimize-PS, as shown in Appendix A.2. In general, optimize-LPS* produces plans that are comparable to those given by optimize-PS at a fraction of the optimization time.

3.5 PS with Plan Pattern Optimization
We previously explained how optimize-LPS* reduces the overhead of optimize-PS by restricting the places at which spool operators can be located (e.g., we forbid deep partitioning columns). In this section we explore an alternative approach, in which we restrict the plans on which spool operators can be placed (without restricting spool placement on such plans whatsoever). Our technique, which we call optimize-PS*, can be seen as a generalization of the post-processing techniques in parallel databases that sprinkle parallelism over the best serial plan. Specifically, optimize-PS* considers spool operators over plans that share the same pattern as the optimal plan found without constraints. Two plans share the same pattern if the join tree is the same modulo commutativity (join algorithms can change, though). Therefore, optimize-PS* starts by calling optimize (see Figure 3) and obtaining the optimal plan Popt independent of Δ. Then, it proceeds very similarly to optimize-PS, but only explores the relevant plan fragments that appear in the optimal plan. The simple extensions required to implement optimize-PS* are discussed in Appendix A.3. Algorithm optimize-PS* is much faster than optimize-PS because it only considers a small number of execution plans. It might miss opportunities, however, since slicing the optimal plan is not the same as obtaining the optimal query slicing.
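The pattern test just described (same join tree modulo commutativity, ignoring join algorithms) can be phrased as a canonical form over nested tuples. The tree representation below is an illustrative assumption, not the optimizer's internal one.

    # Sketch of the "same pattern" test of Section 3.5: two join trees share a
    # pattern if they are equal up to commutativity of each join, regardless of
    # the join algorithm used. Trees are modeled as table names (leaves) or
    # (left, right) pairs; this representation is illustrative only.

    def canonical(tree):
        if isinstance(tree, str):          # base table
            return tree
        left, right = (canonical(t) for t in tree)
        return tuple(sorted((left, right), key=repr))

    def same_pattern(t1, t2):
        return canonical(t1) == canonical(t2)

    # ((R join S) join T) has the same pattern as (T join (S join R)) ...
    print(same_pattern((("R", "S"), "T"), ("T", ("S", "R"))))   # True
    # ... but not as (R join (S join T)), which is a different join tree.
    print(same_pattern((("R", "S"), "T"), ("R", ("S", "T"))))   # False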
3.6 Summary of Techniques
Figure 9 summarizes both the search space enumerated by each technique (in order of generality) and the distinguishing features involved in their solutions. These strategies balance optimization time with the quality of the resulting extended execution plans. Note that throughout this section we focused on SPJ queries to simplify the presentation. Appendix B discusses several important extensions and optimizations, such as handling GROUP BY and other operators, more details on partitioning strategies, and various performance improvements.

            S                 LPS                 LPS*                       PS / PS*
  Space     Spool operators   + local iSpool/     + single-table             Full / Optimal
                                oSpool/ioSpool      parametric plans           plan pattern
  Features  (SC,DC) skyline   + binary search     + single-table             + cost skyline and
                                                    parametric plans           parametric plans

  Figure 9: Summary of optimization strategies.

4. EXPERIMENTAL EVALUATION
In this section we report an experimental evaluation of the techniques described in this paper. We implemented the different query slicing algorithms of Section 3 by extending the exhaustive optimizer in [2]. The optimizer cost model was ported from that of Microsoft SQL Server's optimizer. Unless explicitly stated otherwise, we used binary search for determining range partitions, and an early search bailout of 0.1% (see Appendix B.3). We used the workload generator discussed in [2] to produce synthetic queries, which allowed us to vary different factors like the number of tables and their sizes, join topologies, predicate selectivities, and availability of indexes. Query templates follow chain, snowflake, and star schemas with foreign-key joins, and optionally include single-table local selection predicates (with random selectivity in the range 0.1%-10%) and group-by clauses. Table sizes range from kilobytes to gigabytes. For the case of snowflake schemas, workloads look similar to those in a typical 10GB TPC-H database.

4.1 An Illustrative Example
To illustrate the different plans considered by our techniques, we took a four-way star-join query and explored how the overall cost of the query varies with decreasing values of Δ using optimize-PS (see Figure 10). When Δ = ∞ the overall cost is 12.5 units. As we decrease Δ, the overall execution time gradually increases, up to 25 units for Δ = 0.2. The figure also shows selected extended execution plans for certain values of Δ. The query result size is rather small, so when Δ is slightly below the cost of the optimal plan, the best extended plan in Figure 10(a) puts a Spool operator at the root. For Δ = 11.1, intermediate results become too expensive, so a second Spool operator is placed on top of the first join in Figure 10(b). For even smaller Δ = 7.1, there is no plan that exclusively uses Spool operators, and the optimal plan in Figure 10(c) introduces a top-most iSpool with a deep partitioning attribute on table T0. The materialized table T23 is read multiple times, once per partition on T0.c. When we further decrease Δ down to 1.9 units, the Spool operator on top of tables T2 and T3 is transformed into a second iSpool operator that induces a partitioned join. However, T2 cannot be read completely under Δ, and therefore a third, ioSpool operator, which repartitions T2, is introduced. The cost of the optimal plan gracefully degrades for smaller values of Δ, and the resulting plans leverage all variants of Spool operators.

4.2 Summary of Experimental Results
We next summarize our experimental results, and refer to Appendix C for quantitative information that supports our findings.

Optimizer Efficiency: In our experiments, optimize-PS becomes prohibitively expensive for queries with around or over 8 joins. All other alternatives are practical for the whole range of workloads, taking less than 400 msec on average to optimize the most expensive 10-way star-join workload. Also note that optimize-LPS* is cheaper than optimize-PS* for chain queries, but the trend reverses for more complex join topologies, and for star queries with 8 or more tables, optimize-PS* is the cheapest alternative overall.

Plan Quality: For each query in the workload and threshold Δ, we define the overhead ratio as the optimizer cost of the optimal extended execution plan of our techniques divided by the optimizer cost of the optimal execution plan with no threshold. An overhead ratio of 1.25 for Δ = C/4 means that the optimal query slicing P_S is 25% worse than the optimal unsliced plan P_U, when no slice in P_S is allowed to use more than 25% of the overall cost of P_U. optimize-S does not produce a plan for the vast majority of cases. optimize-LPS is the simplest technique that results in valid queries for arbitrary Δ values. However, its overhead ratios are significantly higher than those of the more advanced strategies.
Finally, optimize-LPS* and optimize-PS* are almost identical in quality to the optimal optimize-PS (under 2% difference).

  [Figure 10: Optimizing a 4-way join for varying thresholds. The chart plots the overall estimated cost against the cost threshold Δ; selected extended execution plans are shown for (a) Δ=12.4, C=12.5; (b) Δ=11, C=12.6; (c) Δ=7, C=13.1; and (d) Δ=1.9, C=18.1.]

5. RELATED WORK
Managing long-running queries is an important problem in data warehousing. A study of current workload management policies is presented in [9, 10]. Techniques can be classified into admission control, scheduling, and execution control. While most systems use combinations of these techniques to manage long-running queries, designing a truly robust technique remains an open research problem. Our query slicing techniques are applicable to various aspects of resource management by slicing complex queries into pieces that respect a cost threshold. There has been recent work on new server mechanisms to pause and resume a long-running query (e.g., [3, 4]). These techniques are an interesting addition to the repertoire of execution control mechanisms. In general, admission control techniques need to be used in conjunction with execution control mechanisms, and it is interesting to examine how to best combine the query slicing techniques proposed in this paper with appropriate execution control techniques (e.g., pause/resume).

The partitioned spool operator used in this paper is similar to the split operator used in parallel database systems [6]. The split operator partitions its output stream (using a split table) to an appropriate process, while the oSpool operator partitions its output stream to temporary tables. While the problem of choosing an appropriate partitioning of an intermediate result in a query tree has been previously studied in the context of parallel query optimization [7, 8], there are a number of differences. First, we need to handle the additional constraint of a cost threshold, which significantly impacts the resulting techniques. Second, physical design plays an important role in our search space. Typically, in parallel query optimization, the set of columns that are interesting for partitioning are usually the columns on which the join predicates are defined. In contrast, a column on which there is a covering index for a relation could still serve as an interesting partitioning column (see Figure 7(b)) because it can potentially lead to a plan in which all the slices respect the cost constraint with no materialization. Finally, some techniques in parallel databases exploit the current layout of data (e.g., using small tables that are replicated on all nodes for join processing). However, these techniques do not consider whether to replicate a table during optimization. The search space of our techniques includes and generalizes the equivalent of these strategies by placing spool operators over small intermediate results.

6. CONCLUSION
In this paper we introduced the idea of query slicing, or dividing a complex long-running query into components that are estimated to run in a predefined amount of time. We studied a spectrum of techniques for query slicing that extend the traditional optimization search space with different tradeoffs between optimization time and the quality of the sliced plan. Our experimental results indicate that optimize-LPS* and optimize-PS* are almost indistinguishable in terms of quality and result in the best tradeoff between optimization runtime and quality of the resulting extended execution plans.

7. REFERENCES
[1] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In Proceedings of the International Conference on Data Engineering (ICDE), 2001.
[2] N. Bruno, C. Galindo-Legaria, and M. Joshi. Polynomial heuristics for query optimization. In Proceedings of the International Conference on Data Engineering (ICDE), 2010.
[3] B. Chandramouli, C. Bond, S. Babu, and J. Yang. Query suspend and resume. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2007.
[4] S. Chaudhuri et al. Stop-and-restart style execution for long running decision support queries. In Proceedings of the International Conference on Very Large Databases (VLDB), 2007.
[5] S. Chaudhuri and V. Narasayya. Automating statistics management for query optimizers. In Proceedings of the 16th International Conference on Data Engineering, 2000.
[6] D. J. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. In Communications of the ACM, 35(6), 1992.
[7] S. Ganguly, W. Hasan, and R. Krishnamurthy. Query optimization for parallel execution. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), 1992.
[8] W. Hasan and R. Motwani. Coloring away communication in parallel query optimization. In Proceedings of the International Conference on Very Large Databases (VLDB), 1995.
[9] S. Krompass, U. Dayal, H. A. Kuno, and A. Kemper. Dynamic workload management for very large data warehouses: Juggling feathers and bowling balls. In Proceedings of the International Conference on Very Large Databases (VLDB), 2007.
[10] S. Krompass et al. Managing long-running queries. In Proceedings of the International Conference on Extending Database Technology (EDBT), 2009.
[11] Microsoft Corporation. SQL Server 2008 Books Online. Accessible at http://msdn.microsoft.com/en-us/library/ms190419.aspx.
[12] K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In Proceedings of the International Conference on Database Theory, 2009.
[13] P. G. Selinger et al. Access path selection in a relational database management system. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), 1979.
[14] C. D. Weissman and S. Bobrowski. The design of the Force.com multitenant internet application development platform. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2009.
[15] W. Yan and P. Larson. Eager aggregation and lazy aggregation. In Proceedings of the International Conference on Very Large Databases (VLDB), 1995.
[16] C. Yang et al. Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. In Proceedings of the International Conference on Data Engineering (ICDE), 2010.

APPENDIX

A. ALGORITHMIC DETAILS

A.1 General Partitioned Spools
In this section we discuss details for optimize-PS, which can deal with arbitrary spool variants. Function updateMemo in Figure 11 generalizes that of optimize-LPS in two aspects. First, line 1 checks SC(p) ≤ Δ only for non-parametric plans. Second, the dominance condition on the skyline operator is extended so that (i) plans parameterized on different columns do not dominate each other, and (ii) parametric plan p1 dominates parametric plan p2 (parameterized on the same column) whenever p1 dominates p2 for every parameter instance (strictly speaking, p2 is dominated whenever there is a plan, not necessarily the same one, in the skyline that dominates p2 for every parameter instance). This condition can be very difficult to test and in general involves detailed knowledge of the cost model. A heuristic that works very well in practice is to try extreme selectivity ranges (say ε and 1-ε) for the parametric predicate, and declare that p1 dominates p2 if it does so for both data points (similar to the MNSA technique in [5]). This is correct when the cost lines of both plans do not intersect more than once, and a heuristic otherwise (a small sketch at the end of this appendix illustrates this check).

We now discuss the main algorithm for dealing with arbitrary spool variants, which we call optimize-PS, in Figure 11. The first difference with respect to optimize-LPS is in lines 6-13, which generate single-table execution plans. In addition to the plans obtained by previous techniques, lines 10-13 generate parametric plans for every interesting column. A column is interesting if it is either part of a join predicate in the query or it is a key column of an index. There could be more than a single plan for a given column, to cover the whole range of selectivity values. Consider a subquery σ_{R.a<10}(R) and column R.b. Line 11 would generate a plan that seeks index I_b for $l ≤ b < $h, fetches the remaining columns, and then applies R.a < 10 on the fly (for low selectivity ranges on b). Additionally, it will generate a plan that uses an index on R.a to obtain the tuples that satisfy R.a < 10 and then applies the range predicate on R.b on the fly. If the query processor handles index intersection plans, additional plans might be generated in line 11. All such parametric plans are stored in Memo[R,S], as any of them could be part of the overall optimal plan.

The second difference is how joins are generated in lines 14-20. Rather than just considering plain spools and the extensions of optimize-LPS for local partitioned spools, optimize-PS calls function generateJoins for each combination of plans pCP1 and pCP2 and join algorithm JA (contrast generateJoins with lines 16-18.5 in optimize-LPS). Function generateJoins considers each combination of parametric columns for input plans P1 and P2 (recall that except for plans with root spools, each plan has a single parametric column or is null). If at most one of P1 and P2 has a non-null parametric column, or both have parametric columns that are joined together in P1 ⋈ P2, we can generate a new plan that is either parametric on one column or not parametric at all (depending on whether either P1 or P2 is parametric to begin with). In that case, lines 3-4 generate the potentially parametric plan CP, and lines 5-10 the corresponding plans that use spool variants.

  paramCols (P:plan) returns columns for which P is parameterized
  01 C = parametric(P) ? {parameter(P)} : {null}
  02 if (tempScan(P))
  03   C = C ∪ {c in cols(P) : (c=c') is a join predicate}
  04 return C

  updateMemo (R:tables, S:order, P:plan)
  01 if (P ≠ null and ∀p ∈ P: parametric(p) or SC(p) ≤ Δ)
  02   Memo[R,S] = skyline(Memo[R,S] ∪ {P})

  generateJoins (P1, P2:plan, JA:join algorithm)
  01 for each (c1, c2) in paramCols(P1) × paramCols(P2)
  02   if (c1 = null or c2 = null or c1 and c2 are joined)
  03     c = (c1 = null) ? c2 : c1
  04     CP = JA(P1, P2)
  05     updateMemo(R, S, CP)
  06     if (c = null)
  07       pCP = Scan(Spool(CP))
  08     else
  09       pCP = Scan(iSpool_c(changeSpools(CP, c)))
  10       findPartitions(pCP)
  11     updateMemo(R, S, pCP)

  optimize-PS (R:tables, S:order) returns skyline of plans for R satisfying S
  01 if (Memo[R,S] was not yet calculated)
  02   if (S ≠ null)
  03     for each (CP ∈ optimize-PS(R, null))
  04       updateMemo(R, S, Sort_S(CP))
  05       updateMemo(R, S, Scan(Spool(Sort_S(CP))))
  06   if (|R| = 1)
  07     CP = best plan under order S
  08     updateMemo(R, S, CP)
  09     updateMemo(R, S, Scan(Spool(CP)))
  10     for each interesting column C   // see Section 3.3
  11       CPs = parametric plans for σ_{$l ≤ C < $h}(R) using I_C under order S
  12       for each CP in CPs
  13         updateMemo(R, S, CP)
  14   else for each valid partition (R1, R2) of R
  15     for each join algorithm JA
  16       S1, S2 = required orders of R1, R2 for JA
  17       CP1 = optimize-PS(R1, S1)
  18       CP2 = optimize-PS(R2, S2)
  19       for each (pCP1, pCP2) ∈ CP1 × CP2
  20         generateJoins(pCP1, pCP2, JA)
  21 return Memo[R, S]

  Figure 11: Handling all spool variants for query slicing.

A.2 LPS with Single Table Optimization
As discussed earlier, optimize-LPS* extends optimize-LPS by allowing single-table parametric plans that can take advantage of index strategies. We obtain optimize-LPS* by simply restricting the considered join classes in generateJoins of Figure 11:

  1.1 if (c1 ≠ null and not validForLPS*(P1)) continue
  1.2 if (c2 ≠ null and not validForLPS*(P2)) continue

where validForLPS* accepts plans that either have a spool operator at the root, are defined over a single table, or else are not parametric. That is, validForLPS*(p) is equivalent to:

  tempScan(p) or singleTable(p) or not parametric(p)

A.3 PS with Plan Pattern Optimization
Algorithm optimize-PS* is similar to optimize-PS, but only explores plan fragments that appear in the optimal plan. For that purpose, we need to slightly modify the search strategy in algorithm optimize-PS, which originally iterates over all possible partitions of the input tables, so that only trees that share their patterns with the optimal plan Popt are explored. Specifically, we need to change line 14 in optimize-PS to:

  14 else for each (R1, R2) sharing Popt's pattern
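The extreme-selectivity heuristic described in A.1 can be sketched as follows. The plan cost functions are hypothetical parametric cost estimates that would come from the optimizer's cost model; as noted above, the check is exact only when the two cost curves cross at most once.

    # Sketch of the dominance heuristic of Appendix A.1 for parametric plans:
    # p1 is declared to dominate p2 if it is no worse at both extreme
    # selectivities of the parametric predicate. Cost functions map a
    # selectivity in (0, 1) to a hypothetical (SC, DC) estimate.

    EPS = 0.001

    def dominates_at(p1_costs, p2_costs, selectivity):
        sc1, dc1 = p1_costs(selectivity)
        sc2, dc2 = p2_costs(selectivity)
        return sc1 <= sc2 and dc1 <= dc2

    def heuristically_dominates(p1_costs, p2_costs):
        return all(dominates_at(p1_costs, p2_costs, s) for s in (EPS, 1 - EPS))

    # Made-up cost curves resembling Figure 8: an index-join plan that is
    # cheap for selective ranges and a hash-join plan whose cost is flat.
    index_join = lambda s: (5 + 400 * s, 5 + 400 * s)
    hash_join = lambda s: (120, 120)
    print(heuristically_dominates(index_join, hash_join))  # False: the curves cross
    print(heuristically_dominates(hash_join, index_join))  # False as well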
We obtain optimize-lp* by simply restricting the consiere join classes in generatejoins in Figure 11: 1.1 if (c1 null an valiforlp*(p1)) continue 1.2 if (c2 null an valiforlp*(p2)) continue 4 trictly speaking, p 2 is ominate whenever there is a plan (not necessarily the same) in the skyline that ominates p 2 for every parameter instance. paramcols (P:plan) returns columns for which P is parameterize 01 C = parametric(p)? {parameter(p)} : {null} 02 if (tempcan(p)) 03 C = C {c in cols(p):(c=c ) is join preicate} 04 return C upatememo (:tables, :orer, P:plan) 01 if (P null an p P: parametric(p) C(p) ) 02 Memo[, ] = skyline(memo[, ] P) generatejoins (P1,P2:plan, JA:join algorithm) 01 for each (c1,c2) in paramcols(p1) paramcols(p2) 02 if (c1=null or c2=null or (c1=c2) are joine) 03 c = (c1=null)? c2 : c1 04 CP = JA(pCP1, pcp2) 05 upatememo(,, CP) 06 if (c=null) 07 pcp = can(pool(cp)) 08 else 09 pcp = can(ipool c (changepools(cp, c))) 10 finpartitions(pcp) 11 upatememo(,, pcp) optimize-p (:tables, :orer) returns skyline of plans for satisfying 01 if (Memo[, ] was not yet calculate) 02 if ( = null) 03 for each (CP optimize-p(, null)) 04 upatememo(,, ort(cp)) 05 upatememo(,, can(pool(ort (CP)))) 06 if ( = 1) 07 CP = best plan uner orer 08 upatememo(,, CP) 09 upatememo(,, can(pool(cp))) 10 for each interesting column C // see ection 3.3 11 CP = parametric plans for σ $l C<$h () using IC uner orer 12 for each CP in CP 13 upatememo(,, CP) 14 else for each vali partition (1, 2) of 15 for each join algorithm JA 16 1,2 = require orers of 1,2 for JA 17 CP1 = optimize-p(1, 1) 18 CP2 = optimize-p(2, 2) 19 for each (pcp1, pcp2) CP1 CP2 20 generatejoins(pcp1, pcp2, JA) 21 return Memo[, ] Figure 11: anling all pool variants for query slicing. where valiforlp* accepts plans that either have a spool operator at the root, are efine over a single table, or else are not parametric. That is, valiforlp*(p) is equivalent to: tempcan(p) singletable(p) parametric(p) A.3 P with Plan Pattern Optimization Algorithm optimize-p* is similar to optimize-p, but only explores plan fragments that appear in the optimal plan. For that purpose, we nee to slightly moify the search strategy in algorithm optimize-p, which originally iterates over all possible partitions of the input tables, so that only trees that share their patterns with the optimal P opt are explore. pecifically, we nee to change line 14 in optimize-p to: 14 else for each (1, 2) sharing P opt s pattern 538