Selecting and Using Views to Compute Aggregate Queries


Foto Afrati
National Technical University of Athens
Athens, Greece

Rada Chirkova
Computer Science
North Carolina State University

Due to space limitations, we do not provide proofs in the text. Selected proofs are in the appendix.
Contact author: NCSU, 900 Main Campus Dr, Venture III Ste 165-C Rm 196, Raleigh, NC 27695, USA; tel ; fax ; chirkova@csc.ncsu.edu

Abstract

We consider the problem of obtaining equivalent rewritings of aggregate queries using views. We assume conjunctive views and rewritings, with or without aggregation; in each rewriting, only one view contributes to computing the aggregated query output. Our focus is on minimizing the cost of computing a query workload; we look at query rewriting using existing views and at view selection. For the query-rewriting problem, we give sufficient and necessary conditions for a rewriting to exist. For view selection, we prove complexity results. We also give algorithms for obtaining rewritings and selecting views.

1 Introduction

The problem of answering and rewriting queries using views for conjunctive queries and views has received considerable attention (see, e.g., [ALU01, CDGLV03, CHS02, Hal01] and references therein). However, only a small amount of work addresses the case where the language is extended with aggregation [ACN00, CNS99, GHQ95, GHRU97, NSS98, SDJL96]. Few complete algorithms are known for finding rewritings; moreover, the existing results address special cases. At the same time, using materialized views to compute aggregate queries results in potentially greater benefits than for purely conjunctive queries, as a view with aggregation precomputes some of the grouping/aggregation on some of the query's subgoals. Also, because aggregate queries are often computed on large amounts of data, in many applications it is beneficial to use previously cached results as views to answer a new query [HRU96, GHRU97, ACN00].
We consider aggregate queries and views and address the problems of (1) how to answer the queries using the views and (2) how to optimally select views to materialize.

EXAMPLE 1.1 This is a simple motivating example. On a database with schema {P(A, B), S(B, C, D), T(C, G), U(A, H)} we consider three queries, Q1, Q2, and Q3:

  q1(A, B, max(C)) :- p(A, B), s(B, C, D), t(C, G), u(A, H).
  q2(B, C, sum(H)) :- p(A, B), s(B, C, D), u(A, H).
  q3(B, count) :- s(B, C, D), t(C, G).

We consider the following views:

  v1(B, max(C)) :- s(B, C, D), t(C, G).
  v2(A, B, sum(H)) :- p(A, B), u(A, H).
  v3(B, C) :- s(B, C, D).
  v4(C, count) :- t(C, G).

We can rewrite the three queries as Q1', Q2', Q3' using the four views:

  q1'(A, B, W) :- v1(B, W), v2(A, B, X).
  q2'(B, C, sum(W)) :- v2(A, B, W), v3(B, C).
  q3'(B, sum(W)) :- v4(C, W), v3(B, C).

Each rewriting uses more than one view, and the views in a rewriting are not necessarily all of the same type: some are without aggregation (view V3), and some use an aggregation different from that of the query they rewrite (view V4 in Q3'). However, in each rewriting only one view (the first) in the body contributes to the value of the aggregated attribute in the head; we call it the central view. We call these rewritings central rewritings. Also, rewritings Q2' and Q3' are themselves aggregate queries, whereas rewriting Q1' is not. Finally, the grouping attributes in the head of the rewriting are, in general, different from the ones in the views used in the body.

It is not straightforward to argue that these rewritings are indeed equivalent to the queries. To see that, take Q2' slightly modified as

  q2''(B, C, W) :- v2(A, B, W), v3(B, C).

Interestingly, rewriting Q2'' is not equivalent to query Q2, although its body is the same as that of Q2' and the head contains the same attributes. Also, Q2' (Q3') is equivalent to Q2 (Q3) only if the view V3 is computed under bag semantics [CV93].

One contribution of this paper is a complete algorithm that constructs central rewritings given a query and a set of views. The aggregate operators we consider are the common operators max, min, count, sum, and count(*). As aggregation is not a relational operator, proving equivalence of queries to rewritings is more complicated than when queries have no aggregation. Thus, we investigate this problem first and use the results we obtain to develop our algorithm. When addressing the view-selection problem, we also consider multiaggregate views and queries with the HAVING clause, as in the following example.

EXAMPLE 1.2 Consider a database with three relations, one that stores transactions and two that store information about store branches: P(storeId, product, saleprice, profit, dayofsale, monthofsale, yearofsale); T(storeId, storechain); W(storeId, storecity). We consider three queries. Query Q1 gives maximal profit per store chain per product for a given year. Query Q2 gives total sales per product per year per city, for all stores.
Query Q3 uses a HAVING clause in its definition and returns all product names, together with total sales, for each year after 1997 and only for the city of Seattle. Here is one possible SQL expression for Q3:

  SELECT product, yearofsale, sum(saleprice)
  FROM P, W
  WHERE P.storeId = W.storeId
  GROUP BY product, yearofsale, storecity
  HAVING yearofsale > 1997 AND storecity = 'Seattle';

These three queries can be rewritten using a single multiaggregate view. In our datalog rule notation, the queries, the view, and the rewritings can be written as:

  q1(S, Y, max(T)) :- p(X, Y, Z, T, N, L, 02), t(X, S).
  q2(Y, M, U, sum(Z)) :- p(X, Y, Z, T, N, L, M), w(X, U).
  q3(Y, M, F) :- q2(Y, M, U, F), M > 97, U = Seattle.
  v1(X, Y, M, sum(Z), max(T)) :- p(X, Y, Z, T, N, L, M).
  q1'(S, Y, max(K)) :- v1(X, Y, 02, F, K), t(X, S).
  q2'(Y, M, U, sum(J)) :- v1(X, Y, M, J, K), w(X, U).
  q3'(Y, M, F) :- q2'(Y, M, U, F), M > 97, U = Seattle.

View V1 can be used as a central view to rewrite all three queries. Our second main result is an algorithm that selects multiaggregate central views optimally given a query workload. We also prove complexity results for the view-selection problem.

The structure of this paper is as follows. Section 2 defines aggregate queries and equivalence among aggregate queries. Section 3 presents our framework, in particular the types of rewritings we consider, the cost model for view selection, and a more technical presentation of our results. In Section 4, we prove necessary and sufficient conditions for each type of rewriting to exist and also provide negative results. In Section 5, we prove that the view-selection problem is NP-complete for sum and count, and provide an exponential-time lower bound on the complexity of view selection for max and min. In Section 6, we give algorithms for obtaining rewritings given a query and views and for selecting views given a query workload.
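Returning to Example 1.1, the following minimal Python sketch evaluates query Q2, its aggregated rewriting (head (B, C, sum(W))), and the modified rewriting without aggregation (head (B, C, W)) on a small database. The database instance and the helper names (`group_sum`, `aggregated_rewriting`) are assumptions made purely for illustration; the sketch only checks the claims of the example on one instance and proves nothing in general.

```python
from collections import defaultdict

# Hypothetical toy instance of the Example 1.1 schema (values are illustrative only).
P = [(1, 10), (2, 10)]          # P(A, B)
S = [(10, 7, 0), (10, 7, 1)]    # S(B, C, D)
U = [(1, 5), (2, 3)]            # U(A, H)

def group_sum(pairs):
    """Group (key, value) pairs by key and sum the values of each group."""
    out = defaultdict(int)
    for key, value in pairs:
        out[key] += value
    return dict(out)

# q2(B, C, sum(H)) :- p(A, B), s(B, C, D), u(A, H).   (core evaluated as a bag)
q2 = group_sum(((b, c), h)
               for (a, b) in P
               for (b2, c, _d) in S if b2 == b
               for (a2, h) in U if a2 == a)

# v2(A, B, sum(H)) :- p(A, B), u(A, H).
v2 = group_sum(((a, b), h) for (a, b) in P for (a2, h) in U if a2 == a)

# v3(B, C) :- s(B, C, D), stored either as a bag or as a set.
v3_bag = [(b, c) for (b, c, _d) in S]
v3_set = sorted(set(v3_bag))

def aggregated_rewriting(v3):
    """Head (B, C, sum(W)), body v2(A, B, W), v3(B, C)."""
    return group_sum(((b, c), w)
                     for ((_a, b), w) in v2.items()
                     for (b3, c) in v3 if b3 == b)

# Modified rewriting without aggregation: head (B, C, W), same body.
modified = {(b, c, w) for ((_a, b), w) in v2.items() for (b3, c) in v3_bag if b3 == b}

print(q2)                            # {(10, 7): 16}
print(aggregated_rewriting(v3_bag))  # {(10, 7): 16}
print(aggregated_rewriting(v3_set))  # {(10, 7): 8}
print(sorted(modified))              # [(10, 7, 3), (10, 7, 5)]
```

On this instance the aggregated rewriting agrees with Q2 only when V3 is kept as a bag, and the modified rewriting returns the per-A partial sums instead of the single group total.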
Related Work and Comparison to Ours

The problems of rewriting queries using views and of view selection for aggregate queries have been considered in papers related to data warehouses and datacubes [GCB+97, Wid95]; in general, the problem considered in this context was to answer each query (or part of a query) using a single view [ACN00, GHQ95, GHRU97, SDJL96]. Recent work [CNS99] has considered the problem of rewriting a query with aggregation using multiple views with aggregation; to determine whether a rewriting that uses views is equivalent to a query with aggregation, the method is to determine whether the rewriting's unfolding (defined similarly to expansion [Ull97]), which uses base relations only, is equivalent to the query [NSS98]. Thus complete algorithms are obtained that construct rewritings that use multiplication as an aggregate operator and use only aggregate views in the body of the rewritings. In the present paper, we use unfoldings to determine equivalence of a central rewriting to a query and obtain complete algorithms. Our central rewritings use only standard aggregation operators and use any views in the body, including multiaggregate views.

On view selection, considerable work has been done on efficiently selecting views, such as in the datacube context (e.g., [GHRU97]), where the focus was on getting efficient algorithms for interesting special cases of the problem. Here we focus on obtaining results on the complexity of the view-selection problem for central rewritings in a framework similar to [CHS02].

Other related work on aggregate query rewriting includes [GT03], which considers rewriting aggregate queries using multiple aggregate views over a single relation, and [AAD+96], which presents fast algorithms for computing the cube operator. [YW01] considers the problem of using views with aggregation to compute queries in temporal databases. Work related to query languages with aggregate capabilities can be found in [BL02], [RSSS98], [ÖÖM87], [LSV02].
[PDST00] proposes a new method for generating alternative query plans, using an interaction of indexes, materialized views, semantic optimization, and query minimization. Finally, results on equivalence of aggregate queries are presented in [CNS99], which establishes that checking the equivalence of unions of sum- or count-queries is GI-hard and in PSPACE. (GI is the class of problems that are many-one reducible to the graph isomorphism problem.) It is also shown in [CNS99] that checking equivalence of unions of max-queries is $\Pi^p_2$-complete, whereas checking equivalence of unions of conjunctive queries without aggregation is NP-complete.

2 Preliminaries

A database is a collection of relations. A query is a mapping from databases to databases, where usually the output database (the answer) is a database with a single relation. A relation is viewed as either a set or a bag (a.k.a. multiset) of tuples. A bag can be thought of as a set of elements (we call it the core-set of the bag) with multiplicities attached to each element. A conjunctive query is of the following form:

$h(\bar{s}) \mathrel{:-} g_1(\bar{s}_1), \ldots, g_k(\bar{s}_k).$

In each subgoal $g_i(\bar{s}_i)$, predicate $g_i$ is a base relation, and every argument in the subgoal is either a variable or a constant. We shall denote the part on the right-hand side of the :- (called the body) by $A$. The part on the left-hand side is called the head. An attribute or variable that is not in the head is called a nondistinguished attribute or variable.

An assignment $\gamma$ for $A$ is a mapping of the variables appearing in $A$ to constants, and of the constants appearing in $A$ to themselves. Assignments are naturally extended to tuples and atoms. For a tuple of variables $\bar{s} = (s_1, \ldots, s_k)$ we let $\gamma\bar{s}$ denote the tuple $(\gamma(s_1), \ldots, \gamma(s_k))$. Satisfaction of atoms (and of conjunctions of atoms) by an assignment w.r.t. a database is defined as follows: $g(\gamma\bar{s})$ is satisfied if the tuple $\gamma\bar{s}$ is in the relation that corresponds to the predicate of subgoal $g$.
Under set semantics, a conjunctive query $q(\bar{s}) \leftarrow A$ defines a new relation $q^D$, for a given set database $D$, as follows: $q^D := \{\gamma\bar{s} \mid \gamma \text{ satisfies } A \text{ w.r.t. } D\}$. Under bag-set semantics [CV93], a conjunctive query $q(\bar{s}) \leftarrow A$ defines a new multiset relation $\{\!\{q\}\!\}^D$, for a given set database $D$, as follows: $\{\!\{q\}\!\}^D := \{\!\{\gamma\bar{s} \mid \gamma \text{ satisfies } A \text{ w.r.t. } D\}\!\}$. We say that the query is computed under bag semantics [CV93] if both the input database and the answer are bags. In this case, the collection of satisfying assignments is viewed as a multiset.

We define equivalence under each of the three types of semantics. Two queries are set-equivalent (bag-set-equivalent, bag-equivalent, respectively) if they produce the same set (multiset, respectively) of answers on every database (every set database for the first two cases, every bag database for the third case). When we compute a query, we will say whether we compute it as a bag or as a set, unless it is obvious from the context.

We assume in this paper that the data we want to aggregate are real numbers, $\mathbb{R}$. If $S$ is a set, then $M(S)$ denotes the set of finite multisets over $S$. A $k$-ary aggregate function is a function $\alpha : M(\mathbb{R}^k) \to \mathbb{R}$ that maps multisets of $k$-tuples of real numbers to real numbers. An aggregate term is an expression built up using variables and aggregate functions. Every aggregate term gives rise to an aggregate function in a natural way. We use $\alpha(y)$ as an abstract notation for an aggregate term, where $y$ is the variable in the term. The aggregate queries that we consider here have the aggregate functions count, count(*), sum, max, and min. Note that count is over an argument, whereas count(*) is the only function that we consider here that takes no argument. In the rest of the paper, we will not refer again to this distinction, as our results carry over.

An aggregate query is a conjunctive query augmented by an aggregate term in its head. Thus it has the syntax:

$q(\bar{s}, \alpha(y)) \leftarrow A, \quad (1)$

where $A$ is a conjunction of predicate atoms that represent relations; $\alpha(y)$ is an aggregate term; $\bar{s}$ are the grouping attributes of the query; $y$ does not appear among $\bar{s}$; and all the variables in the head occur in the body.
With each aggregate query $q$ as in Equation 1, we associate its core $\breve{q}$, which is a conjunctive query:

$\breve{q}(\bar{s}, y) \leftarrow A. \quad (2)$

The semantics of an aggregate query is as follows. Let $D$ be a database and $q$ an aggregate query as in Equation 1. When $q$ is applied to $D$, it yields a new relation $q^D$ that is defined by the following three steps. First, we compute the core $\breve{q}$ on $D$ as a bag $B$. In the second step, we form equivalence classes in $B$: two tuples belong to the same equivalence class if they agree on the values of the grouping attributes. This is the grouping step. The third step is aggregation; it associates with each equivalence class the value of the aggregate function computed on the bag of all values of the aggregated attribute in this class. For each class, it returns one tuple that contains the values of the grouping attributes and the computed aggregated value.

We say that an aggregate function $\alpha$ is duplicate-insensitive if the result of $\alpha$ computed over a bag of values is the same as the result of $\alpha$ computed over the core-set of this bag. Otherwise $\alpha$ is duplicate-sensitive [GHQ95]. We say that an aggregate function $\alpha$ is distributive [GCB+97] if there is a function $\gamma$ such that, for a multiset $A$ partitioned into sub-multisets $A_1, \ldots, A_n$, the value $\alpha(A)$ is obtained by applying $\gamma$ to the partial results: $\alpha(A) = \gamma(\{\!\{\alpha(A_1), \ldots, \alpha(A_n)\}\!\})$. All four functions we consider are distributive. In fact, for all $\alpha$, $\gamma = \alpha$, except that for count, $\gamma$ = sum.

The following are useful observations.

Proposition 2.1 Let $Q$ be an aggregate query with $X$ the grouping tuple and $Y$ the aggregated attribute. Then the following hold: (1) there is a functional dependency $X \to Y$; (2) the answer to $Q$ is set-valued; (3) the projection of the answer to $Q$ on $X$ is set-valued.
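The three-step semantics and the two properties of aggregate functions defined above can be sketched in a few lines of Python. This is only an illustration under assumptions: the relation `p`, the query shape q(X, agg(Z)) :- p(X, Y, Z), and the helper names (`evaluate`, `duplicate_insensitive_on`, `combine`) are hypothetical, and count is modeled as the number of values in a group.

```python
from collections import defaultdict

# Aggregate functions over a bag (Python list) of values.
AGG = {"sum": sum, "count": len, "max": max, "min": min}
# Combiner gamma for partial results: gamma = alpha, except gamma = sum for count.
GAMMA = {"sum": sum, "count": sum, "max": max, "min": min}

def evaluate(core_bag, agg):
    """core_bag: list of (grouping_tuple, y) pairs, i.e. the core computed as a bag."""
    groups = defaultdict(list)
    for key, y in core_bag:              # step 2: grouping
        groups[key].append(y)
    return {key: AGG[agg](ys) for key, ys in groups.items()}   # step 3: aggregation

def duplicate_insensitive_on(bag, agg):
    """True if agg gives the same result on the bag and on its core-set."""
    return AGG[agg](bag) == AGG[agg](list(set(bag)))

def combine(parts, agg):
    """Aggregate each part of a partition, then combine the partial results with gamma."""
    return GAMMA[agg]([AGG[agg](part) for part in parts])

# Hypothetical relation p(X, Y, Z) and query q(X, agg(Z)) :- p(X, Y, Z).
p = [(1, 3, 4), (1, 5, 6), (1, 7, 4)]
core = [((x,), z) for (x, _y, z) in p]   # step 1: core as a bag

print(evaluate(core, "sum"))                        # {(1,): 14}
print(evaluate(core, "count"))                      # {(1,): 3}
print(duplicate_insensitive_on([4, 6, 4], "max"))   # True
print(duplicate_insensitive_on([4, 6, 4], "sum"))   # False
print(combine([[4, 6], [4]], "count"))              # 3
```

The last line shows why the combiner for count is sum: partial counts 2 and 1 are added, not counted again.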

Now we define equivalence between aggregate queries. As two aggregate queries with different aggregate functions may be equivalent, but we do not want to treat such cases here, we define equivalence only among compatible queries.

Definition 2.1 (Compatible queries) [NSS98] Two queries are compatible if they have identical heads, up to variable renaming.

Definition 2.2 (Equivalence of compatible aggregate queries) [NSS98] For two compatible aggregate queries $Q(\bar{x}, \alpha(y)) \leftarrow B(\bar{s})$ and $Q'(\bar{x}, \alpha(y)) \leftarrow B'(\bar{s}')$, $Q \equiv Q'$ if $Q(D) = Q'(D)$ for every database $D$.

Equivalence among aggregate queries is investigated in [CNS99, NSS98], where it is shown that: (1) two conjunctive queries are bag-set equivalent if and only if they are isomorphic; (2) equivalence of sum-queries and count-queries can be reduced to bag-set equivalence among their cores; (3) equivalence of max-queries can be reduced to set-equivalence between their cores.

3 Our Framework and Contributions

3.1 Rewritings for Aggregate Queries

Suppose $\mathcal{V}$ is a set of views defined on a database schema $S$, and suppose $D$ is a database instance with schema $S$. Then by $D_{\mathcal{V}}$ we denote the database obtained by computing all the view relations in $\mathcal{V}$ on the database $D$: $D_{\mathcal{V}} = \bigcup_{V \in \mathcal{V}} V(D)$.

Definition 3.1 (Equivalent Rewriting) Let $Q$ be a query defined on database schema $S$, and let $\mathcal{V}$ be a set of views defined on $S$; let $R$ be a query defined in terms of the views in $\mathcal{V}$. Then $Q$ and $R$ are equivalent, denoted $Q \equiv R$, if and only if for any database $D$, $Q(D) = R(D_{\mathcal{V}})$.

We say that a view $V$ is set-valued if $V$ is computed and stored to be accessed as a set, and we say that $V$ is bag-valued if $V$ is computed and stored to be accessed as a bag. In a rewriting, a bag-valued view $V$ is denoted by the adornment $V^b$. The following example shows that equivalence of a rewriting to a query depends on whether conjunctive views are set- or bag-valued.

EXAMPLE 3.1 We have the following query and one view which is the core of the query.
  Q(X, count) :- p(X, Y, Z).
  V(X) :- p(X, Y, Z).
  Q'(X, count) :- V^b(X).

The rewriting is equivalent to the query as it is, i.e., when the view is bag-valued. However, if the view is set-valued, then there is no equivalence. (Consider the following database: P = {(1, 3, 4), (1, 5, 6)}. On P, the answer to Q has one tuple (1, 2), the answer to the view computed as a set has one tuple (1), and hence the answer to Q' has one tuple (1, 1).)

3.2 Central Rewritings

Finding rewritings for aggregate queries introduces additional complications when compared to finding rewritings for conjunctive queries without aggregation. A decision now has to be made on the following parameters: (1) What kinds of queries the views are. (2) What kind of query the rewriting is. (3) Whether the views are computed under set or bag-set semantics. (4) Moreover, as a consequence of the choices we make, the aggregate function may or may not depend on some aggregated attributes of the views. Our choice is to depend only on the aggregated attribute of a single view, which we call the central view. The rest of the views in the rewriting are called noncentral views.

Aggregate queries (and views that are defined by aggregate queries) are not symmetrical w.r.t. all their attributes. We call the aggregated attribute the output argument of the query. We do not allow joins on output arguments. Thus, in the setting of our paper, we make the following assumptions on the rewritings we consider:

1. The argument of aggregation in the head of the rewriting comes from exactly one (central) view in the body of the rewriting. We call the central aggregate operator either the aggregate operator of the central view that contributes to the aggregation in the head (there might be several aggregate operators in the case of a multiaggregate central view) or, when the central view is purely conjunctive, the aggregate operator in the head of the rewriting.
2. Aggregated outputs of noncentral views are not used in the head of the rewriting.
3. There is no join on output arguments of views.

We call such types of rewritings central rewritings. In all our results, we consider only central rewritings. We may now view our problem as belonging to one of the following three classes: CQ/CQA, when the central view is purely conjunctive and the rewriting has aggregation; CQA/CQ, when the central view has aggregation and the rewriting is purely conjunctive; and CQA/CQA, when both the central view and the rewriting have aggregation. It is easier to state our results for each class separately. Our rewriting template R for all three rewritings is

$r(\bar{x}, \alpha(y)) \leftarrow v_0(\bar{x}_0, y), v^b_1(\bar{x}_1, y_1), \ldots, v^b_k(\bar{x}_k, y_k), \quad (3)$

where $\alpha$ is a nontrivial aggregate operator in cases CQ/CQA and CQA/CQA, and is the identity in case CQA/CQ (i.e., the head is $r(\bar{x}, y)$). Also, in the case CQ/CQA, we assume a central view that covers all subgoals that contain the variable $y$.

Our contribution presented in Section 4 is as follows: for each type of central rewriting, we obtain sufficient and necessary conditions for a rewriting to exist. This is achieved by using unfoldings of rewritings, as explained in the following section.

3.3 Unfoldings of Rewritings

For conjunctive queries without aggregation, it is straightforward to define and use expansions [Ull97] (unfoldings reduce to expansions in this case); in the presence of aggregation there are more complications.
Sometimes, unfoldings are not equivalent to the rewritings, as we will prove in the section that follows. Here we define unfoldings. We are given a set of views defined as conjunctive aggregate queries over the base predicates, and a conjunctive query $R$ over the views. We refer to $R$ as a rewriting even when we have not associated it with any particular query (whose rewriting is to be obtained). The unfolding $R^u$ of $R$ is a join of all the subgoals of the views in $R$, followed by some grouping/aggregation. If we denote by $B_{v_i}$ the body of a view $V_i$, then an unfolding $R^u$ of $R$ is defined as follows:

$r^u(\bar{x}, \beta(y)) \leftarrow B_{v_0} \,\&\, B_{v_1} \,\&\, \ldots \,\&\, B_{v_k}, \quad (4)$

where (1) $\beta$ is the aggregate operator of the central view of $R$, if the central view is aggregated, or else is the aggregate operator in the head of $R$; (2) the variables in the $B_{v_i}$'s that are also contained in the $\bar{x}_i$ are kept the same as in the rewriting, whereas the others (nondistinguished variables of the view definition) are replaced by fresh variables that are not used in any other $B_{v_j}$ with $j \neq i$. Moreover, $y$ is the attribute that is aggregated in the definition of the central view $V_0$ of $R$ (in case $V_0$ has aggregation). In the purely conjunctive case, the unfolding is equivalent to the expansion [Ull97] of the rewriting.

In our framework, we also consider multiaggregate queries and views. In this case, we assume again that only one aggregated attribute from one (central) view is used to compute the aggregated value in the head of the rewriting. Our central rewritings are extended naturally.

3.4 View Selection and Cost Model

We want to design minimal-cost views, i.e., those views whose use in the rewriting of a query results in the cheapest computation of the query. We assume that the view relations have been precomputed and stored in the database. Thus, we do not assume any cost for computing the views. We assume that the size of a database relation is the number of tuples in it, and that the cost of computing a join is the sum of the sizes of the input relations and of the output relation (this faithfully models the cost of, e.g., hash joins). For conjunctive queries, we measure the cost of query evaluation as the sum of the costs of all the joins during the computation of the query. (We assume that all selections are pushed down as far as they go, and consider only left-linear query trees for joins.) For queries with aggregation, our sum-cost model measures the cost of evaluating a query as the sum of the costs of the three steps in the computation of the query: computation of the conjunctive core, grouping, and aggregation. (Let N be the size of the input relation to a unary operator. Then the cost of the grouping operator, which is the same as sorting, is proportional to N log N; the cost of the aggregate operator, which can be computed in a single scan, is N.)

Now we present our formulation of the view-selection problem. We assume that we must satisfy a bound (storage limit) on the sum of the sizes of the relations for the views that will be selected to be materialized.

Definition 3.2 (view-selection problem) Given a query workload, an oracle that gives view sizes (alternatively, a specific database), and a storage limit (a positive integer), return a set of view definitions such that: the views in the set give an equivalent rewriting (of one of our three central rewriting types) of each query in the workload, the view relations satisfy the storage limit, and the total cost of computing the queries using the rewritings is minimum among the view sets that satisfy the previous two conditions.

For the view-selection problem, we prove the following (in Section 5): (1) Decidability. (2) NP-hardness, even in the case of queries and views without aggregation.
(3) Membership in NP for sum and count aggregate queries. (4) Exponential-time lower bound on the complexity for min and max aggregate queries.

4 Results on Equivalence of Unfoldings and Rewriting

We present results that prove that the unfoldings defined in Section 3 are equivalent to the rewritings. We also present negative results that show that the constraints that need to be satisfied for this to hold are tight. As a consequence of the results in this section, equivalence of a rewriting to a query is reduced to equivalence between two aggregate queries (which is known how to check [NSS98]). In brief, for the cases where we prove that the rewriting is equivalent to its unfolding, it suffices to check whether the unfolding is equivalent to the query.

4.1 Case CQ/CQA: central view CQ and rewriting CQA

Theorem 4.1 Let $R$ be a CQ/CQA rewriting. Suppose that all noncentral views are without aggregation and are bag-valued. Then $R \equiv R^u$.

Proof: Here all views are without aggregation. Given a database $D$ on the base relations, computing the bag-join of all the subgoals in the body of $R^u$ is equivalent to computing each view relation separately as a bag and then computing the bag-join of all the views in the body of the rewriting. After that, the same grouping and aggregation is applied in both $R$ and $R^u$.

The following result relaxes the requirement on noncentral views in the case of duplicate-insensitive functions.

Theorem 4.2 For a CQ/CQA rewriting $R$ with central aggregation max (min), $R \equiv R^u$.
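The contrast between Theorem 4.1 and Theorem 4.2 can be checked on a small instance. The sketch below is an assumption-laden illustration, not part of the paper: the relations `P` and `S`, the central view v0(X, Y) :- p(X, Y), the noncentral view v1(X) :- s(X, Z), and the function names are all hypothetical. It compares the CQ/CQA rewriting r(X, agg(Y)) :- v0(X, Y), v1(X) with its unfolding when v1 is stored as a bag or as a set.

```python
from collections import defaultdict

# Hypothetical base relations (illustrative values only).
P = [(1, 4), (1, 6)]          # p(X, Y)
S = [(1, "a"), (1, "b")]      # s(X, Z)

def aggregate(pairs, fn):
    """Group (x, y) pairs by x and apply the aggregate fn to each group's bag of y values."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: fn(ys) for x, ys in groups.items()}

def unfolding(fn):
    # r_u(X, agg(Y)) :- p(X, Y), s(X, Z).   (join over base relations, as a bag)
    return aggregate([(x, y) for (x, y) in P for (x2, _z) in S if x2 == x], fn)

def rewriting(fn, bag_valued):
    # r(X, agg(Y)) :- v0(X, Y), v1(X).
    v0 = list(P)                        # central view v0(X, Y) :- p(X, Y)
    v1 = [x for (x, _z) in S]           # noncentral view v1(X) :- s(X, Z), as a bag
    if not bag_valued:
        v1 = sorted(set(v1))            # set-valued: duplicates removed
    return aggregate([(x, y) for (x, y) in v0 for x1 in v1 if x1 == x], fn)

print(unfolding(sum))           # {1: 20}
print(rewriting(sum, True))     # {1: 20}
print(rewriting(sum, False))    # {1: 10}
print(rewriting(max, False))    # {1: 6}
print(unfolding(max))           # {1: 6}
```

With the duplicate-sensitive operator sum, the set-valued noncentral view loses multiplicities and the rewriting diverges from the unfolding; with max the two agree either way on this instance.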

Negative Results

Proposition 4.1 Let $R$ be a CQ/CQA rewriting with central aggregation sum or count. Suppose that either there is a noncentral view with aggregation, or there is a set-valued noncentral view. Then the unfolding is not set-equivalent to the rewriting.

4.2 Case CQA/CQ: central view CQA and rewriting CQ

Lemma 4.1 For every CQA/CQ rewriting $R$, if $R$ is equivalent to its unfolding $R^u$, then all grouping attributes of the central view of $R$ appear in the head of $R$.

Query Q1 in Example 1.1 is rewritten using a view whose grouping attributes are a proper subset of the arguments in the head of the rewriting. The following theorem proves equivalence of a rewriting to its unfolding for all aggregate functions that we consider, under some restrictions on the view definitions and on the form of the rewriting.

Theorem 4.3 Consider a CQA/CQ rewriting $R$. Suppose that (i) all noncentral views of $R$ have no aggregation, (ii) $R$ does not have nondistinguished attributes in its body (except possibly noncentral aggregated arguments in $R$'s central view in the case of multiaggregate views), (iii) noncentral views do not have nondistinguished attributes in their definition, and (iv) all grouping attributes of the central view appear in the head of $R$. Then the following hold: $R$ is equivalent to its unfolding $R^u$, and the answer to $R$ on any set-valued database is a set.

Although, as we prove in the negative-results section, none of the conditions in Theorem 4.3 can be relaxed for sum or count queries, they can be relaxed for max and min queries:

Theorem 4.4 Let $R$ be a CQA/CQ rewriting with central aggregation max (min). Suppose that all the grouping arguments of the central view of $R$ appear in the head of $R$. Then $R$ is set-valued and is equivalent to its unfolding $R^u$.

Negative Results

The question arises whether we can extend Theorem 4.3 by relaxing one of the restrictions. Here we prove that it is not possible for aggregate functions sum and count.
A counterexample is the modified rewriting of Q2 in Example 1.1. However, it might seem that there could be cases where the unfolding we defined in the previous section does work. In the following proposition, we prove that, for aggregate operators sum or count, the following holds: for any rewriting and its unfolding (the way we define unfoldings), such that some of the restrictions in Theorem 4.3 are relaxed, the unfolding is not equivalent to the rewriting.

Proposition 4.2 Consider a CQA/CQ query $R$ with central aggregation sum or count. Suppose that noncentral aggregated arguments of $R$ cannot be used in the head of $R$ or in joins in the body of $R$. Moreover, suppose that at least one of the following holds:
1. There is a noncentral view in $R$ defined by an aggregate query.
2. There is a noncentral view in $R$ defined by a query with nondistinguished variables.
3. There are nondistinguished variables in $R$ (other than noncentral aggregation in the central view of $R$).
Then $R$ is not set-equivalent to its unfolding $R^u$ (the way we define $R^u$).

We prove this proposition in three propositions, each relaxing one of the restrictions. The proof techniques are similar in all three cases.

4.3 Case CQA/CQA: central view CQA and rewriting CQA

Here, to prove $R \equiv R^u$, we choose to prove that the standard query plans for $R$ and $R^u$ can be transformed to the same plan $R_{int}$. We give an example to show our technique.

EXAMPLE 4.1 Consider the following rewriting and its unfolding:

  r(X, T, sum(W)) :- v4(X, Z, W), v5(Z, T).
  v4(X, Z, count(*)) :- p(X, Y, Z).
  v5(Z, T) :- u(Z, T, L).
  r^u(X, T, count(*)) :- p(X, Y, Z), u(Z, T, L).

Let $R_{int}$ be defined as follows:

  r_int(X, T, sum(W)) :- r'_int(X, T, Z, W).
  r'_int(X, T, Z, count(*)) :- p(X, Y, Z), u(Z, T, L).

We show that $R \equiv R^u$ by showing that $R \equiv R_{int}$ and $R_{int} \equiv R^u$.

Theorem 4.5 Let $R$ be a CQA/CQA rewriting. Suppose that noncentral views are without aggregation and are bag-valued. Then $R \equiv R^u$.

Negative Results

Proposition 4.3 Let $R$ be a CQA/CQA rewriting with central aggregation sum or count. Suppose that either there is a noncentral view with aggregation, or there is a set-valued noncentral view. Then the unfolding is not set-equivalent to the rewriting.

5 View Selection

5.1 Decidability

Theorem 5.1 The view-selection problem under the storage limit is decidable for finite workloads of conjunctive queries with aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.

The query workloads we consider may contain queries both with and without aggregation.

5.2 NP-completeness for sum or count

In this section we present an NP-completeness result for the view-selection problem for workloads of sum or count queries. As the proof also works for purely conjunctive queries, views, and rewritings under bag semantics, the view-selection problem for that case is also NP-complete. (Interestingly, under bag-set semantics, the view-selection problem for conjunctive queries, views, and rewritings has an exponential-time lower bound; cf. [CHS02].)

Theorem 5.2 The view-selection problem under the storage limit is NP-complete for finite workloads of conjunctive queries with sum- or count-aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.

5.3 Lower Bound for Workloads of max or min

We prove an exponential-time lower bound for view selection under a storage limit for max- and min-queries.
Theorem 5.3 The view-selection problem under the storage limit has an exponential-time lower bound for finite workloads of conjunctive queries with max- or min-aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.

6 Algorithms

As a consequence of the results in Section 4, we obtain algorithms that are based on the following observations.

Proposition 6.1 In a CQA/CQ rewriting, the set of all grouping attributes of the central view is a subset of the set of all grouping attributes of the rewriting. We call this central view grouping-complete. In a CQA/CQA rewriting, the set of the grouping attributes of the rewriting is a union of subsets of the grouping attributes in the central view and the non-aggregated attributes in noncentral views. We call this central view grouping-incomplete.

We consider a rewriting $R$ and define its reduced-core rewriting $R^r$ to be a conjunctive rewriting whose head attributes are $R$'s grouping attributes only, and whose body uses reduced-core views. Given an aggregate view $V$, we define its reduced-core view $V^r$ to be a view whose body is the body of $V$ and whose head is a new predicate name $V^r$; the arguments in the head of $V^r$ are all the grouping attributes of $V$. The reduced-core rewriting is a conjunctive query, and the following holds:

Proposition 6.2 Let $R^r$ be a reduced-core rewriting of a CQA/CQA or CQA/CQ rewriting R. Then $R^r$ is an equivalent rewriting of the reduced-core query using the reduced-core views.

6.1 Constructing Rewritings

In this section, given a query and a set of views, we construct all equivalent rewritings of the query using the views. The problem is reduced to the problem of obtaining rewritings for purely conjunctive queries. For lack of space, we describe only the case of max queries and CQA/CQA or CQA/CQ rewritings. The other cases are similar, with the additional observation that, in the duplicate-sensitive cases, we find rewritings for the purely conjunctive queries whose unfolding is isomorphically mapped on the query.

In the following algorithm, $Q^r$ and $V^r$ are the reduced-core versions of a query Q and of the views, respectively. We use an algorithm from the literature [ALU01] to find all rewritings of $Q^r$ using $V^r$.

Procedure Find-R.
Input: query Q, set of views V.
  Consider $Q^r$, $V^r$.
  Find all rewritings of $Q^r$ using $V^r$.
  For each rewriting $R^r$ do:
    Consider the expansion $R^r_{exp}$.
    For each containment mapping from $Q^r$ to $R^r_{exp}$ do:
      If there is a view in the rewriting such that its aggregated attribute is the image of the aggregated attribute of the query, do:
        Call this the central view.
        If the central view is grouping-incomplete, then construct a CQA/CQA rewriting.
        If the central view is grouping-complete, then construct a CQA/CQ rewriting.
      end
    end
  end

Theorem 6.1 If there is a central rewriting of a query Q using views V, then the algorithm will find it.

6.2 Selecting Views

We present an algorithm that selects multiaggregate views to be used as central views, given a query workload. It is particularly efficient for queries with a HAVING clause, where a single multiaggregate central view avoids joins of several aggregate views. The algorithm selects all maximal such views.
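The benefit of a multiaggregate central view for HAVING queries can be illustrated with a small sketch. It is not taken from the paper: the base relation `P`, the view layout (sum and count per group), and all function names are hypothetical, and the query is written informally in SQL in the comments. The sketch materializes one view carrying two aggregates per group and answers a HAVING query from that view with a filter only, where two single-aggregate views would have to be joined on the grouping attribute.

```python
from collections import defaultdict

# Hypothetical base relation p(X, Y), stored as a bag (list of tuples).
P = [("a", 10), ("a", 5), ("b", 7), ("c", 1), ("c", 2), ("c", 3)]

def multiaggregate_view(rows):
    """v(X, sum(Y), count(*)) :- p(X, Y): one row per group, two aggregates."""
    acc = defaultdict(lambda: [0, 0])
    for x, y in rows:
        acc[x][0] += y      # running sum(Y)
        acc[x][1] += 1      # running count(*)
    return {x: (s, c) for x, (s, c) in acc.items()}

def query_direct(rows, min_count):
    """SELECT X, SUM(Y) FROM p GROUP BY X HAVING COUNT(*) >= min_count."""
    groups = defaultdict(list)
    for x, y in rows:
        groups[x].append(y)
    return {x: sum(ys) for x, ys in groups.items() if len(ys) >= min_count}

def query_from_view(view, min_count):
    """The same query answered from the view alone: a filter, no join."""
    return {x: s for x, (s, c) in view.items() if c >= min_count}

V = multiaggregate_view(P)
```

On this toy data, `query_from_view(V, 2)` and `query_direct(P, 2)` return the same groups; the view stores one row per group regardless of how many base tuples the group has.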
For a query workload, a view is maximal if there does not exist another multiaggregate view with more aggregated arguments which can replace it in all the rewritings in the workload. The algorithm considers each query Q in the workload and constructs a pair of views $(V_c^Q, V_n^Q)$ which essentially represent a central minimal view and a collective noncentral view. We may think of the pair $(V_c^Q, V_n^Q)$ as providing a rewriting for Q with the minimum number of subgoals in the central view $V_c^Q$. We call them characteristic views of the query Q.

In the next step, the algorithm considers all combinations of those pairs and finds compatible pairs of characteristic views. Two pairs are compatible if (1) the two central views can be combined in a single multiaggregate view $V_m$, and (2) $V_m$ can be used to rewrite both queries.

Proposition
1. Each query has a bounded number of characteristic views.
2. In any central rewriting of a query Q, the views used in the rewriting can also be used to produce central rewritings of characteristic views.
3. It is decidable whether two pairs of characteristic views are compatible.

Theorem 6.2 The algorithm finds all maximal multiaggregate views for a query workload.

References

[AAD+96] S. Agarwal, R. Agrawal, P. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proceedings of VLDB, 1996.

[ACN00] S. Agrawal, S. Chaudhuri, and V.R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In Proceedings of VLDB, 2000.

[ALU01] F. Afrati, C. Li, and J.D. Ullman. Generating efficient plans for queries using views. In Proceedings of ACM SIGMOD, 2001.

[BL02] M. Benedikt and L. Libkin. Aggregate operators in constraint query languages. JCSS, 64, 2002.

[CDGLV03] D. Calvanese, G. De Giacomo, M. Lenzerini, and M.Y. Vardi. View-based query containment. In Proceedings of PODS, pages 56-67, 2003.

[CHS02] R. Chirkova, A.Y. Halevy, and D. Suciu. A formal perspective on the view selection problem. VLDB Journal, 11(3), 2002.

[CNS99] S. Cohen, W. Nutt, and A. Serebrenik. Rewriting aggregate queries using views. In Proceedings of PODS, 1999.

[CV93] S. Chaudhuri and M. Vardi. Optimization of real conjunctive queries. In Proceedings of PODS, pages 59-70, 1993.

[GCB+97] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, and M. Venkatrao. Data cube: A relational aggregation operator generalizing Group-by, Cross-Tab, and sub totals. Data Mining and Knowledge Discovery, 1(1):29-53, 1997.

[GHQ95] A. Gupta, V. Harinarayan, and D. Quass. Aggregate-query processing in data warehousing environments. In Proceedings of VLDB, 1995.

[GHRU97] H. Gupta, V. Harinarayan, A. Rajaraman, and J.D. Ullman. Index selection for OLAP. In Proceedings of ICDE, 1997.

[GT03] S. Grumbach and L. Tininini. On the content of materialized aggregate views. JCSS, 66, 2003.

[Hal01] A.Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4), 2001.

[HRU96] V. Harinarayan, A. Rajaraman, and J. Ullman. Implementing data cubes efficiently. In Proceedings of SIGMOD, 1996.

[LSV02] J. Lechtenbörger, H. Shu, and G. Vossen. Aggregate queries over conditional tables. Journal of Intelligent Information Systems, 19(3), 2002.

[NSS98] W. Nutt, Y. Sagiv, and S. Shurin. Deciding equivalences among aggregate queries. In Proceedings of PODS, 1998.

[ÖÖM87] G. Özsoyoglu, Z.M. Özsoyoglu, and V. Matos. Extending relational algebra and relational calculus with set-valued attributes and aggregate functions. TODS, 12, 1987.

[PDST00] L. Popa, A. Deutsch, A. Sahuguet, and V. Tannen. A chase too far? SIGMOD Record, 29(2), 2000.

[RSSS98] K.A. Ross, D. Srivastava, P.J. Stuckey, and S. Sudarshan. Foundations of aggregation constraints. Theoretical Computer Science, 193(1-2), 1998.

[SDJL96] D. Srivastava, S. Dar, H.V. Jagadish, and A.Y. Levy. Answering queries with aggregation using views. In Proceedings of VLDB, 1996.

[Ull97] J.D. Ullman. Information integration using logical views. In Proceedings of ICDT, 1997.

[Wid95] J. Widom. Research problems in data warehousing. In Proceedings of CIKM, 1995.

[YW01] J. Yang and J. Widom. Incremental computation and maintenance of temporal aggregates. In Proceedings of ICDE, pages 51-62, 2001.

A From Section 4

A.1 Proof of Theorem 4.3

Theorem 4.3 Consider a CQA/CQ rewriting R. Suppose that (i) all noncentral views of R have no aggregation, (ii) R does not have nondistinguished attributes in its body (except possibly noncentral aggregated arguments in R's central view in the case of multiaggregate views), (iii) noncentral views do not have nondistinguished attributes in their definition, and (iv) all grouping attributes of the central view appear in the head of R. Then the following hold: R is equivalent to its unfolding $R^u$, and the answer to R on any set-valued database is a set.

Proof (sketch): The proof has two parts.

Part 1: Suppose the central view V of the rewriting has just one aggregated argument (i.e., we do not consider multiaggregate views). We first show that the answer to R is a set on any set-valued database; thus, it is enough to show set equivalence of R and $R^u$ on set-valued databases. We then transform the standard query plan for R into a set-equivalent query plan that is the standard query plan for $R^u$, as follows. We fix a set-valued database D. We observe that V can be computed on D by taking a bag projection of the body of V on the head attributes of V, and by then doing V's grouping and aggregation on the result.
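The observation about computing V through a bag projection can be illustrated with a small sketch. It is an illustration under assumptions, not the paper's construction: the relation `P`, the view `v(X, sum(Y)) :- p(X, Y, Z)`, and the `bag` flag are hypothetical. The sketch contrasts a bag projection, which keeps duplicate (X, Y) pairs, with a set projection, which drops them and therefore changes the aggregated value.

```python
from collections import Counter

# Hypothetical set-valued database with one relation p(X, Y, Z).
P = {("a", 1, "u"), ("a", 1, "w"), ("a", 2, "u"), ("b", 3, "u")}

def view_sum(rows, bag=True):
    """v(X, sum(Y)) :- p(X, Y, Z), computed as projection on (X, Y), then grouping."""
    projected = [(x, y) for x, y, _ in rows]   # bag projection keeps duplicates
    if not bag:
        projected = list(set(projected))       # set projection drops duplicates
    result = Counter()
    for x, y in projected:
        result[x] += y                         # group on X, aggregate sum(Y)
    return dict(result)
```

With the bag projection, group `a` sums the three projected values 1, 1, and 2; with the set projection the duplicate pair `("a", 1)` is lost, so the sum no longer reflects the body of the view.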
We use this observation to argue that we can compute R on D as follows:
(1) take a join of the bodies of all the views in R;
(2) project the resulting relation on the head attributes of R under bag semantics;
(3) group the resulting tuples into equivalence classes, based on the union of the grouping arguments of V and of the head arguments of R, and then aggregate using V's aggregation function; as a result, we obtain the value of V's aggregation for each equivalence class w.r.t. the grouping attributes of the view V.
Because the grouping attributes of V are a subset of the head arguments of R, the result of this computation is the relation for R on D. We then observe that it is trivial to transform this plan into the standard computation for $R^u$.

Part 2 (multiaggregate central view V): We reduce this case to the previous case by projecting out the extra aggregate arguments of the central view V, thus obtaining a new rewriting $R'$. We then argue that R and $R'$ have the same unfolding, and use transitivity of set equivalence to show $R \equiv R^u$.

A.2 Proof of Proposition 4.2

Proposition 4.2 is proven in three parts with similar proofs, one for each of the clauses in the statement. We give here one of the three proofs.

Proposition A.1 Consider a CQA/CQ query R with central aggregation sum or count. If at least one noncentral view in R has aggregation (with any aggregation function(s)), and if noncentral aggregated arguments of R cannot be used in the head of R or in joins in the body of R, then R is not set-equivalent to its unfolding $R^u$ (the way we define $R^u$).

Proof (sketch): Consider an arbitrary CQA/CQ query R with central aggregation sum or count, such that R has a noncentral view with aggregation; let $R^u$ be the unfolding of R. We prove the Proposition by assuming $R \equiv R^u$ and by then constructing a database on which the answers to R and $R^u$ are different as sets; we thus arrive at a contradiction. Recall that, by definition of $R^u$, the head variables of R and $R^u$ are the same.

Here is the idea of what we show on a counterexample database D, for the case where R has a noncentral aggregate view. For a fixed assignment x of the grouping attributes in the head of $R^u$, we ascertain that the answer to $R^u$ on D has a tuple, with some value z of the aggregated argument Z of $R^u$. We argue that, for the same assignment x, none of the tuples in the answer to R on D has a value of Z that is equal to z. Thus, the answers to R and $R^u$ on D are different as sets.

To produce this counterexample, we build a database D in such a way that the body of the aggregate noncentral view $V_1$ in R has exactly two tuples that correspond to a fixed assignment x of the grouping arguments X of $R^u$; we build the rest of the database D to ensure that the answer to each of R and $R^u$ on D has at least one tuple whose values of X are x. (We build the database D as a union, on each base relation separately, of two canonical databases for R, which result from assigning two different variable names to the argument to be aggregated in $V_1$.) Now, because $V_1$ has aggregation, the answer to $V_1$ on D has exactly one tuple that corresponds to this assignment x; recall that the body of $V_1$ has two tuples for x. For this reason, when we compute R and $R^u$ on the database D, the result of joining all the subgoals of $R^u$ has at least two copies of each tuple in the body of the central view of R. Recall that the aggregated argument Z in the head of $R^u$ is also the aggregated argument in the head of the central view V of R. We argue that, for this reason, for the assignment x of the grouping arguments of $R^u$, the (only) tuple in the answer to $R^u$ on D has a value of Z that is at least twice the value of Z in any tuple for x in the answer to R on D.
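The construction described above can be made concrete on a toy instance. The sketch below is an illustration under assumptions rather than the paper's canonical-database construction: the relations `P` and `Q`, the views `v` (central, sum) and `v1` (noncentral, max), and the constant `J` are all hypothetical. The noncentral view collapses two body tuples into one, so the rewriting sees each central tuple once, while the unfolding sees it twice.

```python
from collections import defaultdict

J = 3  # number of tuples in the body of the central view for X = "x"

# Hypothetical set-valued database.
P = {("x", 1, k) for k in range(J)}   # p(X, Y, K): the aggregated argument Y is always 1
Q = {("x", 10), ("x", 20)}            # q(X, T): exactly two tuples for X = "x"

def central_view(p):
    """v(X, sum(Y)) :- p(X, Y, K)."""
    out = defaultdict(int)
    for x, y, _ in p:
        out[x] += y
    return dict(out)

def noncentral_view(q):
    """v1(X, max(T)) :- q(X, T): aggregation collapses the two q-tuples into one."""
    out = {}
    for x, t in q:
        out[x] = max(t, out.get(x, t))
    return out

def rewriting(p, q):
    """r(X, Z) :- v(X, Z), v1(X, W): join the two views, project on (X, Z)."""
    v, v1 = central_view(p), noncentral_view(q)
    return {(x, z) for x, z in v.items() if x in v1}

def unfolding(p, q):
    """r_u(X, sum(Y)) :- p(X, Y, K), q(X, T): aggregate over the joined bodies."""
    out = defaultdict(int)
    for x, y, _ in p:
        for x2, _ in q:
            if x == x2:
                out[x] += y
    return set(out.items())
```

On this instance the rewriting returns the value `J` for `x`, while the unfolding returns `2 * J`, so the two answers differ as sets.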
Indeed, let there be j tuples in the body, on the database D, of the central view V of R. We construct D in such a way that each tuple in the body of V has the value 1 of the argument Y that is aggregated in the head of V. Therefore, the value of $Z = \alpha(Y)$ in the head of V is exactly j (recall that $\alpha$ is either sum or count). By definition, the answer to R on D is obtained by taking a projection, on the head arguments of R, of the result of joining all the subgoals of R. As the subgoals of R include the view V, any tuple for x in the answer to R on D has Z = j. On the other hand, the answer to $R^u$ on D is the result of performing $R^u$'s aggregation (which is the central aggregation, sum or count, of the central view of R) on the body of $R^u$. (The body of $R^u$ is the result of joining all its subgoals.) The body of $R^u$ has at least 2j tuples; the value, in each tuple, of the argument to be aggregated is 1. Thus, the tuple for x in the answer to $R^u$ on D has a value of Z that is at least 2j.

A.3 Proof of Theorem 4.5

Theorem 4.5 Let R be a CQA/CQA rewriting. Suppose that noncentral views are without aggregation and are bag-valued. Then $R \equiv R^u$.

Proof (sketch): We show that each of R and $R^u$ is equivalent to a query $R^{int}$ (see Example 4.1) whose definition is based on the definitions of R and $R^u$; then $R \equiv R^u$ follows from transitivity of equivalence. For a rewriting R defined as

$r(\bar{x}, \alpha(y)) \leftarrow v_0(\bar{x}_0, y), v^b_1(\bar{x}_1, \bar{y}_1), \ldots, v^b_k(\bar{x}_k, \bar{y}_k).$

and for its unfolding $R^u$,

$r^u(\bar{x}, \beta(y)) \leftarrow B_{v_0} \,\&\, B_{v_1} \,\&\, \ldots \,\&\, B_{v_k}.$

$R^{int}$ is defined as

$r^{int}(\bar{x}, \alpha(z)) \leftarrow r'^{int}(\bar{x} \cup \bar{x}_0, z).$
$r'^{int}(\bar{x} \cup \bar{x}_0, \beta(y)) \leftarrow B_{v_0} \,\&\, B_{v_1} \,\&\, \ldots \,\&\, B_{v_k}.$   (5)

Here, $\alpha$ is the aggregate function of R, and $\beta$ is the aggregate function of R's central view V.

We give an intuition for the proof for the case where the aggregation function of the central view V in R is count(*); the proof carries over in a straightforward way to any distributive aggregation function [GCB+97]. In the computation of R on an arbitrary database, consider any group G(t) that results, after grouping and aggregation, in a tuple t in the answer to R. Any tuple p in G(t) is the result of joining tuples in views in R, one tuple from each view. Consider a tuple s in V (the central view of R) that contributes to the tuple p, and let k be the aggregated value in s. As V's aggregation is count(*), s corresponds to k tuples in the body of V. Thus, each tuple p (with some value k) in each group in R corresponds to k tuples in the body of the central view V of R.

We use this observation to see that we can use a query plan for $R^{int}$ to compute R. For each tuple p (with some value k) in the body of R, we have k tuples in the body of $R^{int}$. After doing $R^{int}$'s aggregation (the same as V's aggregation) on the union of the grouping attributes of R and V, we obtain, from these k tuples, exactly the tuple p in the body of $R^{int}$. As the grouping and aggregation are the same in the heads of R and $R^{int}$, we conclude that R and $R^{int}$ have the same answer on any database.

To show that $R^u$ and $R^{int}$ have the same answer on any database, we first observe that they are computed on the same relation $B = B_{v_0} \,\&\, \ldots \,\&\, B_{v_k}$. We then use the fact that R's aggregate function is distributive to argue that the two grouping/aggregation steps in computing $R^{int}$ result in the same answer, on the relation B, as the single grouping/aggregation step in computing $R^u$.

B From Section 5

B.1 Proof of Theorem 5.1

Theorem 5.1 The view-selection problem under the storage limit is decidable for finite workloads of conjunctive queries with aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.

Proof (sketch): The proof is a consequence of the fact that views in equivalent rewritings have definitions whose length is bounded by the size of the query.
This is true for conjunctive queries and carries over to aggregate queries and central rewritings because of our results on equivalence of unfoldings and rewritings (proved in Section 4) and the results on equivalence of aggregate queries. Combined, these results imply that the core of the rewriting must be equivalent to the core of the query. Then we argue as in the purely conjunctive case under either semantics.

B.2 NP-hardness proof (Theorem 5.2)

Proposition B.1 The view-selection problem under the storage limit is NP-hard for finite workloads of conjunctive queries with sum or count aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.

Proof (sketch): We prove the Proposition by reducing the NP-complete problem Partition to the problem of view selection for a single query with sum or count aggregation, for each of our three central rewritings. Consider an instance I of Partition, which has n elements $a_1, \ldots, a_n$. We construct an instance J of view selection, in time at most polynomial in the size of I. The instance J has:

1. A sum or count query Q, with n subgoals $p_i$ that correspond to the elements in I, and with an extra subgoal $p_0$ that provides an aggregation argument of Q.

2. An oracle, which gives the size of the relation for each subgoal of the query Q, as 1 for $p_0$ and as $2^{s(a_i)}$ for each $p_i$, i > 0; here the integer $s(a_i)$ is the size of the element $a_i$ in the instance I of Partition. For any view defined on a subset of the subgoals of the query Q, the oracle gives the size of the relation for the view as a product of the sizes of the relations for the relevant subgoals; the size of Q is $2^{S(A)}$, where S(A) is the sum of sizes of all the elements in I. In the full proof we argue that the oracle gives view sizes consistently on some database.

3. A central rewriting type (one of CQ/CQA, CQA/CQ, and CQA/CQA).

The problem for J is: For the query Q and the set of databases for which the oracle gives the sizes of views as described above, does there exist a rewriting R of the specified type, such that the sum cost (see Section 3) of answering the query Q on these databases using the rewriting R does not exceed a numeric value M, which depends on the type of the rewriting R?

We then show that an instance I of Partition has a solution if and only if the corresponding instance J of view selection has a solution. Consider the value of M in J: The component of M that represents the cost of computing the body of the rewriting (i.e., all except the final grouping/aggregation) is $M = 2^{S(A)/2} + 2^{S(A)/2} + 2^{S(A)}$. The remainder of the proof is an argument that, on the databases described by the oracle, the cost of computing the body of a rewriting does not exceed M only if there are exactly two views that have the same size $M_0$, as given by the oracle, and such that the join of the two views gives the body of the query Q. Now the size $M_0$ of any such view can only be $2^{S(A)/2}$; otherwise the (sum) cost of joining the views cannot be M. But by construction of the query and of the oracle, the size of a view can be $2^{S(A)/2}$ only if in the instance I of Partition there is a subset $A'$ of the set A such that the total size of the elements of $A'$ is S(A)/2.

B.3 Proof of Theorem 5.3

Theorem 5.3 The view-selection problem under the storage limit has an exponential-time lower bound for finite workloads of conjunctive queries with max or min aggregation and for conjunctive views and rewritings, with or without aggregation, for the three central rewritings we consider.
Proof (sketch): We use the construction given in the proof of Theorem 6 in [CHS02]; we take the conjunctive query in the construction and modify it to obtain definitions of queries with aggregation. We then consider rewritings of these queries (one for each of the three rewriting types we consider) and prove that in each case, an exponential number of fixed views (some of them with aggregation) form the only possible viewset that satisfies a chosen storage limit and gives a minimal-cost rewriting of the query.

Here are some details. We take the database schema (relations $S_1$ through $S_n$) from the construction in the proof in [CHS02], and change the schemas of two of the relations to accommodate attributes that would justify the aggregation in each type of central rewriting that we consider; we then construct a database D on which to compute our queries and rewritings. After defining the queries and rewritings on the new schema, we use our results on equivalence of queries with aggregation to their central rewritings to argue that each of the rewritings we produce is equivalent to the corresponding query. Each rewriting has an exponential number of filtering views [ALU01] that, when applied (i.e., joined) together to one of the nonfiltering views in the plan for computing the rewriting on the database, reduce the relation for the view in a way that minimizes the cost of the plan.

Finally, for each rewriting we set the storage limit to the amount of space that is just enough to store the relations, on the database D, for the exponential number of views that we have fixed in that rewriting. Similarly to the proof in [CHS02], we show that (1) the cost of computing the queries using the chosen views and rewritings is lower than the cost of computing the queries without views, and (2) for any other viewset that could produce lower-cost plans to compute the queries on the database D, the relations for the viewset do not satisfy the storage limit. In particular, by construction of the database D, our fixed views with aggregation are more beneficial than the views (without aggregation) that are their cores.


More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

On Reconciling Data Exchange, Data Integration, and Peer Data Management

On Reconciling Data Exchange, Data Integration, and Peer Data Management On Reconciling Data Exchange, Data Integration, and Peer Data Management Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati Dipartimento di Informatica e Sistemistica Sapienza

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Foundations of Databases

Foundations of Databases Foundations of Databases Free University of Bozen Bolzano, 2004 2005 Thomas Eiter Institut für Informationssysteme Arbeitsbereich Wissensbasierte Systeme (184/3) Technische Universität Wien http://www.kr.tuwien.ac.at/staff/eiter

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

}Optimization Formalisms for recursive queries. Module 11: Optimization of Recursive Queries. Module Outline Datalog

}Optimization Formalisms for recursive queries. Module 11: Optimization of Recursive Queries. Module Outline Datalog Module 11: Optimization of Recursive Queries 11.1 Formalisms for recursive queries Examples for problems requiring recursion: Module Outline 11.1 Formalisms for recursive queries 11.2 Computing recursive

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1)

Virtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1) Virtual views Incremental View Maintenance CPS 296.1 Topics in Database Systems A view is defined by a query over base tables Example: CREATE VIEW V AS SELECT FROM R, S WHERE ; A view can be queried just

More information

}Optimization. Module 11: Optimization of Recursive Queries. Module Outline

}Optimization. Module 11: Optimization of Recursive Queries. Module Outline Module 11: Optimization of Recursive Queries Module Outline 11.1 Formalisms for recursive queries 11.2 Computing recursive queries 11.3 Partial transitive closures User Query Transformation & Optimization

More information

ROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING WITH UNCERTAINTY

ROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING WITH UNCERTAINTY ALGEBRAIC METHODS IN LOGIC AND IN COMPUTER SCIENCE BANACH CENTER PUBLICATIONS, VOLUME 28 INSTITUTE OF MATHEMATICS POLISH ACADEMY OF SCIENCES WARSZAWA 1993 ROUGH MEMBERSHIP FUNCTIONS: A TOOL FOR REASONING

More information

Computing Full Disjunctions

Computing Full Disjunctions Computing Full Disjunctions (Extended Abstract) Yaron Kanza School of Computer Science and Engineering The Hebrew University of Jerusalem yarok@cs.huji.ac.il Yehoshua Sagiv School of Computer Science and

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

STABILITY AND PARADOX IN ALGORITHMIC LOGIC

STABILITY AND PARADOX IN ALGORITHMIC LOGIC STABILITY AND PARADOX IN ALGORITHMIC LOGIC WAYNE AITKEN, JEFFREY A. BARRETT Abstract. Algorithmic logic is the logic of basic statements concerning algorithms and the algorithmic rules of deduction between

More information

Line Graphs and Circulants

Line Graphs and Circulants Line Graphs and Circulants Jason Brown and Richard Hoshino Department of Mathematics and Statistics Dalhousie University Halifax, Nova Scotia, Canada B3H 3J5 Abstract The line graph of G, denoted L(G),

More information

Exact and Inexact Methods for Selecting Views and Indexes for OLAP Performance Improvement

Exact and Inexact Methods for Selecting Views and Indexes for OLAP Performance Improvement Exact and Inexact Methods for Selecting Views and Indexes for OLAP Performance Improvement (extended abstract) Zohreh Asgharzadeh Talebi Operations Research Program NC State University Raleigh, NC 27695

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Lecture 2 - Introduction to Polytopes

Lecture 2 - Introduction to Polytopes Lecture 2 - Introduction to Polytopes Optimization and Approximation - ENS M1 Nicolas Bousquet 1 Reminder of Linear Algebra definitions Let x 1,..., x m be points in R n and λ 1,..., λ m be real numbers.

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Sanjay Gandhi G 1, Dr.Balaji S 2 Associate Professor, Dept. of CSE, VISIT Engg College, Tadepalligudem, Scholar Bangalore

More information

Parameterized Complexity of Independence and Domination on Geometric Graphs

Parameterized Complexity of Independence and Domination on Geometric Graphs Parameterized Complexity of Independence and Domination on Geometric Graphs Dániel Marx Institut für Informatik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. dmarx@informatik.hu-berlin.de

More information

XI International PhD Workshop OWD 2009, October Fuzzy Sets as Metasets

XI International PhD Workshop OWD 2009, October Fuzzy Sets as Metasets XI International PhD Workshop OWD 2009, 17 20 October 2009 Fuzzy Sets as Metasets Bartłomiej Starosta, Polsko-Japońska WyŜsza Szkoła Technik Komputerowych (24.01.2008, prof. Witold Kosiński, Polsko-Japońska

More information

Cube-Lifecycle Management and Applications

Cube-Lifecycle Management and Applications Cube-Lifecycle Management and Applications Konstantinos Morfonios National and Kapodistrian University of Athens, Department of Informatics and Telecommunications, University Campus, 15784 Athens, Greece

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Infinite locally random graphs

Infinite locally random graphs Infinite locally random graphs Pierre Charbit and Alex D. Scott Abstract Motivated by copying models of the web graph, Bonato and Janssen [3] introduced the following simple construction: given a graph

More information

Math 5593 Linear Programming Lecture Notes

Math 5593 Linear Programming Lecture Notes Math 5593 Linear Programming Lecture Notes Unit II: Theory & Foundations (Convex Analysis) University of Colorado Denver, Fall 2013 Topics 1 Convex Sets 1 1.1 Basic Properties (Luenberger-Ye Appendix B.1).........................

More information

Relational Model, Relational Algebra, and SQL

Relational Model, Relational Algebra, and SQL Relational Model, Relational Algebra, and SQL August 29, 2007 1 Relational Model Data model. constraints. Set of conceptual tools for describing of data, data semantics, data relationships, and data integrity

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

arxiv: v1 [cs.db] 23 May 2016

arxiv: v1 [cs.db] 23 May 2016 Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics (extended version) arxiv:1605.07159v1 [cs.db] 23 May 2016 Andrei Lopatenko Free University

More information

Basic Graph Theory with Applications to Economics

Basic Graph Theory with Applications to Economics Basic Graph Theory with Applications to Economics Debasis Mishra February, 0 What is a Graph? Let N = {,..., n} be a finite set. Let E be a collection of ordered or unordered pairs of distinct elements

More information

Lecture Notes on Program Equivalence

Lecture Notes on Program Equivalence Lecture Notes on Program Equivalence 15-312: Foundations of Programming Languages Frank Pfenning Lecture 24 November 30, 2004 When are two programs equal? Without much reflection one might say that two

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007 CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be

More information

Approximation Algorithms: The Primal-Dual Method. My T. Thai

Approximation Algorithms: The Primal-Dual Method. My T. Thai Approximation Algorithms: The Primal-Dual Method My T. Thai 1 Overview of the Primal-Dual Method Consider the following primal program, called P: min st n c j x j j=1 n a ij x j b i j=1 x j 0 Then the

More information

CPSC 536N: Randomized Algorithms Term 2. Lecture 10

CPSC 536N: Randomized Algorithms Term 2. Lecture 10 CPSC 536N: Randomized Algorithms 011-1 Term Prof. Nick Harvey Lecture 10 University of British Columbia In the first lecture we discussed the Max Cut problem, which is NP-complete, and we presented a very

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

Clustering. (Part 2)

Clustering. (Part 2) Clustering (Part 2) 1 k-means clustering 2 General Observations on k-means clustering In essence, k-means clustering aims at minimizing cluster variance. It is typically used in Euclidean spaces and works

More information

Complexity of Answering Queries Using Materialized Views

Complexity of Answering Queries Using Materialized Views Complexity of Answering Queries Using Materialized Views Serge Abiteboul, Olivier Duschka To cite this version: Serge Abiteboul, Olivier Duschka. Complexity of Answering Queries Using Materialized Views.

More information

Foundations of Schema Mapping Management

Foundations of Schema Mapping Management Foundations of Schema Mapping Management Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile University of Edinburgh Oxford University marenas@ing.puc.cl jperez@ing.puc.cl juan.reutter@ed.ac.uk

More information

Monotone Paths in Geometric Triangulations

Monotone Paths in Geometric Triangulations Monotone Paths in Geometric Triangulations Adrian Dumitrescu Ritankar Mandal Csaba D. Tóth November 19, 2017 Abstract (I) We prove that the (maximum) number of monotone paths in a geometric triangulation

More information

8 Matroid Intersection

8 Matroid Intersection 8 Matroid Intersection 8.1 Definition and examples 8.2 Matroid Intersection Algorithm 8.1 Definitions Given two matroids M 1 = (X, I 1 ) and M 2 = (X, I 2 ) on the same set X, their intersection is M 1

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center

V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center luog@us.ibm.com Abstract. Immediate materialized view maintenance with transactional

More information

Module 11. Directed Graphs. Contents

Module 11. Directed Graphs. Contents Module 11 Directed Graphs Contents 11.1 Basic concepts......................... 256 Underlying graph of a digraph................ 257 Out-degrees and in-degrees.................. 258 Isomorphism..........................

More information

PCP and Hardness of Approximation

PCP and Hardness of Approximation PCP and Hardness of Approximation January 30, 2009 Our goal herein is to define and prove basic concepts regarding hardness of approximation. We will state but obviously not prove a PCP theorem as a starting

More information

Aggregate Queries over Conditional Tables

Aggregate Queries over Conditional Tables ggregate Queries over Conditional Tables J. Lechtenbörger, H. Shu,. Vossen Bericht Nr. 9/00 I ggregate Queries over Conditional Tables Jens Lechtenbörger Hua Shu ottfried Vossen University of Münster Karlstad

More information

EDGE-COLOURED GRAPHS AND SWITCHING WITH S m, A m AND D m

EDGE-COLOURED GRAPHS AND SWITCHING WITH S m, A m AND D m EDGE-COLOURED GRAPHS AND SWITCHING WITH S m, A m AND D m GARY MACGILLIVRAY BEN TREMBLAY Abstract. We consider homomorphisms and vertex colourings of m-edge-coloured graphs that have a switching operation

More information

CS 512, Spring 2017: Take-Home End-of-Term Examination

CS 512, Spring 2017: Take-Home End-of-Term Examination CS 512, Spring 2017: Take-Home End-of-Term Examination Out: Tuesday, 9 May 2017, 12:00 noon Due: Wednesday, 10 May 2017, by 11:59 am Turn in your solutions electronically, as a single PDF file, by placing

More information

Element Algebra. 1 Introduction. M. G. Manukyan

Element Algebra. 1 Introduction. M. G. Manukyan Element Algebra M. G. Manukyan Yerevan State University Yerevan, 0025 mgm@ysu.am Abstract. An element algebra supporting the element calculus is proposed. The input and output of our algebra are xdm-elements.

More information

Computer Science Technical Report

Computer Science Technical Report Computer Science Technical Report Feasibility of Stepwise Addition of Multitolerance to High Atomicity Programs Ali Ebnenasir and Sandeep S. Kulkarni Michigan Technological University Computer Science

More information

An Overview of Cost-based Optimization of Queries with Aggregates

An Overview of Cost-based Optimization of Queries with Aggregates An Overview of Cost-based Optimization of Queries with Aggregates Surajit Chaudhuri Hewlett-Packard Laboratories 1501 Page Mill Road Palo Alto, CA 94304 chaudhuri@hpl.hp.com Kyuseok Shim IBM Almaden Research

More information

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Abdullah Al-Hamdani, Gultekin Ozsoyoglu Electrical Engineering and Computer Science Dept, Case Western Reserve University,

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Coarse Grained Parallel On-Line Analytical Processing (OLAP) for Data Mining

Coarse Grained Parallel On-Line Analytical Processing (OLAP) for Data Mining Coarse Grained Parallel On-Line Analytical Processing (OLAP) for Data Mining Frank Dehne 1,ToddEavis 2, and Andrew Rau-Chaplin 2 1 Carleton University, Ottawa, Canada, frank@dehne.net, WWW home page: http://www.dehne.net

More information

receiving excessively large amounts of data from the server, as in the database-as-a-service scenario; in the mediation scenario, the client (mediator

receiving excessively large amounts of data from the server, as in the database-as-a-service scenario; in the mediation scenario, the client (mediator Materializing Views with Minimum Size to Answer Queries (Technical Report) Rada Chirkova North Carolina State University chirkova@csc.ncsu.edu Chen Li UC Irvine chenli@ics.uci.edu Jia Li UC Irvine jiali@ics.uci.edu

More information

NP-Completeness of 3SAT, 1-IN-3SAT and MAX 2SAT

NP-Completeness of 3SAT, 1-IN-3SAT and MAX 2SAT NP-Completeness of 3SAT, 1-IN-3SAT and MAX 2SAT 3SAT The 3SAT problem is the following. INSTANCE : Given a boolean expression E in conjunctive normal form (CNF) that is the conjunction of clauses, each

More information

On the Codd Semantics of SQL Nulls

On the Codd Semantics of SQL Nulls On the Codd Semantics of SQL Nulls Paolo Guagliardo and Leonid Libkin School of Informatics, University of Edinburgh Abstract. Theoretical models used in database research often have subtle differences

More information

Using Statistics for Computing Joins with MapReduce

Using Statistics for Computing Joins with MapReduce Using Statistics for Computing Joins with MapReduce Theresa Csar 1, Reinhard Pichler 1, Emanuel Sallinger 1, and Vadim Savenkov 2 1 Vienna University of Technology {csar, pichler, sallinger}@dbaituwienacat

More information

Lecture 19 Thursday, March 29. Examples of isomorphic, and non-isomorphic graphs will be given in class.

Lecture 19 Thursday, March 29. Examples of isomorphic, and non-isomorphic graphs will be given in class. CIS 160 - Spring 2018 (instructor Val Tannen) Lecture 19 Thursday, March 29 GRAPH THEORY Graph isomorphism Definition 19.1 Two graphs G 1 = (V 1, E 1 ) and G 2 = (V 2, E 2 ) are isomorphic, write G 1 G

More information

Structural characterizations of schema mapping languages

Structural characterizations of schema mapping languages Structural characterizations of schema mapping languages Balder ten Cate INRIA and ENS Cachan (research done while visiting IBM Almaden and UC Santa Cruz) Joint work with Phokion Kolaitis (ICDT 09) Schema

More information

Inverting Schema Mappings: Bridging the Gap between Theory and Practice

Inverting Schema Mappings: Bridging the Gap between Theory and Practice Inverting Schema Mappings: Bridging the Gap between Theory and Practice Marcelo Arenas Jorge Pérez Juan Reutter Cristian Riveros PUC Chile PUC Chile PUC Chile R&M Tech marenas@ing.puc.cl jperez@ing.puc.cl

More information