Fighting Redundancy in SQL: the For-Loop Approach


Antonio Badia and Dev Anand
Computer Engineering and Computer Science department
University of Louisville, Louisville KY
July 8,

1 Introduction

SQL is the standard query language for relational databases. However, it has some limitations, especially in areas like Decision Support, that have been noted in the literature ([18, 13]). In this paper, we study a class of Decision-Support SQL queries, characterize them and show how to process them in an improved manner. In particular, we analyze queries containing subqueries, where the subquery returns a single result (i.e. it has an aggregate function in its SELECT clause). These are called type-A and type-JA in [17]. In many of these queries, SQL exhibits redundancy in that the FROM and WHERE clauses of query and subquery show a great deal of overlap. We argue that these patterns are currently not well supported by relational query processors. In particular, we show that traditional optimization techniques need more than one pass over the base relations in the database to compute the answer to such queries, even though this is not strictly necessary. We call this situation the two-pass problem. The following example gives some intuition about our proposal.

Example 1 The TPC-H benchmark ([33]) is a popular reference point for Decision Support; it defines a data warehouse schema and a set of queries. The schema contains two large fact tables and a series of dimension tables which have been normalized (i.e. it is a snowflake schema).
Query 2 is a typical query which shows a great deal of overlap between query and subquery:

    select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    from part, supplier, partsupp, nation, region
    where p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and p_size = 15
      and p_type like '%BRASS'
      and r_name = 'EUROPE'
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and ps_supplycost = (select min(ps_supplycost)
                           from partsupp, supplier, nation, region
                           where p_partkey = ps_partkey
                             and s_suppkey = ps_suppkey
                             and s_nationkey = n_nationkey
                             and n_regionkey = r_regionkey
                             and r_name = 'EUROPE')
    order by s_acctbal desc, n_name, s_name, p_partkey;

This query is executed in most systems by using unnesting techniques. However, the commonality between query and subquery will not be detected, and all operations (including common joins and selections) will be repeated (see an in-depth discussion of this example in subsection 5.1). Our goal is to avoid duplication of effort.

This research was sponsored by NSF under grant IIS. A full version of this paper is available as a technical report.

Our method applies only to aggregated subqueries that contain WHERE clauses overlapping with the main query's WHERE clause. This may seem a very narrow class of queries until one realizes that all types of SQL subqueries can be rewritten as aggregated subqueries (EXISTS, for instance, can be rewritten as a subquery with COUNT; all other types of subqueries can be rewritten similarly ([3])). Therefore, the approach is potentially applicable to any SQL query with subqueries. Also, it is important to point out that the redundancy is present because of the structure of SQL, which necessitates a subquery in order to declaratively state the aggregation to be computed. Thus, we argue that such redundancy is not infrequent ([20]). With the addition of user-defined methods to SQL, detecting and dealing with redundancy is even more important, as many times such methods are expensive to compute and it is hard for the optimizer to decide whether to push them down or not ([15]).

In this paper we describe an optimization method geared towards detecting and optimizing this redundancy. Our method not only computes the redundant part only once, but also proposes a new special operator to compute the rest of the query very effectively. Thus, the method is not general-purpose, but it has the potential to outperform traditional methods in the queries to which it applies.

In section 2 we describe our approach and the new operator in more detail. In section 3 we show how the operator is implemented as a program. In section 4 we show how one such program can be generated for a given SQL query. In section 5 we show how to estimate the cost of query plans produced by our approach, and describe an experiment run in the context of the TPC-H benchmark ([33]). In section 6 we discuss some related work on optimizing complex SQL queries. Finally, in section 7 we propose some further research.
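The remark above that EXISTS can be rewritten as a subquery with COUNT can be checked on a toy instance. The sketch below is only an illustration: it uses Python's standard sqlite3 module, two invented tables R and S with made-up rows, and a hypothetical helper name exists_vs_count. It is not part of the method described in this paper.

```python
import sqlite3

def exists_vs_count():
    """Run an EXISTS query and its COUNT-based rewriting on the same toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE R (A INTEGER, B TEXT);
        CREATE TABLE S (E INTEGER, F TEXT);
        INSERT INTO R VALUES (1, 'x'), (2, 'y'), (3, 'z');
        INSERT INTO S VALUES (1, 'p'), (1, 'q'), (3, 'r');
    """)
    # Original form: correlated EXISTS subquery.
    with_exists = con.execute(
        "SELECT A FROM R WHERE EXISTS (SELECT * FROM S WHERE S.E = R.A) ORDER BY A"
    ).fetchall()
    # Rewritten form: the subquery is aggregated and linked with a comparison.
    with_count = con.execute(
        "SELECT A FROM R WHERE 0 < (SELECT COUNT(*) FROM S WHERE S.E = R.A) ORDER BY A"
    ).fetchall()
    con.close()
    return with_exists, with_count

with_exists, with_count = exists_vs_count()
```

Both forms select the R tuples that have at least one matching S tuple; the second one has the shape (linking attribute, linking operator, aggregated subquery) that the rest of the paper works with.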
2 Optimization of Redundancy

In this section we try to capture the intuition of our previous example by defining patterns which detect redundancy in SQL queries. We then show how to use the matching of patterns and SQL queries to produce a query plan which avoids repeating computations.

We represent SQL queries in a schematic form or pattern. With the keywords SELECT ... FROM ... WHERE we will use L, L1, L2, ... as variables over lists of attributes; T, T1, T2, ... as variables over lists of relations; F, F1, F2, ... as variables over aggregate functions; and Δ, Δ1, Δ2, ... as variables over (complex) conditions. Attributes will be represented by attr, attr1, attr2, .... If there is a condition in the WHERE clause of the subquery which introduces correlation, it will be shown explicitly; this is called the correlation condition. The table to which the correlated attribute belongs is called the correlation table, and is said to introduce the correlation; the attribute compared to the correlated attribute is called the correlating attribute. The condition that connects query and subquery (called the linking condition) is also shown explicitly. The operator in the linking condition is called the linking operator, the attributes the linking attributes, and the aggregate function on the subquery side is called the linking aggregate. We will say that a pattern matches an SQL query when there is a correspondence g between the variables in the pattern and the elements of the query.
Example 2 The pattern

    SELECT L FROM T WHERE Δ1 AND attr1 θ (SELECT F(attr2) FROM T WHERE Δ2)

would match the query from example 1 by setting

    g(Δ1) = {p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey},
    g(Δ2) = {p_partkey = ps_partkey and s_suppkey = ps_suppkey and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey},
    g(T) = {part, supplier, partsupp, nation, region},
    g(F) = min, and
    g(attr1) = g(attr2) = ps_supplycost.

Note that the T symbol appears twice, so the pattern forces the query to have the same FROM clauses in the main query and in the subquery¹. The correlation condition is p_partkey = ps_partkey; the correlation table is part, and ps_partkey is the correlating attribute. The linking condition here is ps_supplycost = min(ps_supplycost); thus ps_supplycost is the linking attribute, = the linking operator and min the linking aggregate.

¹ For correlated subqueries, the correlation table is counted as present in the FROM clause of the subquery.
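The overlap that this matching exposes can be made concrete with ordinary set operations. The sketch below assumes that each WHERE clause is a plain conjunction and treats g(Δ1) and g(Δ2) as sets of conjuncts written as strings; comparing conditions by their text is a simplification made for illustration, and the name split_conditions is hypothetical.

```python
# g(Δ1) and g(Δ2) from Example 2, one string per conjunct.
MAIN_WHERE = frozenset({
    "p_partkey = ps_partkey",
    "s_suppkey = ps_suppkey",
    "p_size = 15",
    "p_type like '%BRASS'",
    "r_name = 'EUROPE'",
    "s_nationkey = n_nationkey",
    "n_regionkey = r_regionkey",
})
SUB_WHERE = frozenset({
    "p_partkey = ps_partkey",
    "s_suppkey = ps_suppkey",
    "r_name = 'EUROPE'",
    "s_nationkey = n_nationkey",
    "n_regionkey = r_regionkey",
})

def split_conditions(main_where, sub_where):
    """Return (shared, main_only, sub_only) as sets of conjuncts."""
    shared = main_where & sub_where        # conditions present in both WHERE clauses
    main_only = main_where - sub_where     # conditions of the main query alone
    sub_only = sub_where - main_where      # conditions of the subquery alone
    return shared, main_only, sub_only

shared, main_only, sub_only = split_conditions(MAIN_WHERE, SUB_WHERE)
```

For the TPC-H query, the shared set holds all the joins plus the selection on r_name, the main query keeps only the two selections on part, and the subquery has nothing of its own.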

The basic idea is to divide the work to be done into three parts: one that is common to query and subquery, one that belongs only to the subquery, and one that belongs only to the main query². The part that is common to both query and subquery needs to be done only once; however, as we argue in subsection 5.1, in most systems today it would be done twice. We calculate the three parts above as follows: the common part is g(Δ1) ∩ g(Δ2); the part proper to the main query is g(Δ1) − g(Δ2); and the part proper to the subquery is g(Δ2) − g(Δ1). In example 1, this yields {p_partkey = ps_partkey and s_suppkey = ps_suppkey and r_name = 'EUROPE' and s_nationkey = n_nationkey and n_regionkey = r_regionkey}, {p_size = 15 and p_type like '%BRASS'} and ∅, respectively. We use this matching in constructing a program to compute this query. The process is explained in the next subsection.

² We are assuming that all relations mentioned in a query are connected; i.e. that there are no Cartesian products present, only joins. Therefore, when there is overlap between the FROM clauses of query and subquery, we are very likely to find common conditions in both WHERE clauses (at least the joins).

2.1 The For-Loop Operator

We start out with the common part, called the base relation, in order to ensure that it is not done twice. The base relation can be expressed as an SPJ query (in the above example, this would include all the joins and the condition r_name = 'EUROPE'). Our strategy is to compute the rest of the query starting from this base relation. This strategy faces two difficulties. First, if we simply divide the query based on common parts we obtain a plan where redundancy is eliminated at the price of fixing the order of some operations. In particular, joins in the common part are performed together, and selections in the common part are performed with them. Hence, it is unclear whether this strategy will provide significant improvements by itself. This situation is similar to that of [24]. Second, when starting from the base relation, we face a problem in that this relation has to be used for two different purposes: it must be used to compute an aggregate after finishing up the WHERE clause in the subquery (i.e. after computing g(Δ2) − g(Δ1)); and it must be used to finish up the WHERE clause in the main query (i.e. to compute g(Δ1) − g(Δ2)) and then, using the result of the previous step, compute the final answer to the query. However, it is extremely hard in relational algebra to combine the operators involved. For instance, the computation of an aggregate must be done before the aggregate can be used in a selection condition. Also, in a non-correlated subquery, conditions coming from the subquery affect the computation of the aggregate, but should not affect which tuples are considered for the final result; vice versa, conditions from the main query affect which tuples may make it into the final result, but should not affect the computation of the aggregate.

In order to solve this problem, we define a new operator, called the for-loop, which combines several relational operators into a new one (i.e. a macro-operator). This strategy is similar to others defined in the recent literature on query optimization, which introduce special-purpose relational operators ([4, 10]). The approach is based on the observation that some basic operations frequently appear together and could be more efficiently implemented as a whole. In our particular case, we show in the next subsection that there is an efficient implementation of the for-loop operator which allows it, in some cases, to compute several basic operators with one pass over the data, thus saving considerable disk I/O.

Definition 2.1 Let R be a relation, sch(R) the schema of R, L ⊆ sch(R), A ∈ sch(R), F an aggregate function, α a condition on R (i.e. involving only attributes of sch(R)) and β a condition on sch(R) ∪ {F(A)} (i.e. involving attributes of sch(R) and possibly F(A)). Then the for-loop operator is defined as either one of the following:

1. FL_{L,F(A),α,β}(R). The meaning of the operator is defined as follows: let Temp be the relation GB_{L,F(A)}(σ_α(R)) (GB is used to indicate a group-by operation). Then the for-loop yields the relation σ_β(R ⋈_{R.L=Temp.L} Temp), where the condition of the join is understood as the pairwise equality of each attribute in L. This is called a grouped for-loop.

2. FL_{F(A),α,β}(R). The meaning of the operator is given by σ_β(AGG_{F(A)}(σ_α(R)) × R), where AGG_{F(A)}(R) indicates the aggregate F computed over all A values of R. This is called a flat for-loop.

Note that β may contain aggregated attributes as part of a condition. In fact, in the typical use in our approach, it does contain an aggregation. The main use of a for-loop is to calculate the linking condition of a query with an aggregated subquery on the fly, possibly with additional selections. Thus, for instance, in example 1 the for-loop would take the flat form FL_{p_partkey, min(ps_supplycost), ∅, p_size=15 ∧ p_type LIKE '%BRASS' ∧ ps_supplycost=min(ps_supplycost)}(R), where R is the relation obtained by computing the base relation³. The for-loop is equivalent to the relational expression σ_{p_size=15 ∧ p_type LIKE '%BRASS' ∧ ps_supplycost=min(ps_supplycost)}(AGG_{min(ps_supplycost)}(R) × R). It can be seen that this expression will compute the original SQL query; the aggregation will compute the aggregate function of the subquery (the conditions in the WHERE clause of the subquery have already been computed in R, since in this case Δ2 ⊆ Δ1 and hence Δ2 − Δ1 = ∅), and the Cartesian product will put a copy of this aggregate on each tuple, allowing the linking condition to be stated as a regular condition over the resulting relation. Note that this expression may not be better, from a cost point of view, than other plans produced by standard optimization. What makes this plan attractive is that the for-loop operator can be implemented in such a way that it computes its output with one pass over the data. In particular, the implementation will not carry out any Cartesian product, which is used only to explain the semantics of the operator. In addition, we compute the redundant part only once.

3 Implementation of the For-loop Operator

To achieve the objective of computing several results at once with a single pass over the data, the operator will be written as an iterator that loops over the input implementing a simple program (hence the name).
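As a point of reference for the one-pass implementation, the semantics of Definition 2.1 can be written down naively, with one pass to aggregate and a second pass to select. The sketch below is such a baseline and makes several assumptions of its own: relations are lists of dictionaries, α and β are Python callables (β also receives the aggregate value), only min, max and sum are supported, and an aggregate over an empty input simply yields an empty result. The function names and the toy data are hypothetical.

```python
AGGS = {"min": min, "max": max, "sum": sum}

def flat_for_loop(R, agg, A, alpha, beta):
    """Naive flat for-loop: sigma_beta(AGG_{F(A)}(sigma_alpha(R)) x R)."""
    values = [t[A] for t in R if alpha(t)]       # first pass: aggregate input
    if not values:
        return []                                # assumption: empty aggregate, empty result
    f = AGGS[agg](values)
    return [t for t in R if beta(t, f)]          # second pass: selection using F(A)

def grouped_for_loop(R, L, agg, A, alpha, beta):
    """Naive grouped for-loop: sigma_beta(R join GB_{L,F(A)}(sigma_alpha(R)))."""
    groups = {}
    for t in R:                                  # first pass: one aggregate per L-value
        if alpha(t):
            groups.setdefault(tuple(t[a] for a in L), []).append(t[A])
    temp = {key: AGGS[agg](vals) for key, vals in groups.items()}
    out = []
    for t in R:                                  # second pass: join on L, then select
        key = tuple(t[a] for a in L)
        if key in temp and beta(t, temp[key]):
            out.append(t)
    return out

# Toy base relation: part key p, supply cost and part size.
R = [
    {"p": 1, "cost": 10, "size": 15},
    {"p": 1, "cost": 7, "size": 15},
    {"p": 2, "cost": 5, "size": 15},
    {"p": 2, "cost": 5, "size": 3},
    {"p": 2, "cost": 9, "size": 15},
]
always = lambda t: True
cheapest = lambda t, f: t["size"] == 15 and t["cost"] == f

per_part = grouped_for_loop(R, ["p"], "min", "cost", always, cheapest)
overall = flat_for_loop(R, "min", "cost", always, cheapest)
```

Any one-pass program for the operator should return the same tuples as these two-pass versions, which makes them convenient as a test oracle.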
The basic idea is twofold: first, selections and groupings (either grouping alone or together with aggregate calculations) can be effectively implemented in one algorithm, even if it is difficult to integrate them algebraically ([14, 12]); second, and more important, in some cases computing an aggregation and using the aggregate result in a selection can be done at the same time. This is due to the behavior of some aggregates and the semantics of the conditions involved. Assume, for instance, that we have a comparison of the type attr = min(attr2), where both attr and attr2 are attributes of some table R. In this case, as we go on computing the minimum for a series of values, we can actually decide, as we iterate over R, whether a tuple can still make the condition true or can never do so. This is due to the fact that min is monotonically non-increasing, i.e. as we iterate over R and we carry a current minimum, this value will always stay the same or decrease, never increase. Since equality imposes a very strict constraint, we can take a decision on the current tuple t based on the values of t.attr and the current minimum, as follows:

- If t.attr is greater than the current minimum, we can safely get rid of it.
- If t.attr is equal to the current minimum, we should keep it, at least for now, in a temporary result temp1.
- If t.attr is less than the current minimum, we should keep it, in case our current minimum changes, in a temporary result temp2.

Whenever the current minimum changes, we know that temp1 should be deleted, i.e. tuples there cannot be part of a solution. On the other hand, temp2 should be filtered: some tuples there may be thrown away, some may be in a new temp1, some may remain in temp2. At the end of the iteration, the set temp1 gives us the correct solution.
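The bookkeeping with temp1 and temp2 described above can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used in the paper: tuples are dictionaries, everything is kept in memory, and the function name select_eq_min is hypothetical.

```python
import math

def select_eq_min(rows, attr, attr2):
    """Rows with row[attr] == min(row[attr2] over all rows), in a single pass."""
    current_min = math.inf
    temp1 = []   # rows whose attr equals the current minimum
    temp2 = []   # rows whose attr is below the current minimum
    for t in rows:
        if t[attr2] < current_min:
            current_min = t[attr2]
            # The minimum dropped: the old temp1 can never qualify again,
            # and temp2 has to be re-examined against the new minimum.
            candidates, temp1, temp2 = temp2, [], []
            for u in candidates:
                if u[attr] == current_min:
                    temp1.append(u)
                elif u[attr] < current_min:
                    temp2.append(u)
                # u[attr] > current_min: discarded for good
        if t[attr] == current_min:
            temp1.append(t)
        elif t[attr] < current_min:
            temp2.append(t)
        # t[attr] > current_min: discarded, it can never equal a lower minimum
    return temp1

rows = [
    {"a": 3, "b": 5},
    {"a": 2, "b": 4},
    {"a": 2, "b": 2},
    {"a": 1, "b": 9},
    {"a": 5, "b": 3},
]
one_pass = select_eq_min(rows, "a", "b")
two_pass = [r for r in rows if r["a"] == min(x["b"] for x in rows)]
```

The last line computes the same answer in the obvious two-pass way, which is what the one-pass version is meant to replace.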
Of course, as we go over the tuples in R we may keep some tuples that we need to get rid of later on; but the important point is that we never have to go back and recover a tuple that we dismissed, thanks to the monotonic behavior of min. This will allow us to implement the operation efficiently. This behavior does generalize to max, sum and count, since they are all monotonically non-decreasing (for sum, it is assumed that all values in the domain are positive numbers); however, average is not monotonic (either in an increasing or decreasing manner). More complex aggregates, like median and mode, which were added to SQL in the latest standard (ref to Melton), also do not have this nice behavior.

Of course, a different operator dictates a different behavior, but the overall situation does not change: we can successfully take decisions on the fly without having to recover discarded tuples later on. For instance, if the operator were < instead of equality (so the condition reads attr < min(attr2)), then if t.attr is greater than or equal to the current minimum, we can safely ignore t; and if t.attr is smaller than the current minimum, we should keep it in a temporary result temp1. Whenever a new (lower) minimum is discovered, we need to filter out elements of temp1, discarding some values. That is, if m is the current minimum, any value a ≥ m can be safely thrown away, since if a new minimum m′ is discovered (i.e. m′ < m), a still would not qualify. If a < m, however, it may be the case that a ≥ m′, and a must be discarded at a later time when m′ is determined to be the new minimum. Finally, if the operator were > (so the condition is attr > min(attr2)), tuples are not discarded at all, but divided into "for sure in the result set" and "possibly in the result set". That is, any value a > m is to be kept as it will for sure be part of the solution, but any value b ≤ m must also be kept as it may become part of the solution if m′ < m is discovered and b > m′. Still, note that in this case we know for a fact that some tuples are part of the solution early, and we can write them to output right away.

³ Again, note that the base relation contains the correlation as a join.

To implement this behavior, we introduce the idea of a for-loop program. Intuitively, a for-loop program will implement the for-loop operator by iterating over its input relation once; this is achieved by exploiting the property of aggregate results explained above, which allows us (in some cases) to compute the aggregate and, at the same time, use the (temporary) aggregate in a condition. A for-loop program is an expression of the form

    for (t in R) Body

where t is a tuple variable (called the driving tuple), R is a relational algebra expression, and Body is called a loop body. A loop body is a sequence of statements, where each statement is either a variable assignment or a conditional statement. We write the assignments as v := e;, where v is a variable and e an expression. Both variables and expressions are either of atomic (integer, string, ...), tuple or relation type.
We allow atomic constants (0, 1, 'a', ...) and relational constants (∅). Also, we allow atomic variables (including integers and arithmetic on them), tuple variables and relation variables. Expressions are made up of variables, constants, arithmetic operators (for integer variables) and the operator ∪ (for relation variables). If e1, ..., en are either atomic expressions or attribute names, then (e1, ..., en) is a tuple expression. If u is a tuple expression, then {u} is a relation expression. Conditional statements are written as: if (cond) p1; or: if (cond) p1 else p2;, with both p1 and p2 being sequences of statements. The condition cond is made up of the usual comparison operators (=, <, > and so on) relating constants and/or variables. Braces ({, }) are used for clarity. Furthermore, for-loop programs obey one constraint: the only tuple variable in the loop body is the driving tuple and the only relational variable is a special variable called result. All other variables are atomic variables.

The semantics of a for-loop program are defined in an intuitive way. Let p = for (t in R) Body be a for-loop program, and t1, ..., tn an arbitrary ordering of the tuples in R, called an ordering of the input. Then to execute p is to execute Body with t taking the values t1, ..., tn in that order (loop bodies have an obvious, intuitive semantics, since each statement is a variable assignment or a conditional statement). The value of variable result at the end of the iteration is the value of the for-loop program. In order for this definition to be correct, we note that we use for-loop programs to compute (aggregate-extended) relational algebra expressions; since these expressions are generic ([1]), the bodies of our for-loop programs are invariant to the ordering of the input; that is, they yield the same value for the same base relation regardless of what ordering is used. The above implements flat for-loops.
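A flat for-loop program maps directly onto an ordinary loop with one atomic variable and the relation variable result. The sketch below is an illustration only: the name run_program, the two-column tuples and the choice of body (select the tuples whose second component equals the maximum) are assumptions. It also runs the body over every ordering of a small input to illustrate the ordering invariance discussed above.

```python
from itertools import permutations

def run_program(ordered_tuples):
    """for (t in R) Body, where Body keeps the tuples whose second field is maximal."""
    best = float("-inf")         # atomic variable
    result = set()               # the only relation variable
    for t in ordered_tuples:     # t is the driving tuple
        if t[1] > best:
            best = t[1]
            result = set()       # earlier tuples cannot match the new maximum
        if t[1] == best:
            result = result | {t}
    return result

R = [("a", 3), ("b", 7), ("c", 7), ("d", 1)]

# Execute the program once per ordering of the input and collect the distinct values.
values = {frozenset(run_program(p)) for p in permutations(R)}
```

All 24 orderings produce the same value of result, so the set values contains a single element.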
For grouping for-loops, we allow a more complicated form of the for-loop program:

    for (t in R) GROUP(t.attr, Body1) [Body2] Body3 {Body4}

with the following meaning: let attr be the name of an attribute in R, and let t1, ..., tn be an ordering of the tuples of R such that for any i < j in {1, ..., n}, if ti.attr = tj.attr, then tk.attr = ti.attr for every k with i ≤ k ≤ j (in other words, the ordering provides a grouping of R by attribute attr). Then the program Body1 is executed once for each tuple, and all variables in Body1 are reset for different values of attr (that is, Body1 is computed independently for each group), while program Body2 will be executed once for every value of attr (i.e. once for every group) after Body1 is executed. Body3 is simply done once for each tuple in R, as before, and Body4 is executed once, after the iteration is completed. A simple example will show how this kind of program is used⁴:

⁴ For simplicity, we use a synthetic database in our examples, with relations R (schema: (A, B, C, D)) and S (schema: (E, F)).

Example 3 The SQL query

    SELECT B, AVG(C) FROM R WHERE A = 'a1' GROUP BY B

can be computed by the program

    count := 0; sum := 0; avg := 0; result := ∅;
    for (t in π_{B,C}(σ_{A='a1'}(R)))
      GROUP(t.B, {sum := sum + t.C; count := count + 1})
      [avg := sum/count; result := result ∪ {(t.B, avg)};]

The base relation here is π_{B,C}(σ_{A='a1'}(R)). This example has neither a Body3 nor a Body4 fragment. Observe that it is assumed that variables sum and count get reset to their initial values for each group, while avg and result are global variables, and the instructions that contain them are executed only once for each group (once sum and count have been computed).

Example 4 Assuming again the query of example 1 and the pattern of example 2, the implementation for the for-loop operator is as follows:

    min := +∞; result := ∅;
    for (t in σ_{r_name='EUROPE'}(part ⋈ supplier ⋈ partsupp ⋈ nation ⋈ region))
      GROUP(t.p_partkey, {if (t.ps_supplycost < min) { min := t.ps_supplycost; result := ∅; }})
      [if (p_size = 15 and p_type like '%BRASS') { if (t.ps_supplycost = min) result := result ∪ {t}; }]

Again, note that min is assumed to be reset for each group, while result is a global variable. The reader can convince herself that the program will indeed compute the same query as the original SQL of example 1.

We add a final construct to our for-loop programs: the statement FILTER(result, cond) will delete from result all tuples that do not meet the condition cond. When this code is in the [ ] section, the only tuples in result that are inspected for possible deletion are those added in the last grouping. This strategy has to filter out some tuples previously added to the result (which means it has to undo some previous work), but preliminary experiments suggest that it works well in practice ([2]).
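The grouped program of Example 3 can be mirrored in a few lines. The sketch below assumes that R is a list of dictionaries and obtains an ordering in which groups are contiguous by sorting the base relation on B; itertools.groupby then delimits the groups. The function name avg_c_by_b and the sample rows are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

def avg_c_by_b(R, a_value):
    """SELECT B, AVG(C) FROM R WHERE A = a_value GROUP BY B, as a grouped loop."""
    # Base relation: projection on (B, C) of the selection A = a_value.
    base = [(t["B"], t["C"]) for t in R if t["A"] == a_value]
    base.sort(key=itemgetter(0))             # ordering that makes each group contiguous
    result = []                              # global relation variable
    for b, group in groupby(base, key=itemgetter(0)):
        total, count = 0, 0                  # Body1 variables, reset for every group
        for _, c in group:                   # Body1: executed once per tuple
            total += c
            count += 1
        avg = total / count                  # Body2: executed once per group
        result.append((b, avg))
    return result

R = [
    {"A": "a1", "B": "x", "C": 2, "D": 0},
    {"A": "a1", "B": "y", "C": 10, "D": 0},
    {"A": "a1", "B": "x", "C": 4, "D": 0},
    {"A": "a2", "B": "x", "C": 100, "D": 0},
]
averages = avg_c_by_b(R, "a1")
```

Sorting is just one way to obtain a grouped ordering; a base relation delivered by an index scan on B, for instance, would already satisfy the requirement.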
The basic reason is that in implementing the for-loop mechanism, what we really want is to read the base relation from disk into memory once; therefore, if all the elements of a group fit in memory (or are close to it), the computation can still be implemented as if it were one-pass as far as the disk subsystem is concerned.

4 Query Transformation

In order to produce a query plan with for-loops, we need to indicate how the for-loop program is going to be produced for a given query. The general strategy is as follows: we classify each SQL query q into one of two categories, according to q's structure. For each category, a pattern p is given. As before, if q fits into p there is a mapping g between constants in q and variables in p. Associated with each pattern there is a for-loop program template t. A template is different from a program in that it has variables and options. Using the information on the mapping g (including the particular linking aggregate and linking condition in q), a concrete for-loop program is generated from t. We distinguish between two types of queries:

1. Type A queries, in which the subquery is not correlated (this corresponds to type A in [17]); and
2. Type B queries, where the subquery is correlated (this corresponds to type JA in [17]).

Queries of type A are interesting in that usual optimization techniques cannot do anything to improve them. Obviously, unnesting does not apply to them, and no other approach looks at query and subquery globally. Thus, our approach, whenever applicable, offers a chance to create an improved query plan. In contrast, queries of type B have been dealt with extensively in the literature ([17, 7, 11, 22, 32, 31, 30, 29]). As we will see, our approach is closely related to other unnesting techniques, but it is the only one that considers redundancy between query and subquery and its optimization.
The process to produce a query tree containing a for-loop operator is simple: our patterns allow us to identify the part common to query and subquery (i.e. the base relation), which is used to start the query tree. Standard relational optimization techniques can be applied to this part. Then a for-loop operator which takes the base relation as input is added to the query tree, and its parameters determined. In a general form, a for-loop will be of one of two types:

- For type B queries, FL(Aggs, F(attr), Δ2 − Δ1, (Δ1 − Δ2) ∧ C)(R), where Aggs is the list of correlating attributes, F is the linking aggregate, attr is the linking attribute in the subquery, and C is the linking condition. R represents the base relation, computed from Δ1 ∩ Δ2.
- For type A queries, FL(F(attr), Δ2 − Δ1, (Δ1 − Δ2) ∧ C)(R), where all parameters are as above.

Thus, the for-loop plan will compute the common part once (to obtain the base relation); then, the linking condition and (possibly) extra selections that are not part of the base relation will be computed with a for-loop program. Hence, the for-loop plan computes the minimum number of operations required to execute the query. Each case is explained in detail below. For now we will concentrate only on queries having the same tables in the main query and the subquery, i.e. total coincidence of the FROM clauses (this simplification is removed in subsection 4.3). For correlated subqueries, the approach acts as if the table that the correlated attribute belongs to is also present in the subquery (this will become equivalent to unnesting the subquery, as will be seen).

4.1 Type A Queries

We show the process in detail for the type A queries. The general pattern a type A query must fit is given below:

    SELECT L FROM T WHERE Δ1 and attr1 θ (SELECT F(attr2) FROM T WHERE Δ2) {GROUP BY L2}

The braces around the GROUP BY clause indicate that such a clause is optional⁵. We create a query plan for this query in two steps:

1. A base relation is defined by σ_{g(Δ1) ∩ g(Δ2)}(g(T)). Note that this is an SPJ query, which can be optimized by standard techniques.
2. We apply a for-loop operator defined by FL(g(F(attr2)), g(Δ2) − g(Δ1), (g(Δ1) − g(Δ2)) ∧ g(attr1 θ F(attr2))).

⁵ Obviously, SQL syntax requires that L2 ⊆ L, where L and L2 are lists of attributes. In the following, we assume that queries are well formed.

It can be seen that this query plan computes the correct result for this query by using the definition of the for-loop operator. Here, the aggregate is F(attr2), α is g(Δ2 − Δ1) and β is g(Δ1 − Δ2) ∧ g(attr1 θ F(attr2)). Thus, this plan will first apply Δ1 ∩ Δ2 to T, in order to generate the base relation. Then, the for-loop will compute the aggregate F(attr2) on the result of selecting g(Δ2 − Δ1) on the base relation. Note that (Δ2 − Δ1) ∪ (Δ1 ∩ Δ2) = Δ2, and hence the aggregate is computed over the conditions in the subquery only, as it should. The result of this aggregate is then appended to every tuple in the base relation by the Cartesian product (again, note that this description is purely conceptual). After that, the selection on (g(Δ1) − g(Δ2)) ∧ g(attr1 θ F(attr2)) is applied. Here we have that (Δ1 − Δ2) ∪ (Δ1 ∩ Δ2) = Δ1, and hence we are applying all the conditions in the main clause. We are also applying the linking condition attr1 θ F(attr2), which can be considered a regular condition now because F(attr2) is present in every tuple. Thus, the for-loop computes the query correctly. This for-loop operator will be implemented by a program that will carry out all needed operators with one scan of the input relation. Clearly, the concrete program is going to depend on the linking operator (θ, assumed to be one of {=, <=, >=, <, >}) and the aggregate function (F, assumed to be one of min, max, sum, count, avg).

The program template for NN queries is shown below. The line numbers serve as a reference when we use this template as a basis for others later on. The / is used to show options. [1] represents an always true condition, [0] an always false condition and [ ] an empty action.

    (1.) F = init
    (2.) result = ∅
    (3.) for (t in σ_{Δ1 ∩ Δ2}(T)) {
    (4.)   if ((Δ2 − Δ1) ∧ [t.attr2 θ1 F]/[t.attr2 != null]) F = α
    (5.)   if ((Δ1 − Δ2) ∧ t.attr1 θ2 F) result = result ∪ (t)
    (6.)   [FILTER(result, attr1 θ3 F)]/[ ]
         }
    (7.) if ([F θ4 init]/[1]/[0]) {
    (8.)   FILTER(result, attr1 θ5 var2)
         }

The table below shows the options and the values that need to be chosen for the generation of the for-loop program given the SQL query. Not all possible combinations have been shown for lack of space; only those for linking aggregates max and min are illustrated. Similar tables are generated for the other aggregate functions.

Changes to the template for each linking aggregate and linking operator θ (n/a marks a value that the chosen patterns do not use):

    Aggregate  θ   init  α      θ1  θ2  θ3   θ4   θ5  (4.)       (6.)             (7.)       var2
    max        =   −∞    attr2  >   >=  <    n/a  >   pattern 1  pattern 1        pattern 2  max
    max        >   −∞    attr2  >   >   <=   =    >   pattern 1  pattern 1        pattern 1  −∞
    max        >=  −∞    attr2  >   >=  <    =    =   pattern 1  pattern 1        pattern 1  −∞
    max        <   −∞    attr2  >   <   n/a  n/a  >=  pattern 1  pattern 2 ([ ])  pattern 2  −∞
    max        <=  −∞    attr2  >   <=  n/a  n/a  >   pattern 1  pattern 2 ([ ])  pattern 2  −∞
    min        =   +∞    attr2  <   <=  >    n/a  <   pattern 1  pattern 1        pattern 2  min
    min        <   +∞    attr2  <   <   >=   =    <   pattern 1  pattern 1        pattern 1  +∞
    min        <=  +∞    attr2  <   <=  >    =    <   pattern 1  pattern 1        pattern 1  +∞
    min        >   +∞    attr2  <   >   n/a  n/a  <=  pattern 1  pattern 2 ([ ])  pattern 2  +∞
    min        >=  +∞    attr2  <   >=  n/a  n/a  <   pattern 1  pattern 2 ([ ])  pattern 2  +∞

Traditional processing of this type of query would consist of:

1. Processing and optimization of the subquery in isolation. A query tree to carry out AGG_F(attr2)(σ_Δ2(T)) would be designed.
2. Processing and optimization of the main query, minus the linking condition. A query tree to carry out π_L(σ_Δ1(T)) would be designed.
3. Processing of the linking condition, carried out by a selection that takes as input the result of the previous step and as condition attr1 θ v, where v is the value obtained in the first step.

Note that all parts common to steps 1 and 2 would be done twice in this approach. The following example illustrates the process in our approach and the differences with the traditional approach.

Example 5 The SQL query

SELECT * FROM R, S
WHERE R.A = S.E and R.B = 'c' and S.F = 'd'
  and C = (SELECT max(C) FROM R, S
           WHERE R.A = S.E and R.B = 'c' and R.D = 'e')

fits the NN pattern, with the matching g defined as follows: g(L) = *; g(T) = {R, S}; g(Δ1) = (R.A = S.E and R.B = 'c' and S.F = 'd'); g(Δ2) = (R.A = S.E and R.B = 'c' and R.D = 'e'); g(attr1) = C; g(attr2) = C; g(θ) = '='; g(F) = max. Therefore, g(Δ1 ∩ Δ2) = (R.A = S.E and R.B = 'c'); g(Δ1 − Δ2) = (S.F = 'd'); and g(Δ2 − Δ1) = (R.D = 'e').

Looking up the table for F = max and θ = '=', the entry gives the instructions to transform the pattern into a particular program. For instance, lines with no numbers give the initialization of variable values throughout the template, so init = −∞ tells us that F must be initialized to −∞, and what each operator in the comparisons should be. Also, the entry tells us that line 4 must choose pattern 1 of the two present in the condition of the if expression. Plugging these values and options into the template, it becomes:

(1.) F = −∞
(2.) result = ∅
(3.) for (t in σ_(Δ1 ∩ Δ2)(T)) {
(4.)   if ((Δ2 − Δ1) ∧ [t.attr2 > F]) F = t.attr2
(5.)   if ((Δ1 − Δ2) ∧ t.attr1 >= F) result = result ∪ (t)
(6.)   [ FILTER(result, attr1 < F) ]
     }
(7.) if ([1]) {
(8.)   FILTER(result, attr1 > max) }

Finally, plugging in the values for g yields the following program:

(1.) F = −∞
(2.) result = ∅
(3.) for (t in σ_(R.A = S.E ∧ R.B = 'c')(R × S)) {
(4.)   if ((t.D = 'e') ∧ [t.C > F]) F = t.C
(5.)   if ((t.F = 'd') ∧ t.C >= F) result = result ∪ (t)
(6.)   [ FILTER(result, t.C < F) ]
     }
(7.) if ([1]) {
(8.)   FILTER(result, t.C > max) }

Here the aggregate is updated only by tuples that satisfy the subquery-only condition (R.D = 'e'), and only tuples that satisfy the main-query-only condition (S.F = 'd') are added to the result.

While the template may seem a bit complex, it has been designed to be general enough to build all needed programs starting from a single template. The resulting program can be easily simplified. For instance, conditional statements with conditions like [1] or [0] can be transformed into simpler, non-conditional statements; in this example, the condition in line (7) is trivially true and so can be eliminated. Also, the base relation in line (3) can be optimized with the usual techniques, since it is an SPJ expression. In this case, we would expect σ_(R.A = S.E ∧ R.B = 'c')(R × S) to become (σ_(R.B = 'c')(R)) ⋈_(R.A = S.E) S. After some more optimization, the program becomes

F = −∞
result = ∅
for (t in ((σ_(R.B = 'c')(R)) ⋈_(R.A = S.E) S)) {
  if ((t.D = 'e') ∧ t.C > F) { F = t.C; FILTER(result, t.C < F) }
  if ((t.F = 'd') ∧ t.C >= F) result = result ∪ (t)
}
FILTER(result, t.C > F)

Traditional processing in this example would result in:

1. A query tree for the subquery: AGG_max(C)((σ_(R.B = 'c' ∧ R.D = 'e')(R)) ⋈_(R.A = S.E) S) (the subtree is optimized in the same manner as the for-loop program to make comparisons easier).
2. A query tree for the main query: (σ_(R.B = 'c')(R)) ⋈_(R.A = S.E) (σ_(S.F = 'd')(S)) (the subtree is again optimized).
3. A selection on the previous relation with condition C = v, where v is the value obtained in the first step.

Hence, the join of R and S is done twice (as is the selection on R.B = 'c'), while our approach carries out the join and the selection once. On the other hand, the standard approach can push several selections below the join on each occasion, while our approach can only push one of them. While our approach pipelines these selections with other operations (and therefore does them at no extra cost), the relation that is input to the for-loop is likely to be larger than the temporary results in the standard approach.
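The one-pass evaluation of Example 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `for_loop_max_eq`, the dictionary representation of tuples, the sample instances of R and S, and the placement of C and D in R and F in S are all hypothetical. The sketch follows the numbered template (lines 1 to 8), keeping every main-query tuple whose C could still equal the final maximum and filtering at the end.

```python
import math


def for_loop_max_eq(common, sub_only, main_only, attr="C"):
    """One pass for `attr = (SELECT max(attr) ...)` over the common part."""
    best = -math.inf  # plays the role of F, initialised to -infinity
    result = []
    for t in common:
        # Line 4: only tuples satisfying the subquery-only condition
        # may raise the running maximum.
        if sub_only(t) and t[attr] > best:
            best = t[attr]
            # Line 6: candidates below the new maximum can never qualify.
            result = [r for r in result if r[attr] >= best]
        # Line 5: keep main-query tuples that could still equal the final max.
        if main_only(t) and t[attr] >= best:
            result.append(t)
    # Line 8: drop candidates that ended up above the final maximum.
    return [r for r in result if r[attr] <= best]


# Hypothetical instances; C and D are assumed to belong to R, F to S.
R = [
    {"A": 1, "B": "c", "C": 5, "D": "e"},
    {"A": 2, "B": "c", "C": 7, "D": "x"},
    {"A": 3, "B": "c", "C": 5, "D": "x"},
    {"A": 4, "B": "z", "C": 9, "D": "e"},
]
S = [
    {"E": 1, "F": "x"},
    {"E": 2, "F": "d"},
    {"E": 3, "F": "d"},
    {"E": 4, "F": "d"},
]

# Common part: selection R.B = 'c' and join R.A = S.E, computed once.
common = [{**r, **s} for r in R if r["B"] == "c" for s in S if r["A"] == s["E"]]

answer = for_loop_max_eq(
    common,
    sub_only=lambda t: t["D"] == "e",   # g(Delta2 - Delta1)
    main_only=lambda t: t["F"] == "d",  # g(Delta1 - Delta2)
)
print([t["A"] for t in answer])  # prints [3]
```

With these instances the subquery maximum is 5 (from the tuple with A = 1), and the only main-query tuple with C = 5 is the one with A = 3. The join and the selection on R.B are evaluated once, which is the point of the approach; the answer does not depend on the order in which the common part is scanned.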
Finally, our approach computes the aggregate and produces the final result at the same time, while the traditional approach first computes the aggregate and then produces the final result in a separate step. Clearly, which plan is better depends on two types of parameters: typical optimization parameters, like the size of R and S and the selectivity of the conditions; and the linking condition, in particular the linking operator and the linking aggregate, which dictate the type and efficiency of the for-loop operator. Thus, an optimizer generating both plans should estimate the cost of each plan and choose the one with the lower cost. We show later how to estimate the cost of the plan containing the for-loop operator.

When a GROUP BY is present in the main query, we add a final group-by node to the query plan. Thus, GROUP BY clauses are treated as in traditional approaches and are not shown here.

4.2 Type B queries

The general pattern for type B queries is given next.

SELECT L FROM T1
WHERE Δ1 and attr1 θ (SELECT F1(attr2) FROM T2
                      WHERE Δ2 and S.attr3 θ R.attr4)
{GROUP BY L2}

where R ∈ T1 − T2, S ∈ T2, and we are assuming that T1 − {R} = T2 − {S} (i.e. the FROM clauses contain the same relations except the one introducing the correlated attribute, called R, and the one introducing the correlation attribute, called S). We call T = T1 − {R}. As before, a GROUP BY clause is optional.

The corresponding for-loop program can be generated using the following template:

(1.)  F1 = init
(2.)  NN(θ, F1)(1)
(3.)  NN(θ, F1)(2)
(4.)  NN(θ, F1)(3)
(5.)  for (t in T1)
(6.)  {
(7.)    GROUP (attr1,
(8.)      NN(θ, F1)(4)
(9.)      NN(θ, F1)(5)
(10.)     NN(θ, F1)(6) )
(11.)   [ NN(θ, F1)(7)
(12.)     NN(θ, F1)(8)
(13.)     for (t in partial)
(14.)       F1 = operation
(15.)     result = result ∪ (L)
(16.)     F1 = init, NN(θ, F1)(1)
(17.)     NN(θ, F1)(3), result = ∅ ]
(18.) }

As before, the particular program to be generated depends on the linking aggregate and the linking operator. For instance, the init and operation above are given by the following table:

F1      operation                       init
sum     sum + attr3                     0
min     (attr3 < min) ? attr3 : min     +∞
max     (attr3 > max) ? attr3 : max     −∞
count   count++                         0
avg     *                               *

In the pattern, NN(θ, F)(n) refers to the n-th line of the for-loop program for the NN query where the linking operator is θ and the aggregate function is F. As an example, assume the query

SELECT R.A, R.B FROM R, S
WHERE R.A = S.E and S.C < (SELECT min(S.F) FROM S WHERE R.A = S.E)

The equivalent for-loop program for this query, generated from the template above after appropriate substitutions, is given below.

(1.)  min = +∞;
(2.)  sum = 0;
(3.)  result = ∅;
(4.)  partial = ∅;
(5.)  for (t in R ⋈ S)
(6.)  {
(7.)    GROUP (t.A,
(8.)      if (t.F < min) min = t.F
(9.)      if (t.C <= min) partial = partial ∪ {t}
(10.)     FILTER(partial, C > min) )
(11.)   [ FILTER(partial, C >= min)
(12.)     for (t in partial)
(13.)       result = result ∪ (t.A, t.B)
(14.)     sum = 0, min = +∞
(15.)     partial = ∅ ]
(16.) }

After the process of simplification, the resulting program is

min = +∞; result = ∅; partial = ∅;
for (t in R ⋈ S) {
  GROUP (t.A,
    { if (t.F < min) { min = t.F }
      if (t.C < min) partial = partial ∪ {t} } )
  [ FILTER(partial, C >= min);
    result = result ∪ partial;
    min = +∞;
    partial = ∅ ]
}

The program computes the join of R and S, and then loops over it computing the minimum of every group, as grouped by R.A. Inside every group, a partial result is calculated, which is then added to the final result. The process is repeated for every group. Note that, inside each group, tuples accepted under an earlier, higher minimum may have to be filtered out once a lower minimum is discovered. Note also that, as a side effect, the result will actually be grouped by R.A.

Traditional processing of this query would likely try to optimize the query by using unnesting techniques. In Dayal's approach, the table containing the correlated attribute is outer joined to the table containing the correlation attribute. Other joins and conditions in the WHERE clauses of both query and subquery would be added to the tree. Then a group by would compute the aggregate in the subquery, and finally a selection would implement the linking condition. Thus, the plan would contain the following steps:

1. Outer join of R and S.
2. Join of the other tables in T2 with the result of the previous step, and selection on the other conditions in the subquery, followed by grouping and computation of the aggregate. Thus, the subtree so far contains GB_(attr6, F(attr2))(σ_Δ2(T ⋈ (R OJ S))), where OJ represents the outer join and GB a group-by node.
3. The main query is executed by applying all selections in Δ1 to the relations in T, and the result is joined with the result of the previous step on condition attr1 θ F(attr2) (note that F(attr2) was computed in the previous step and is therefore an attribute like any other). Finally, L is projected.

In the magic sets approach, Δ1 would be computed in its entirety, and a list of unique values for the correlating attribute would be generated. This list would be used as a semijoin to restrict the computation of Δ2 values, including the grouping and aggregation. Finally, the two partial results would be joined back and the linking condition would be computed. Thus, the plan would be composed of the following steps:

1. Compute T1 = Δ1(T); this is the complement set.
2. Compute M = π_(R.attr4)(T1); this is the magic set.
3. Join M with S, and continue computing Δ2; call the result T2.
4. Group T2 by R.attr4; compute the aggregate F(attr2). Call the result T3.
5. Join T3 and T1; use a selection for the linking condition.

In our approach, we consider the table containing the correlated attribute as part of the FROM clause of the subquery too (i.e. we effectively decorrelate the subquery). Thus, the outer join is always part of our common part. In our plan, there are two steps:

1. Computation of the base relation, given by g(Δ1 ∩ Δ2)(T ∪ {R, S}). This includes the outer join of R and S.
2. Computation of a grouped for-loop, defined by FL(attr6, F(attr2), Δ2 − Δ1, Δ1 − Δ2 ∧ attr1 θ F(attr2)), which computes the rest of the query.

Dayal's query plan and our query plan are shown as trees in Figure 1. Our plan has two main differences with Dayal's: the parts common to query and subquery are computed only once, at the beginning of the plan; and computing the aggregate, the linking predicate, and possibly some selections is carried out by the for-loop operator in one step. Thus, we potentially deal with larger temporary results, as some selections (those not in Δ1 ∩ Δ2) are not pushed down, but we may be able to effect several computations at once (and do not repeat any computation). Compared to magic sets, a trade-off is quickly obvious: by fixing the common part, we do not repeat computations, but we cannot separate the production of the complementary and magic sets from the rest of the processing, since the common part is in both the complementary set and the subquery. Therefore, like Dayal's approach, we may compute more aggregates in the subquery than the magic sets approach. However, we do not generate additional joins or temporary tables and do not repeat computations.
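The grouped for-loop for the example query of this section (S.C < min(S.F), correlated on R.A = S.E) can be sketched in Python as below. This is a minimal sketch under assumptions, not the authors' operator: the function name `grouped_for_loop_min_lt`, the dictionary tuples and the sample rows are hypothetical, grouping is done by sorting on R.A, and a plain inner join stands in for the base relation because unmatched R tuples cannot appear in the answer of this particular query.

```python
import math
from itertools import groupby


def grouped_for_loop_min_lt(joined):
    """Per group of R.A: running min of S.F, keep tuples with S.C < min."""
    result = []
    ordered = sorted(joined, key=lambda t: t["A"])  # stable sort groups by R.A
    for _, group in groupby(ordered, key=lambda t: t["A"]):
        low = math.inf  # min = +infinity at the start of every group
        partial = []
        for t in group:
            if t["F"] < low:
                low = t["F"]
            if t["C"] < low:
                partial.append(t)
        # End of group: drop tuples accepted under an earlier, higher minimum.
        result.extend((t["A"], t["B"]) for t in partial if t["C"] < low)
    return result


# Hypothetical instances of R(A, B) and S(E, C, F).
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
S = [
    {"E": 1, "C": 7, "F": 10},
    {"E": 1, "C": 3, "F": 9},
    {"E": 1, "C": 6, "F": 5},
    {"E": 1, "C": 4, "F": 8},
    {"E": 2, "C": 1, "F": 1},
]

# Base relation: the join of R and S on R.A = S.E, computed once.
joined = [{**r, **s} for r in R for s in S if r["A"] == s["E"]]

print(grouped_for_loop_min_lt(joined))  # prints [(1, 'x'), (1, 'x')]
```

In the group A = 1 the minimum of F is 5, so the tuples with C = 3 and C = 4 qualify, while the tuple with C = 7, accepted while the running minimum was still 10, is removed by the end-of-group filter. Duplicates are preserved, as SQL requires.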
Clearly, which plan is better depends on the amount of redundancy between query and subquery, the linking condition (which determines how efficient the for-loop operator is), and traditional optimization parameters, like the size of the input relations and the selectivity of the different conditions.

4.3 Extensions

The presentation so far has limited for-loops to queries with aggregated subqueries, where the subquery and the query have the same FROM clause. However, the approach can easily be generalized.

First, we point out that it is possible to rewrite any SQL query with a (non-aggregated) subquery into a query with an aggregated subquery. For instance, it is well known that a condition like EXISTS Q, where Q is a subquery, can be transformed into the semantically equivalent 0 < Q', where Q' is a query derived from Q by changing Q's SELECT clause from whatever it was to SELECT COUNT(*). The introduction of * is needed to deal with null values. Similar subtle transformations are needed in other cases. The point is, however, that all SQL subqueries can be rewritten similarly, as shown in [3]. Hence, the approach presented here can be applied to any SQL query with subqueries of any kind, by first rewriting the subquery and linking condition appropriately.

Second, the approach can deal with situations where query and subquery have different FROM clauses, as follows. Let T1 be the FROM clause of the main query and T2 the FROM clause of the subquery, with Δ1 and Δ2 as before. If T1 ∩ T2 = ∅, then nothing can be done with our approach. However, if T1 ∩ T2 ≠ ∅, we can still derive a common part as (Δ1 ∩ Δ2)(T1 ∩ T2) [footnote 6]. Let P1 = (Δ1 ∩ Δ2)(T1 ∩ T2), P2 = (Δ1 − Δ2)(T1 − T2) and P3 = (Δ2 − Δ1)(T2 − T1). To compute the aggregate in the subquery we need P1 ⋈ P3; and to compute the linking condition and the final result we need P1 ⋈ P2.
However, if we join each part separately we cannot use the for-loop (which takes one relation as input), and we cannot use P = P1 ⋈ P2 ⋈ P3 as the base relation for the for-loop, since tuples in P1 are needed for computations both in query and subquery: if a tuple in P1 has a match in P2 but does not have a match in P3, it will disappear from P, even though it may be part of the final result; if a tuple in P1 has a match in P3 but does not have a match in P2, it will disappear from P, even though it is needed to compute the right aggregation in the subquery. The problem is that all tuples in P1 must be kept, but we need to know, for each such tuple, whether it is part of the subquery only, the main query only, or both.

[Footnote 6] Note that, in the case of correlated subqueries, we add to T2 the relation in T1 which provides the correlated attribute, and therefore all correlated subqueries fall in this case.

[Figure 1: Standard plan for NQ queries (a) vs. for-loop plan (b). Plan (a) is built from Project(L), Select(attr1 θ F(attr2)), Join, Select(Δ1), GB(attr6, F(attr2)), Select(Δ2) and OuterJoin nodes over T, R and S; plan (b) applies Project(L) and FL(attr6, F(attr2), Δ2 − Δ1, Δ1 − Δ2 ∧ attr1 θ F(attr2)) on top of the common part Δ1 ∩ Δ2, computed over T and the outer join of R and S.]

To understand the situation, imagine tables R, S, T, Z and V as described below and the query

SELECT R.A FROM R, S, T, Z
WHERE R.B = S.C and S.D = T.E and T.F = Z.G
  and R.A = (SELECT SUM(S.D) FROM S, T, V
             WHERE R.B = S.C and S.D = T.E and T.F = V.I)

[Tables: instances of R(A, B), S(C, D), T(E, F), Z(G, H) and V(I, J); row values not preserved.]

The common part is the join of R with S and T:

[Table: R ⋈ S ⋈ T, with columns A, B, C, D, E, F; row values not preserved.]

After that, the join with Z will qualify rows 1 and 3 (starting at the top) of the above result; the value of R.A sent to the correlated attribute is 2, and it is sent twice. Clearly, then, R ⋈ S ⋈ T is the same for query and subquery. However, in the subquery a join with V will qualify rows 2 and 4, and the sum will be over values 4 and 4 of S.D. Clearly, we cannot throw away any row of the common part if we expect to compute the for-loop on it; we can only mark each tuple as belonging to the outer query, the subquery or both [footnote 7].

One way to accomplish this is to use outer joins instead of regular joins in P: we (left) outer join P1 to P2, and then (left) outer join the result to P3. Tuples which have nulls in both the P2 and P3 parts of the schema can be disregarded; tuples that have nulls in P2 but not in P3 are considered for the computation of the aggregate but not for the result; and tuples that have nulls in P3 but not in P2 are considered for the result, but not for the computation of the aggregate (obviously, tuples with no nulls are considered both for the computation of the aggregate and for appearance in the result). This can be accomplished easily by adding a condition not isnull(P2.key) to Δ1 − Δ2 and a condition not isnull(P3.key) to Δ2 − Δ1 in the for-loop operator.

If all selections are pushed down, and all attributes required in the for-loop come from relations in T1 ∩ T2, all the information needed is in P1, and the outer joins with P2 and P3 are required basically to determine which tuples are in the main query and which are in the subquery part. A possible optimization then is to use semijoins in the definition of P.
Unfortunately, a semijoin loses information about duplicates, and this information is needed to compute aggregates correctly in SQL (at least those aggregates that are duplicate-sensitive ([])) and also to determine how many times a tuple must appear in the result, since SQL does not remove duplicates unless explicitly stated in the query (DISTINCT keyword). Thus, the semijoin strategy is not directly applicable.

Our approach for this case is to join P1 to P2 and then to P3 using cardinal joins. Intuitively, a cardinal join of relations R and S tells us how many tuples in S match each tuple in R, by creating a counter in each tuple of R with that information. Thus, let A ∈ sch(R), B ∈ sch(S), and let sch(R) = A1, ..., An. Then the cardinal join of R and S on condition A θ B (in symbols, R ⋈^C_(AθB) S) produces a relation of schema A1, ..., An, N, where N is a special domain isomorphic with the natural numbers. The cardinal join is defined as

R ⋈^C_(AθB) S = { t | ∃ t' ∈ R: t[sch(R)] = t' ∧ t[N] = |{ t'' ∈ S | t'.A θ t''.B }| }.

Then (P1 ⋈^C P2) ⋈^C P3 will add, to each tuple in P1, two counters, N1 and N2. If both are 0, the tuple can be ignored. If the counter created by P2 is 0, the tuple is ignored for computing the linking condition and the final result; if the counter created by P3 is 0, the tuple is ignored for computing the aggregate. Again, this can be accomplished easily by adding a condition (N1 > 0) to Δ1 − Δ2 and a condition (N2 > 0) to Δ2 − Δ1 in the for-loop operator.

A comparison of our approach and the standard approach can be summarized as follows:

- In the presence of correlation, our approach is similar to unnesting in Dayal's style, in that query and subquery are (outer) joined through the correlation. Unlike magic sets, we cannot produce a minimal set of values for the correlation, as we start by identifying common parts that belong to both query and subquery. On the other hand, our approach avoids any duplicated work, and may be able to compute groupings, aggregations and some selections in one pass. Our approach cannot be applied when T1 ∩ T2 = ∅, i.e. when query and subquery have nothing in common.
- For non-correlated queries, our approach degenerates to the standard approach when T1 ∩ T2 = ∅. In this case, there is probably no other approach possible but the standard one (execute query and subquery separately). However, whenever some overlap exists, our approach may reduce the amount of work to be done, while the standard approach still executes query and subquery separately.
- In all cases, our approach executes any common parts of query and subquery together. As pointed out, this may or may not be a good strategy, depending on the amount of overlap and the size of the temporary relations created (i.e. the selectivity factors of the conditions and their distribution). There is a trade-off between fixing the order of execution of joins and selections and not repeating any work.

[Footnote 7] Contrast this with the magic sets approach: R ⋈ S ⋈ T ⋈ Z would be computed and yield the complementary table; a projection on R.B would yield the magic set table; and a semijoin of this magic set with S ⋈ T ⋈ V would be used to compute the aggregates in the subquery. Thus, the number of computations in the subquery could potentially be reduced, at the cost of producing more temporary results and an additional join later in the plan. In particular, the join R ⋈ S ⋈ T is repeated.
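The cardinal join defined earlier in this subsection can be sketched in Python as follows. This is a naive nested-loop illustration under assumptions, not an implementation from the paper: the function name `cardinal_join`, the attribute names K, F, G and I, the counter names N1 and N2, and the sample parts P1, P2 and P3 are all hypothetical.

```python
import operator


def cardinal_join(r_rows, s_rows, a, b, theta=operator.eq, counter="N"):
    """Extend each tuple of R with the number of S tuples it matches."""
    out = []
    for r in r_rows:
        # Count the S tuples satisfying r.a theta s.b; r itself is kept once.
        n = sum(1 for s in s_rows if theta(r[a], s[b]))
        out.append({**r, counter: n})
    return out


# Hypothetical parts: P1 is the common part, P2 belongs only to the main
# query and P3 only to the subquery.
P1 = [{"K": 1, "F": 10}, {"K": 2, "F": 20}, {"K": 3, "F": 30}]
P2 = [{"G": 10}, {"G": 10}, {"G": 30}]
P3 = [{"I": 20}, {"I": 30}]

tagged = cardinal_join(P1, P2, "F", "G", counter="N1")
tagged = cardinal_join(tagged, P3, "F", "I", counter="N2")

# N1 > 0: the tuple takes part in the linking condition and the result.
# N2 > 0: the tuple takes part in the computation of the aggregate.
for_result = [t["K"] for t in tagged if t["N1"] > 0]
for_aggregate = [t["K"] for t in tagged if t["N2"] > 0]
print(for_result, for_aggregate)  # prints [1, 3] [2, 3]
```

Every tuple of P1 survives exactly once, and the counters retain the duplicate information that a semijoin would lose: the tuple with K = 1 matches two tuples of P2, so its counter N1 is 2.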


More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14 Query Optimization Chapter 14: Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

Chapter 14 Query Optimization

Chapter 14 Query Optimization Chapter 14: Query Optimization Chapter 14 Query Optimization! Introduction! Catalog Information for Cost Estimation! Estimation of Statistics! Transformation of Relational Expressions! Dynamic Programming

More information

ATYPICAL RELATIONAL QUERY OPTIMIZER

ATYPICAL RELATIONAL QUERY OPTIMIZER 14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

More information

Foundations of Databases

Foundations of Databases Foundations of Databases Free University of Bozen Bolzano, 2004 2005 Thomas Eiter Institut für Informationssysteme Arbeitsbereich Wissensbasierte Systeme (184/3) Technische Universität Wien http://www.kr.tuwien.ac.at/staff/eiter

More information

AXIOMS OF AN IMPERATIVE LANGUAGE PARTIAL CORRECTNESS WEAK AND STRONG CONDITIONS. THE AXIOM FOR nop

AXIOMS OF AN IMPERATIVE LANGUAGE PARTIAL CORRECTNESS WEAK AND STRONG CONDITIONS. THE AXIOM FOR nop AXIOMS OF AN IMPERATIVE LANGUAGE We will use the same language, with the same abstract syntax that we used for operational semantics. However, we will only be concerned with the commands, since the language

More information

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy

Lecture 3 SQL - 2. Today s topic. Recap: Lecture 2. Basic SQL Query. Conceptual Evaluation Strategy 9/3/17. Instructor: Sudeepa Roy CompSci 516 Data Intensive Computing Systems Lecture 3 SQL - 2 Instructor: Sudeepa Roy Announcements HW1 reminder: Due on 09/21 (Thurs), 11:55 pm, no late days Project proposal reminder: Due on 09/20 (Wed),

More information

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends

More information

SQL and Incomp?ete Data

SQL and Incomp?ete Data SQL and Incomp?ete Data A not so happy marriage Dr Paolo Guagliardo Applied Databases, Guest Lecture 31 March 2016 SQL is efficient, correct and reliable 1 / 25 SQL is efficient, correct and reliable...

More information

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1

CAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1 CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost

More information

Chapter 3: Introduction to SQL

Chapter 3: Introduction to SQL Chapter 3: Introduction to SQL Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 3: Introduction to SQL Overview of the SQL Query Language Data Definition Basic Query

More information

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation!

Textbook: Chapter 6! CS425 Fall 2013 Boris Glavic! Chapter 3: Formal Relational Query. Relational Algebra! Select Operation Example! Select Operation! Chapter 3: Formal Relational Query Languages CS425 Fall 2013 Boris Glavic Chapter 3: Formal Relational Query Languages Relational Algebra Tuple Relational Calculus Domain Relational Calculus Textbook:

More information

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto Lecture 02.03. Query evaluation Combining operators. Logical query optimization By Marina Barsky Winter 2016, University of Toronto Quick recap: Relational Algebra Operators Core operators: Selection σ

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model

Agenda. Discussion. Database/Relation/Tuple. Schema. Instance. CSE 444: Database Internals. Review Relational Model Agenda CSE 444: Database Internals Review Relational Model Lecture 2 Review of the Relational Model Review Queries (will skip most slides) Relational Algebra SQL Review translation SQL à RA Needed for

More information

Optimization Overview

Optimization Overview Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:

More information

Final Exam CSE232, Spring 97

Final Exam CSE232, Spring 97 Final Exam CSE232, Spring 97 Name: Time: 2hrs 40min. Total points are 148. A. Serializability I (8) Consider the following schedule S, consisting of transactions T 1, T 2 and T 3 T 1 T 2 T 3 w(a) r(a)

More information

The SQL data-definition language (DDL) allows defining :

The SQL data-definition language (DDL) allows defining : Introduction to SQL Introduction to SQL Overview of the SQL Query Language Data Definition Basic Query Structure Additional Basic Operations Set Operations Null Values Aggregate Functions Nested Subqueries

More information

Database Technology Introduction. Heiko Paulheim

Database Technology Introduction. Heiko Paulheim Database Technology Introduction Outline The Need for Databases Data Models Relational Databases Database Design Storage Manager Query Processing Transaction Manager Introduction to the Relational Model

More information

Introduction Alternative ways of evaluating a given query using

Introduction Alternative ways of evaluating a given query using Query Optimization Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational Expressions Dynamic Programming for Choosing Evaluation Plans Introduction

More information

High Volume In-Memory Data Unification

High Volume In-Memory Data Unification 25 March 2017 High Volume In-Memory Data Unification for UniConnect Platform powered by Intel Xeon Processor E7 Family Contents Executive Summary... 1 Background... 1 Test Environment...2 Dataset Sizes...

More information

Detecting Logical Errors in SQL Queries

Detecting Logical Errors in SQL Queries Detecting Logical Errors in SQL Queries Stefan Brass Christian Goldberg Martin-Luther-Universität Halle-Wittenberg, Institut für Informatik, Von-Seckendorff-Platz 1, D-06099 Halle (Saale), Germany (brass

More information

3. Relational Data Model 3.5 The Tuple Relational Calculus

3. Relational Data Model 3.5 The Tuple Relational Calculus 3. Relational Data Model 3.5 The Tuple Relational Calculus forall quantification Syntax: t R(P(t)) semantics: for all tuples t in relation R, P(t) has to be fulfilled example query: Determine all students

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12

SQL QUERY EVALUATION. CS121: Relational Databases Fall 2017 Lecture 12 SQL QUERY EVALUATION CS121: Relational Databases Fall 2017 Lecture 12 Query Evaluation 2 Last time: Began looking at database implementation details How data is stored and accessed by the database Using

More information

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17 Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa

More information

SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs

SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs SQL Part 3: Where Subqueries and other Syntactic Sugar Part 4: Unknown Values and NULLs 1-1 List of Slides 1 2 More on "where" conditions 3 Esoteric Predicates: Example 4 WHERE Subqueries 5 Overview of

More information

Tuning Relational Systems I

Tuning Relational Systems I Tuning Relational Systems I Schema design Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning, etc Query rewriting Using indexes appropriately,

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

More on SQL Nested Queries Aggregate operators and Nulls

More on SQL Nested Queries Aggregate operators and Nulls Today s Lecture More on SQL Nested Queries Aggregate operators and Nulls Winter 2003 R ecom m en ded R eadi n g s Chapter 5 Section 5.4-5.6 http://philip.greenspun.com/sql/ Simple queries, more complex

More information

Relational Model: History

Relational Model: History Relational Model: History Objectives of Relational Model: 1. Promote high degree of data independence 2. Eliminate redundancy, consistency, etc. problems 3. Enable proliferation of non-procedural DML s

More information

CSC 261/461 Database Systems Lecture 19

CSC 261/461 Database Systems Lecture 19 CSC 261/461 Database Systems Lecture 19 Fall 2017 Announcements CIRC: CIRC is down!!! MongoDB and Spark (mini) projects are at stake. L Project 1 Milestone 4 is out Due date: Last date of class We will

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

SQL. CS 564- Fall ACKs: Dan Suciu, Jignesh Patel, AnHai Doan SQL CS 564- Fall 2015 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan MOTIVATION The most widely used database language Used to query and manipulate data SQL stands for Structured Query Language many SQL standards:

More information

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton

Structured Query Language Continued. Rose-Hulman Institute of Technology Curt Clifton Structured Query Language Continued Rose-Hulman Institute of Technology Curt Clifton The Story Thus Far SELECT FROM WHERE SELECT * SELECT Foo AS Bar SELECT expression SELECT FROM WHERE LIKE SELECT FROM

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

Why SQL? SQL is a very-high-level language. Database management system figures out best way to execute query

Why SQL? SQL is a very-high-level language. Database management system figures out best way to execute query Basic SQL Queries 1 Why SQL? SQL is a very-high-level language Say what to do rather than how to do it Avoid a lot of data-manipulation details needed in procedural languages like C++ or Java Database

More information

RELATIONAL ALGEBRA II. CS121: Relational Databases Fall 2017 Lecture 3

RELATIONAL ALGEBRA II. CS121: Relational Databases Fall 2017 Lecture 3 RELATIONAL ALGEBRA II CS121: Relational Databases Fall 2017 Lecture 3 Last Lecture 2 Query languages provide support for retrieving information from a database Introduced the relational algebra A procedural

More information

Chapter 3: Relational Model

Chapter 3: Relational Model Chapter 3: Relational Model Structure of Relational Databases Relational Algebra Tuple Relational Calculus Domain Relational Calculus Extended Relational-Algebra-Operations Modification of the Database

More information

Mobile and Heterogeneous databases

Mobile and Heterogeneous databases Mobile and Heterogeneous databases Heterogeneous Distributed Databases Query Processing A.R. Hurson Computer Science Missouri Science & Technology 1 Note, this unit will be covered in two lectures. In

More information

Data Manipulation (DML) and Data Definition (DDL)

Data Manipulation (DML) and Data Definition (DDL) Data Manipulation (DML) and Data Definition (DDL) 114 SQL-DML Inserting Tuples INSERT INTO REGION VALUES (6,'Antarctica','') INSERT INTO NATION (N_NATIONKEY, N_NAME, N_REGIONKEY) SELECT NATIONKEY, NAME,

More information

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog

Chapter 5: Other Relational Languages.! Query-by-Example (QBE)! Datalog Chapter 5: Other Relational Languages! Query-by-Example (QBE)! Datalog 5.1 Query-by by-example (QBE)! Basic Structure! Queries on One Relation! Queries on Several Relations! The Condition Box! The Result

More information

Incomplete Information: Null Values

Incomplete Information: Null Values Incomplete Information: Null Values Often ruled out: not null in SQL. Essential when one integrates/exchanges data. Perhaps the most poorly designed and the most often criticized part of SQL:... [this]

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Database Systems SQL SL03

Database Systems SQL SL03 Checking... Informatik für Ökonomen II Fall 2010 Data Definition Language Database Systems SQL SL03 Table Expressions, Query Specifications, Query Expressions Subqueries, Duplicates, Null Values Modification

More information

CS 582 Database Management Systems II

CS 582 Database Management Systems II Review of SQL Basics SQL overview Several parts Data-definition language (DDL): insert, delete, modify schemas Data-manipulation language (DML): insert, delete, modify tuples Integrity View definition

More information

Chapter 3. Algorithms for Query Processing and Optimization

Chapter 3. Algorithms for Query Processing and Optimization Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms

More information

(Refer Slide Time 6:48)

(Refer Slide Time 6:48) Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology Madras Lecture - 8 Karnaugh Map Minimization using Maxterms We have been taking about

More information

Chapter 5: Other Relational Languages

Chapter 5: Other Relational Languages Chapter 5: Other Relational Languages Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 5: Other Relational Languages Tuple Relational Calculus Domain Relational Calculus

More information

CMSC424: Database Design. Instructor: Amol Deshpande

CMSC424: Database Design. Instructor: Amol Deshpande CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons

More information

Chapter 3. The Relational Model. Database Systems p. 61/569

Chapter 3. The Relational Model. Database Systems p. 61/569 Chapter 3 The Relational Model Database Systems p. 61/569 Introduction The relational model was developed by E.F. Codd in the 1970s (he received the Turing award for it) One of the most widely-used data

More information

Chapter 3: Introduction to SQL. Chapter 3: Introduction to SQL

Chapter 3: Introduction to SQL. Chapter 3: Introduction to SQL Chapter 3: Introduction to SQL Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 3: Introduction to SQL Overview of The SQL Query Language Data Definition Basic Query

More information