Optimizing relational queries in connection hypergraphs: nested queries, views, and binding propagations

The VLDB Journal (1998) 7: 1-11
© Springer-Verlag 1998

Jia Liang Han
Bell Labs, Lucent Technologies, 6200 East Broad Street, Rm 0B115, Columbus, OH, USA; jlhan@acm.org

Edited by Y. Vassiliou. Received September 1, 1993 / Accepted January 8, 1996

Abstract. We optimize relational queries using connection hypergraphs (CHGs). All operations, including value-passing between SQL blocks, can be set-oriented. By introducing partial evaluations, reordering operations can be achieved for nested queries. For a query using views, we merge CHGs for the views and the query into one CHG and then apply query optimization. Furthermore, we may simulate magic sets methods elegantly in a CHG. Sideways information-passing strategies (SIPS) in a CHG amount to partial evaluations of SIPS paths. We introduce the maximum SIPS strategy, which performs SIPS for all bindings and all SIPS paths for a query. The new method has several advantages. First, the maximum SIPS strategy can be more efficient than the previous SIPS based on simple heuristics. Second, it is conceptually simple and easy to implement. Third, the processing strategies may be incorporated with the search space for query execution plans, which is a proven optimization strategy introduced by System R. Fourth, it provides a general framework of query optimization and may potentially be used to optimize next-generation database systems.

Key words: Relational query optimization, Connection hypergraphs, Partial evaluations, SIPS, Search space

1 Introduction

One of the main advantages of the relational model is the declarativeness of its query languages. A declarative language frees application programmers from procedural details such as data structures, access paths to relations, and join methods, thus greatly simplifying the task of querying and programming. However, such power does not come for free.
Great effort is required to design and optimize relational database systems to achieve good performance. For a query in a high-level declarative query language, there are often a number of potential evaluation procedures. Which procedure is the most efficient depends on many factors, including the query type, the ranges of attributes (image sizes), the size of relations, and so on.

Query processing may be viewed as a sequence of basic operations. A basic operation (or an operation in short) refers to either a selection, a projection, a join, or a cartesian product. Query evaluation is usually divided into two phases. Phase one does not materialize relations but considers various query execution plans (QEPs). A QEP specifies procedural details, such as the order of operations, access paths to relations, buffer management, and management of temporary relations. For a not too simple relational query, the number of QEPs is large. One possible approach is to map all QEPs into a discrete search space, carry out an exhaustive search while using statistics and other information to estimate the computation cost of each QEP, then choose the most efficient QEP. Such a strategy was proposed in System R (Selinger et al. 1979) and has proven to be effective in practice. In phase two, relations are materialized and operations are carried out to obtain the answer.

The relational language SQL allows nested queries and views. Nested queries and views are important, since they allow programmers to adopt a modular approach to application problems. They may also play an important role in some new-generation database systems, for example, object-oriented database systems. Query optimizers in System R and many current systems were designed for simple SQL queries, i.e., one query having only one SQL block (one SELECT-FROM-WHERE structure). They are often very inefficient when used to evaluate nested queries and views.
The inefficiency may be attributed largely to two causes: (1) the value-passing between SQL blocks is tuple-oriented, not set-oriented; (2) reordering operations is restricted to one SQL block.

To optimize nested queries, Kim (1982) proposed query rewriting methods which transform nested SQL queries into unnested ones. Mumick et al. (1990b) noted that Kim's transformations have similarities with semijoins (Bernstein and Chiu 1981) and magic sets methods (Bancilhon et al. 1986; Beeri and Ramakrishnan 1991; Ullman 1989). Magic sets methods were proposed to optimize recursive datalog queries. Magic sets methods rewrite programs and queries so that irrelevant tuples are not generated during query evaluation. The key to magic sets methods is sideways information-passing strategies (SIPS) (Bancilhon et al. 1986; Beeri and Ramakrishnan 1991; Ullman 1989), which pass binding information in a relation sideways (without evaluating the relation or query fully) to other relations. Mumick et al. (1990a,b,c) applied and extended magic sets methods to optimize relational queries, as well as datalog queries. Magic sets methods are more powerful than Kim's transformations and are applicable to general relational queries. Pirahesh et al. (1992) considered program transformations in the presence of duplicates and aggregates. Analytical and experimental results in these papers and more recently in Mumick and Pirahesh (1994) showed that these program transformations may improve performance of some queries by several orders of magnitude.

In this paper, we use the connection hypergraph (CHG) to optimize relational queries, including nested queries and views. The CHG was used in Ullman (1982, 1989) to give a graph version of the Wong-Youssefi algorithm (Wong and Youssefi 1976). Further study using the CHG to optimize unnested SQL queries was given in Han (1994a, 1995). The CHG was used to construct a search space of QEPs, which has several largely orthogonal dimensions (Han 1995): (1) evaluation orders of operations, (2) evaluation methods for each operation, (3) access paths to relations and operations restricted to one base relation, (4) various groupings of join clusters for pipelining.

In this paper, we first generalize the graph method to nested queries and views. Each SQL block is represented by one CHG. For a nested query, CHGs for SQL blocks are connected by association hyperedges into one CHG. Each view definition is represented by one CHG, and a query using views may be represented by a CHG merged from CHGs (association hyperedges might be required). We introduce partial evaluations of association hyperedges for the purpose of reordering operations and value-passing across SQL blocks. All operations in a CHG, including partial evaluations, can be set-oriented. Program transformations are unnecessary to evaluate nested queries and views efficiently.
Furthermore, partial evaluations may be applied to relation hyperedges as well. Partial evaluations may propagate bindings in a CHG and can be used to simulate magic sets methods elegantly. A sideways information-passing strategy (SIPS) path is a path that starts from a relation with some bindings and ends at another relation (a more precise definition is given in Sect. 5). There may be a few bindings in a query and many different SIPS paths available for binding propagations. To remove as many irrelevant tuples as possible during query evaluation, we propose the maximum SIPS strategy, which performs SIPS for all bindings and all SIPS paths. The search space for QEPs is revised to incorporate SIPS. An exhaustive search can then find the most efficient QEP. This addresses the interplay problem of cost estimates and program transformations mentioned in Mumick et al. (1990b). In addition, it is straightforward in a CHG to generalize SIPS to bindings of type A > a and the like.

The rest of this paper is arranged as follows. Section 2 introduces the CHG for SQL queries and shows query evaluation in a CHG. In Sect. 3, we examine some important query optimization heuristics: push-selections, project-out-irrelevant-attributes-early, and pipelining. The search space for QEPs is constructed. Section 4 addresses query optimization in a CHG for nested queries and queries using views. In Sect. 5, we explain binding propagations in a CHG and simulate magic sets methods by partial evaluations. We consider various SIPS paths and propose the maximum SIPS strategy. We modify the search space so that an exhaustive search may take SIPS into consideration. Summary and further research topics are given in the final section.

2 Connection hypergraph and query evaluation

We first represent SQL queries by CHGs. The SQL language has many features and it is impossible to consider all of them here. We are limited to a subset of SQL in this paper.
An SQL query may be represented by a CHG (Ullman 1989; Han 1994a). An SQL query may have several SQL blocks, each of which has a SELECT-FROM-WHERE structure. A WHERE clause consists of a set of conditions joined by AND. Conditions may be classified into four categories: (1) A = a, (2) A = B, (3) A θ B, where θ is one of <, ≤, ≠, >, ≥, (4) A θ B, where θ is one of ⊂, ⊆, ≠, ⊃, ⊇. A simple SQL query is an SQL query of only one SQL block. A nested SQL query has more than one SQL block. Each SQL block may be represented by a CHG as follows.

1. An attribute A of relation R_i is represented by a node, R_i.A. If both R_i and R_j (they can be different occurrences of the same relation) have attribute A, we create different nodes R_i.A and R_j.A. We use the terms node and attribute interchangeably.
2. A relation hyperedge is drawn for each relation in the FROM clause, which is a solid circle enclosing all its attributes.
3. To represent a condition of type (1) in a CHG, we label the node by A = a.
4. For a condition of type (3) or (4) in the WHERE clause, we draw a condition hyperedge, which is a dotted circle enclosing all nodes in the condition.
5. For a condition of type (2), we merge the corresponding nodes.
6. Attributes in the SELECT clause are known as distinguished attributes.
7. We draw a hyperedge, also a solid circle, which encloses the distinguished nodes. The hyperedge corresponds to the output relation.

A relation hyperedge or hyperedge corresponds to a relation; when there is no confusion, hyperedge and relation are used interchangeably. On notation, we use different fonts for hyperedge and for relation, e.g., emp as the hyperedge for the emp relation.

To represent a nested SQL query, we may draw a CHG for each SQL block and connect CHGs by association hyperedges. An association hyperedge is drawn as a dashed circle enclosing related attributes. Association hyperedges are labeled by IS IN, UNION, EXISTS, etc.

Example 2.1. The following example is taken from Mumick et al.
(1990b). It has the following relations: emp(Eno, Ename, Sal, Bonus, Job, Dno, EKidsN), dept(Dno, Mgr, Loc), where Eno, Dno are employee number and department number, respectively, Mgr stands for manager, and Loc for location. The query below finds every senior programmer whose salary plus bonus is greater than 50,000 and whose department is located in San Jose.

    SELECT Ename, Mgr
    FROM emp, dept
    WHERE Job = 'Sr Programmer' AND Sal + Bonus > 50000
      AND emp.Dno = dept.Dno AND Loc = 'San Jose'

The above query is slightly different from that in Mumick et al. (1990b) (the original example had a subquery P(emp, dept)). The CHG for this query is shown in Fig. 1.

Fig. 1. The CHG for Example 2.1

Example 2.2. The following example is a variation (for the purpose of binding propagations later) of an example in Kim (1982). Consider relations shipment(SNo, PNo, JNo, Qty), part(PNo, PName, Dim, Price, Color). The nested SQL query below finds the supplier numbers of suppliers that supply parts of quantity = 20 whose unit price is greater than 25:

    SELECT SNo
    FROM shipment
    WHERE Qty = 20 AND PNo IS IN
      (SELECT PNo
       FROM part
       WHERE Price > 25)

The CHG for this SQL query is shown in Fig. 2.

Fig. 2. The CHG for Example 2.2

Definition 2.1. A base relation is a relation known originally in the database (and is usually stored on disk). A transient relation is a temporary relation generated during query evaluation and is used in later evaluation steps.

We traverse the CHG for a relational query and mark hyperedges and conditions as we proceed. The traversal determines a sequence of events, S = (E_1, ..., E_n), where an event E_i can be either a relation hyperedge, a condition hyperedge, a condition, or an association hyperedge.

Definition 2.2. Any relation hyperedge is evaluable. A condition hyperedge is evaluable if the relation hyperedges it intersects precede it, i.e., they have been marked.
A condition (of type A = a) is evaluable if its relation hyperedge has been marked. (Conditions and condition hyperedges may be combined with retrieval of a relation in Sect. 3.) Until Sect. 4, we consider only sequences in which all events are evaluable and which contain no association hyperedges (except for the semantics of nested queries later in this section).

We map each event in S into a basic relational operation. The transient relation obtained after an event E_j is denoted as TRAN(E_1, ..., E_j). The first event E_1 must be a relation hyperedge. We now define a procedure EVAL(TRAN(E_1, ..., E_{j-1}), E_j), which takes TRAN(E_1, ..., E_{j-1}) and E_j and results in TRAN(E_1, ..., E_j).

1. Initially, EVAL(E_1) gives TRAN(E_1) = R_1, where R_1 corresponds to the relation hyperedge E_1. The corresponding relational operation is just retrieval of R_1.
2. If E_j is a condition or a condition hyperedge, then EVAL(TRAN(E_1, ..., E_{j-1}), E_j) gives TRAN(E_1, ..., E_j) = σ_F(TRAN(E_1, ..., E_{j-1})), where F is the condition.
3. If E_j is a relation hyperedge for relation R_j which intersects the marked part of the CHG, then EVAL(TRAN(E_1, ..., E_{j-1}), E_j) gives TRAN(E_1, ..., E_j) = TRAN(E_1, ..., E_{j-1}) ⋈ R_j.
4. If E_j is a relation hyperedge not intersecting the marked part of the CHG, then EVAL(TRAN(E_1, ..., E_{j-1}), E_j) gives TRAN(E_1, ..., E_j) = TRAN(E_1, ..., E_{j-1}) × R_j.

For example, consider S = (emp, dept, ...) in Example 2.1. TRAN(emp, dept) = TRAN(emp) ⋈ dept. In Example 2.2, let S = (shipment, part, ...). TRAN(shipment, part) = TRAN(shipment) × part.

After all hyperedges and conditions in a CHG have been marked and the corresponding relational operations have been carried out, we project the transient relation onto the distinguished attributes. This results in a relation for the hyperedge, which is the answer to the query. After an intermediate step, some attributes in the transient relation may be not in the query and no longer required in future processing.
They may be projected out. This issue will be addressed in detail in the next section.
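The EVAL rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: relations are modelled as lists of dictionaries, merged nodes are modelled by giving the joined attributes the same name (so a natural join with no shared names degenerates into a cartesian product, covering rules 3 and 4 together), and the function names, the event encoding, and the sample tuples are all hypothetical. Evaluability of each event is assumed, not checked.

```python
from itertools import product

def natural_join(r, s):
    """Join on all shared attribute names; with none shared this is a cartesian product."""
    joined = []
    for t, u in product(r, s):
        shared = set(t) & set(u)
        if all(t[a] == u[a] for a in shared):
            joined.append({**t, **u})
    return joined

def evaluate(events, distinguished):
    """Apply the events in order, then project onto the distinguished attributes."""
    tran = None  # the transient relation TRAN(E_1, ..., E_j)
    for kind, payload in events:
        if kind == "relation":      # rules 1, 3 and 4
            tran = payload if tran is None else natural_join(tran, payload)
        else:                       # rule 2: a condition is a selection
            tran = [t for t in tran if payload(t)]
    answer = []
    for t in tran:                  # final projection, duplicates removed
        row = tuple(t[a] for a in distinguished)
        if row not in answer:
            answer.append(row)
    return answer

# Hypothetical sample data for Example 2.1 (EKidsN omitted for brevity).
emp = [
    {"Eno": 1, "Ename": "Ann", "Sal": 45000, "Bonus": 8000, "Job": "Sr Programmer", "Dno": 10},
    {"Eno": 2, "Ename": "Bob", "Sal": 40000, "Bonus": 5000, "Job": "Sr Programmer", "Dno": 10},
    {"Eno": 3, "Ename": "Cy", "Sal": 60000, "Bonus": 1000, "Job": "Analyst", "Dno": 10},
    {"Eno": 4, "Ename": "Di", "Sal": 55000, "Bonus": 2000, "Job": "Sr Programmer", "Dno": 20},
]
dept = [
    {"Dno": 10, "Mgr": "Lee", "Loc": "San Jose"},
    {"Dno": 20, "Mgr": "Kim", "Loc": "Austin"},
]

is_sr = ("condition", lambda t: t["Job"] == "Sr Programmer")
well_paid = ("condition", lambda t: t["Sal"] + t["Bonus"] > 50000)
in_san_jose = ("condition", lambda t: t["Loc"] == "San Jose")

# Two different evaluation orders over the same CHG.
order_a = [("relation", emp), is_sr, well_paid, ("relation", dept), in_san_jose]
order_b = [("relation", dept), in_san_jose, ("relation", emp), well_paid, is_sr]

print(evaluate(order_a, ("Ename", "Mgr")))
print(evaluate(order_b, ("Ename", "Mgr")))
```

Both orders produce the same answer on this data; they differ only in the sizes of the intermediate transient relations, which is what a cost-based search over orders would compare.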

Algorithm 2.1.
(1) Traverse the CHG, which results in a sequence S = (E_1, E_2, ..., E_n);
(2) for j = 1 to n do
(3)   EVAL(TRAN(E_1, ..., E_{j-1}), E_j);
(4) project TRAN(E_1, ..., E_n) onto the distinguished attributes, which results in the answer.

Theorem 2.1. Algorithm 2.1 terminates and evaluates the answer correctly.

Example 2.3. For a nontrivial CHG, there can be a large number of sequences due to different evaluation orders. Consider Example 2.1 and the sequence S = (emp, Job = 'Sr Programmer', Sal + Bonus > 50000, dept, Loc = 'San Jose'). The evaluation proceeds as follows. (1) Retrieve the emp relation. (2) Apply the condition Job = 'Sr Programmer', which amounts to a selection on emp. (3) Condition hyperedge Sal + Bonus > 50000 is now evaluable and is evaluated next. This corresponds to a selection on the previous transient relation and results in a new transient relation, which is a subset of the emp relation. (4) Evaluate the hyperedge dept. This corresponds to a natural join with the transient relation. (5) Evaluate the condition Loc = 'San Jose', which corresponds to a selection. Finally, we project the transient relation onto (Ename, Mgr) to obtain the answer to the query.

There are many other possible evaluation orders. Consider another sequence S = (dept, Loc = 'San Jose', emp, Sal + Bonus > 50000, Job = 'Sr Programmer'). The operations are as follows. (1) Retrieve the dept relation. (2) Apply the selection Loc = 'San Jose'. (3) Evaluate hyperedge emp, which corresponds to a natural join. (4) Apply the condition hyperedge Sal + Bonus > 50000. (5) Apply the selection Job = 'Sr Programmer', followed by the final projection. This gives another evaluation order.

Semantics of a nested query are often stated as "evaluate inner blocks first". To give a more precise definition, we introduce the dependency graph (DG) (Ullman 1988). Every node in a DG represents a relation, either a base relation or a derived relation (for a hyperedge).
In each SQL block, we draw an arc from each node for a relation in the FROM clause to the node for the output relation. We also draw an arc from the node for the output relation of an inner block to the node for the output relation of the outer block. For example, the DG for Example 2.2 is given in Fig. 3. If recursion is not allowed, as is assumed in this paper, then the DG is a DAG (directed acyclic graph). A topological order (strata) may be assigned to nodes in a DAG. Derived relations are evaluated bottom-up in a topological order (Ullman 1988; Han 1994b), and all relations may be determined in this way. The stratum of an association hyperedge is equal to the highest stratum of the relation hyperedges it intersects. Whether an association hyperedge is evaluable is determined by the following. An association hyperedge is evaluable if all the SQL blocks below its stratum have been marked. The operation for an evaluable association hyperedge is determined by the meaning of its label.

Fig. 3. The DG for Example 2.2 (nodes: shipment, part, (SNo), (PNo))

There is one optimization technique on memory management. If the marked part of the CHG consists of several disjoint components, then the transient relation may be regarded as a cartesian product of subrelations. It is more efficient in space to store such a transient relation as subrelations rather than to construct and store it explicitly. Such a decomposition technique was used in Wong and Youssefi (1976).

The CHG was used to represent QUEL queries (Ullman 1989). QUEL is a language based on relational calculus; thus, it is more declarative than SQL and may be represented by a CHG. Since SQL is based on both relational algebra and relational calculus, it has some procedural aspects. However, an SQL query may still be represented by a CHG because of the commutative and associative properties of relational algebra reviewed in Ullman (1989). This should come as no surprise, since relational algebra and relational calculus have the same expressive power.
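The strata of the dependency graph discussed earlier in this section can be computed with a standard topological sort. The sketch below is illustrative only: it encodes the DG of Example 2.2 as a mapping from each derived relation to its predecessors, uses the labels "(SNo)" and "(PNo)" from Fig. 3 for the output relations of the outer and inner blocks, and assumes (as a convention of this sketch, not of the paper) that base relations sit at stratum 0. The function name `strata` is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def strata(predecessors):
    """Assign each node a stratum: 0 for base relations, else 1 + the highest predecessor stratum."""
    level = {}
    # static_order() yields every node after all of its predecessors
    # and raises CycleError if the graph is recursive (not a DAG).
    for node in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(node, ())
        level[node] = 1 + max((level[p] for p in preds), default=-1)
    return level

# DG for Example 2.2: arcs run from each predecessor to the derived relation.
dg = {
    "(PNo)": {"part"},               # inner block reads part
    "(SNo)": {"shipment", "(PNo)"},  # outer block reads shipment and the inner result
}

print(strata(dg))
```

Evaluating derived relations in increasing stratum order reproduces the bottom-up "inner blocks first" semantics: the inner result "(PNo)" is available before the outer result "(SNo)" is computed.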
Declarativeness of the CHG representation implies many evaluation orders. The CHG approach to query evaluation has several advantages:

1. The CHG gives a natural abstract representation of queries. A logical unit of data in a relational database is a relation, which is represented by a hyperedge in the CHG.
2. All operations in the CHG representation can be set-oriented.
3. Using a CHG, it is possible to separate semantics from query evaluation. For example, adding duplicate adornments to a CHG is simple and the semantics are not connected to an evaluation.
4. The CHG is convenient for ordering basic operations and for constructing a search space for QEPs. By enlarging the search space, we may incorporate many optimization strategies (more in later sections).
5. A CHG contains information on relation schemes and attributes sufficient to determine the relevance of attributes to the query (see the next section).
6. It is easy to generalize query optimization in a CHG to nested queries and views.
7. Graph algorithms have been well studied and efficient algorithms are well known.

3 The search space

In this section, we review results presented in Han (1995). We first describe in a CHG two well-known optimization heuristics, push-selections and project-out-(irrelevant)-attributes. These heuristics simplify the search space, while leading to the most efficient or close to the most efficient QEP.

The push-selections heuristic performs selections before joins or cartesian products. Selections remove tuples irrelevant to a query. This has cascading effects if performed early. The push-selections heuristic often reduces the total cost by several orders of magnitude. There can be extreme cases when this heuristic does not lead to the most efficient QEP. In order to be sure of finding the most efficient procedure, we may consider all evaluation orders, including those performing selections after joins or products. However, this significantly increases the size of the search space. For example, consider a query of three selections and two joins. If we use the push-selections heuristic, then only two operations need to be ordered. On the other hand, there may be as many as 5! evaluation orders (some may be invalid) without the heuristic. Since extreme cases are rare and the heuristic still performs well even if it does not find the best QEP, it is widely used.

The following two rules realize the push-selections heuristic in the CHG representation.

(R1) If possible, use a constraint (a condition of type A = a, A > a, ... or a restricted set of values for an attribute) to retrieve a relation.
(R2) Evaluate a condition or a condition hyperedge as soon as it is evaluable.

Note that (R1) combines two operations into one and, strictly speaking, might not observe the evaluable condition in the previous section.

During query evaluation, some attributes of a transient relation may be neither queried nor useful in future processing. These attributes may be projected out to reduce the size of a transient relation.

Definition 3.1. An attribute in a relation is irrelevant to a query if projecting it out from this relation will not change the answer to the query. Otherwise, the attribute is relevant. A relation is irrelevant to a query if its deletion will not change the answer to the query.
Otherwise, the relation is relevant.

Note that the relevance concept is used for three different objects in this paper: relations, attributes, and tuples in a relation, respectively. After irrelevant attributes are projected out, we obtain a subrelation of the original relation. After irrelevant tuples are removed or filtered out, for example by push-selections or sideways information passing, we obtain a subset of the original relation. Relevance of relations and attributes may be determined by the following theorems (Han 1995).

Theorem 3.1. A relation hyperedge is irrelevant to a query iff it is not connected to any distinguished node. All nodes in an irrelevant hyperedge are irrelevant.

Theorem 3.2. Assume that initially all hyperedges are relevant. A node in the marked part of a CHG (an attribute in the transient relation) is relevant to the query iff either it is a distinguished node or it is in a hyperedge or condition yet to be marked.

The following rule corresponds to the project-out-(irrelevant)-attributes-early heuristic:

(R3) After a hyperedge has been marked, project out irrelevant attributes from the transient relation.

As an example, in Fig. 1, after the emp hyperedge and condition Job = 'Sr Prog' have been evaluated, attributes Eno, Job, EKidsN become irrelevant and may be projected out. If the condition hyperedge Sal + Bonus > 50000 has also been evaluated, then attributes Sal and Bonus are irrelevant as well.

Although in principle the projection heuristic can offer great savings, in practice the savings may not be as significant as they look. Even though transient relations are usually large, if pipelining is used, e.g., in nested-loop methods, then they need not be materialized. Projecting out irrelevant attributes in main memory is useful, but its impact on the total cost is not great. This is probably why this heuristic is not used as widely as it could be. If disk is used for temporary relations, then this heuristic should be applied.
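Theorem 3.2 and rule (R3) translate directly into a set computation. The following sketch is a hedged illustration, not code from the paper: the function name `relevant_attributes` and the dictionary encoding of unmarked events are hypothetical, and the CHG of Fig. 1 is encoded by hand, with the merged node for emp.Dno = dept.Dno represented by the single name "Dno".

```python
def relevant_attributes(marked_nodes, unmarked_events, distinguished):
    """Theorem 3.2: a marked node is relevant iff it is distinguished
    or it lies in a hyperedge or condition that is yet to be marked."""
    pending = set().union(*unmarked_events.values())
    return {a for a in marked_nodes if a in distinguished or a in pending}

# Fig. 1 after marking the emp hyperedge and the condition Job = 'Sr Prog'.
emp_nodes = {"Eno", "Ename", "Sal", "Bonus", "Job", "Dno", "EKidsN"}
distinguished = {"Ename", "Mgr"}
unmarked = {
    "Sal + Bonus > 50000": {"Sal", "Bonus"},   # condition hyperedge
    "dept": {"Dno", "Mgr", "Loc"},             # relation hyperedge
    "Loc = 'San Jose'": {"Loc"},               # condition on dept
}

keep = relevant_attributes(emp_nodes, unmarked, distinguished)
print(sorted(keep))              # attributes to retain
print(sorted(emp_nodes - keep))  # attributes (R3) may project out

# Once the condition hyperedge has also been marked, Sal and Bonus drop out too.
del unmarked["Sal + Bonus > 50000"]
keep_later = relevant_attributes(emp_nodes, unmarked, distinguished)
print(sorted(keep_later))
```

On this encoding the computed irrelevant set after the first step is Eno, Job, EKidsN, and after the second step Sal and Bonus join it, matching the example in the text.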
Pipelining is important in reducing costs related to transient relations. Consider two adjacent operations, E_i and E_{i+1}. The cost of E_i consists of C_i^{in}, C_i^{aux}, C_i^{cpu}, C_i^{out}, for the input, auxiliary relations, CPU cost, and output, respectively. An auxiliary relation is a temporary relation created to facilitate an operation. For example, sorted relations used for a sort-join are auxiliary relations. The cost of E_{i+1} consists of similar terms. If we do not write TRAN(E_1, ..., E_i) to disk but use it directly to evaluate E_{i+1}, known as pipelining E_i to E_{i+1}, we save C_i^{out} and C_{i+1}^{in}. Since transient relations are usually large, pipelining often gives great savings. However, pipelining is not always possible or efficient, mainly because pipelined operations compete for resources, especially memory space.

One type of pipelining, pipelining a join/cartesian product to a sequence of selections and/or projections, is always feasible and does not cause adverse effects in efficiency (Han 1995). This forms operation clusters, each cluster having one join or cartesian product followed by zero or more selections/projections. Thus, the problem of evaluation orders is reduced to ordering all relation hyperedges in a CHG.

Additional pipelining is also possible. In particular, more than one join may be pipelined into a join cluster. However, there are restrictions for such pipelining. If TRAN(E_1, ..., E_i) is large with respect to the buffer size, then E_{i+1} cannot use join methods that require auxiliary relations. Some join methods, mainly sort-join, require auxiliary relations. To take this into consideration, we group joins into various join clusters.

We may now construct the search space. The search space consists of four largely orthogonal dimensions: (1) evaluation orders of operations, (2) evaluation methods for each operation, (3) access paths and operations on one base relation, and (4) various groupings of join clusters.
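One way to read dimension (4) is as the set of ways to cut a fixed evaluation order of relation hyperedges into contiguous join clusters. That reading is an assumption of this sketch, as are the function name and the sample relation names. Under it, an order of e hyperedges has e - 1 adjacent pairs, each of which is either pipelined into the same cluster or separated, so there are 2^(e-1) groupings.

```python
from itertools import product

def cluster_groupings(order):
    """Enumerate every split of a non-empty evaluation order into contiguous join clusters."""
    groupings = []
    # One boolean per adjacent pair: True means "start a new cluster here".
    for cuts in product([False, True], repeat=len(order) - 1):
        clusters = [[order[0]]]
        for rel, cut in zip(order[1:], cuts):
            if cut:
                clusters.append([rel])    # materialize, then begin a new cluster
            else:
                clusters[-1].append(rel)  # pipeline into the current cluster
        groupings.append(clusters)
    return groupings

# Hypothetical order of three relation hyperedges.
for grouping in cluster_groupings(["emp", "dept", "proj"]):
    print(grouping)
```

For three hyperedges this lists four groupings, from a single fully pipelined cluster to three separate clusters; an optimizer would estimate the cost of each and discard those that violate buffer-size restrictions.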
As an example for (2), there are many potentially efficient methods to evaluate a join. For (3), operations on one base relation and access paths are considered together because both evaluate one relation hyperedge. Orthogonality of these dimensions means that various choices in a dimension are available after other dimensions are fixed. Let d_i be the number of possible choices for dimension i. d_1 = e!, where e is the number of relation hyperedges. d_2 is determined by the system designer. d_3 depends on the query. d_4 = 2^{e-1}. The total number of choices, D, is

    D = d_1 d_2 d_3 2^{e-1}.    (3.1)

If all the dimensions are orthogonal to each other, then D = e! d_2 d_3 2^{e-1}. An exhaustive search may be used to find the best QEP among D states. If the number of states is so large that an exhaustive search is impractical, then some optimization strategies proposed in Swami and Gupta (1988) and Swami (1989) may be considered. The experiments in Swami and Gupta (1988) and Swami (1989) used a cost model for main memory databases. More studies may be required for other cost models and on pipelining.

4 Nested queries and queries using views

In many current database systems, query optimizers were designed for unnested SQL queries. When such a system processes a nested query, it first partially evaluates the outer block, resulting in some value (either a constant or a tuple). This is passed to the inner blocks. The system then evaluates the inner blocks using this value. Results from the inner blocks are passed back to the outer block and are used to finish processing the outer block. This simple method has two performance problems. (1) The value-passing between blocks is tuple-oriented, not set-oriented. The inner blocks are evaluated once for each value passed from the outer block. (2) The search for an efficient evaluation order is restricted to one SQL block. Similar problems also exist for queries using views.

To solve the above problems, Kim (1982) proposed methods to transform nested queries into unnested ones. Magic sets methods also carry out query transformations (Mumick et al. 1990a,b,c). In this section, we show that the above inefficiency problem may be easily solved by a processing strategy in the CHG representation. In addition, we incorporate the processing strategy into the search space.

4.1 Nested queries

First, in a CHG, all operations may be implemented as set-oriented, not tuple-oriented.
For example, in Example 2.2, when evaluation proceeds from the outer block to the inner block, a set of values of PNo instead of a single value may be passed to the inner block. Therefore, the first performance problem of nested queries does not exist.

Reordering operations for a nested query is also possible in the CHG representation. For operations within one SQL block, reordering can be carried out as before. Let us consider the meaning of reordering operations of different SQL blocks. First, according to the semantics, an association hyperedge is not evaluable until all SQL blocks below its stratum have been evaluated. This condition will be relaxed and an association hyperedge may be evaluated partially. Second, it is usually inefficient to evaluate hyperedges in other SQL blocks, since the corresponding operation is a cartesian product. However, partial evaluations of association hyperedges change this. An example is given below.

Example 4.1. Consider Example 2.2. The condition Qty = 20 is a selection on the relation shipment and may be evaluated early. This results in a transient relation (SNo, PNo) after projecting out irrelevant attributes. At this point, the association hyperedge is not evaluable, since the semantics require that the inner block be evaluated before the association hyperedge. However, we may project TRAN(SNo, PNo) onto PNo, which results in a set {PNo}. According to the meaning of IS IN, tuples in the relation part whose PNo value is not in the set {PNo} are irrelevant to the query. Thus, this set may be used to restrict the relation part when the inner block is evaluated. Evaluation of the inner block results in a set {PNo}. This set is then passed back to the outer block and joined with the previous transient relation (SNo, PNo). Finally, we project the result onto attribute SNo to obtain the answer.

Definition 4.1. A subtransient relation is a subrelation of a transient relation, i.e., a projection of a transient relation.

Definition 4.2.
A partial evaluation of an association hyperedge is as follows. (1) Project the transient relation onto those attributes in the association hyperedge, resulting in a subtransient relation. (2) Find the set of attributes intersecting other parts of the CHG. (3) Use the subtransient relation and the meaning of the association hyperedge (IN, UNION, EXISTS, etc.) to obtain possible values for these attributes, i.e., new relations known as association relations, one association relation for each SQL block.

A partial evaluation results in one or more association relations, which are then used to evaluate the inner block(s). (Query optimization using sideways information passing is another type of partial evaluation, which is addressed in Sect. 5.) Partial evaluations allow us to arrange association hyperedges or relation hyperedges in any order. If an association hyperedge is not evaluable, we may evaluate it partially. A partially evaluated association hyperedge needs to be evaluated fully again when all SQL blocks below its stratum have been evaluated. This may be represented in the event sequence S by adding one additional association hyperedge. Normally, such an addition should be placed immediately after the point where the association hyperedge becomes evaluable, since such an operation usually reduces the size of the transient relation. More formally, let the event sequence be S = (E_1, ..., A_i, E_j, ..., E_k, ...), where A_i is an association hyperedge that is not evaluable, and E_k is the last hyperedge whose evaluation makes A_i evaluable. The sequence will be rewritten as S' = (E_1, ..., A'_i, E_j, ..., E_k, A_i, ...), where A'_i denotes a partial evaluation. The last A_i is evaluated fully.

Attributes in an association hyperedge are relevant to the query until it is fully evaluated. A transient relation after a partial evaluation usually may be decomposed into the original transient relation and the relevant part of the association relations.
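The partial evaluation in Example 4.1 can be traced on a small data set. The sketch below is an illustration under stated assumptions: the tuples are invented, only the attributes used by the query are stored, and both function names are hypothetical. It compares the partial-evaluation order with the plain "inner block first" order and shows that the two agree while the former passes a set of PNo bindings into the inner block.

```python
def with_partial_evaluation(shipment, part):
    # Outer block: selection Qty = 20, then keep only the relevant attributes.
    tran = {(s["SNo"], s["PNo"]) for s in shipment if s["Qty"] == 20}
    # Partial evaluation of IS IN: project onto PNo and pass the whole set inward.
    bindings = {pno for _, pno in tran}
    # Inner block, restricted to the parts that can still matter.
    inner = {p["PNo"] for p in part if p["PNo"] in bindings and p["Price"] > 25}
    # Full evaluation of IS IN, then the final projection onto SNo.
    return sorted({sno for sno, pno in tran if pno in inner})

def inner_block_first(shipment, part):
    # Reference order: evaluate the whole inner block before the association hyperedge.
    inner = {p["PNo"] for p in part if p["Price"] > 25}
    return sorted({s["SNo"] for s in shipment if s["Qty"] == 20 and s["PNo"] in inner})

# Hypothetical sample data for Example 2.2.
shipment = [
    {"SNo": "S1", "PNo": "P1", "Qty": 20},
    {"SNo": "S2", "PNo": "P2", "Qty": 20},
    {"SNo": "S3", "PNo": "P3", "Qty": 10},
    {"SNo": "S4", "PNo": "P4", "Qty": 20},
]
part = [
    {"PNo": "P1", "Price": 30},
    {"PNo": "P2", "Price": 20},
    {"PNo": "P3", "Price": 40},
    {"PNo": "P4", "Price": 50},
    {"PNo": "P5", "Price": 60},
]

print(with_partial_evaluation(shipment, part))
print(inner_block_first(shipment, part))
```

In a real system the set of bindings would restrict an index lookup or a semijoin on part rather than a Python comprehension; the point here is only that the binding set is passed once, as a set, instead of once per outer tuple.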
As before, it is unnecessary to construct a decomposable relation explicitly. We may store the original transient relation on disk and use the relevant part of the association relations to evaluate the inner blocks. For example, in Example 4.1, we may keep the transient relation (SNo, PNo) on disk and use only the set {PNo} after the partial evaluation to process the inner block. The transient relation for the outer block is used only after the inner block has been evaluated and the results are passed back to the outer block.

Example 4.2. Consider query (C) in Mumick et al. (1990b), which was used as an example for magic sets methods.

    SELECT Ename
    FROM emp e1
    WHERE Job = 'Sr Programmer'
      AND Sal > (SELECT AVG(e2.Sal)
                 FROM emp e2
                 WHERE e2.dno = e1.dno)

[Fig. 4. The CHG for Example 4.2]

The CHG for this query is shown in Fig. 4. Note that the two SQL blocks overlap. The main causes of inefficiency given in Mumick et al. (1990b) are: (1) tuple-oriented evaluation (repeated computation for the same department); (2) a fixed evaluation order. The first problem does not exist for the CHG approach. The second problem has been addressed above.

This query has several interesting properties. First, the inner block and the outer block are not disjoint. The condition e2.dno = e1.dno in the inner WHERE clause merges the two blocks. Second, this query has an aggregate operator, AVG. A partial evaluation is often not possible when aggregate operators are involved. Aggregate operators and grouping operations (GROUP BY) should be carried out with care, because a set can be empty (Han 1994b).

It is possible that more than one partial evaluation may be applied to one association hyperedge (although this is probably of little practical significance). Suppose a transient relation intersects an association hyperedge and a partial evaluation passes the relevant values to the inner block. Later, more hyperedges or conditions in the outer block might be evaluated, which results in a new transient relation. A projection of the new transient relation onto the intersecting attribute gives a set of values that usually is a subset of the values passed earlier. The new values may be passed through the association hyperedge again to replace the old ones.
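The first inefficiency cited for query (C) in Example 4.2, repeated computation for the same department, can be made concrete with a small Python sketch. It contrasts a tuple-oriented evaluation with a set-oriented one that evaluates the inner block once for all departments. The emp tuples and both function names are invented for illustration, and the sketch says nothing about how an actual CHG-based evaluator is implemented.

```python
from collections import defaultdict

# Hypothetical emp tuples (Ename, Job, Sal, Dno); the data is invented.
EMP = [
    ("Ann", "Sr Programmer", 120, 1),
    ("Bob", "Programmer", 80, 1),
    ("Cy", "Sr Programmer", 90, 2),
    ("Di", "Manager", 150, 2),
    ("Ed", "Sr Programmer", 100, 3),
]


def tuple_oriented(emp):
    """Re-evaluate the inner block once per qualifying outer tuple."""
    answer = []
    for ename, job, sal, dno in emp:
        if job != "Sr Programmer":
            continue
        sals = [s for (_, _, s, d) in emp if d == dno]  # inner block, repeated
        if sal > sum(sals) / len(sals):
            answer.append(ename)
    return answer


def set_oriented(emp):
    """Evaluate the inner block once for the whole set of Dno values."""
    totals = defaultdict(lambda: [0, 0])  # Dno -> [sum of Sal, count]
    for _, _, sal, dno in emp:
        totals[dno][0] += sal
        totals[dno][1] += 1
    # Transient relation (Dno, Avg(Sal)), built in a single pass over emp.
    avg_sal = {dno: total / count for dno, (total, count) in totals.items()}
    return [ename for ename, job, sal, dno in emp
            if job == "Sr Programmer" and sal > avg_sal[dno]]
```

Both functions return the same answer on the sample data; the set-oriented version scans emp a fixed number of times regardless of how many senior programmers share a department.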
In addition, a partial evaluation may be made either from an inner block to the outer block or from an outer block to the inner blocks.

4.2 A method to enumerate QEPs

Figure 5 is an algorithm which enumerates all QEPs according to our analysis so far. It has four loops, one for each search dimension. Lines (4) and (5) determine whether or not an association hyperedge A_i is evaluable for the given evaluation order in L_1. If not, we insert A_i in L_1 after the last hyperedge below its stratum, as discussed earlier. The newly added occurrence can be evaluated fully, while the earlier occurrence has to be evaluated partially. This algorithm may be combined with an appropriate cost model to find the most efficient QEP. Its complexity is bounded by the complexity of the search space. In a practical implementation, the designer might use heuristics to reduce the search space by pruning unlikely choices.

Algorithm 4.1. (search space for queries with or without nesting)
(1) order relation and association hyperedges into lists {L_1};
(2) for each list L_1 do
(3)     for each hyperedge do
(4)         if it is an unevaluable association hyperedge A_i
(5)         then add A_i to L_1 after the last hyperedge below its stratum;
(6)     pipeline conditions and condition hyperedges;
(7) the result is lists {L_1};
(8) for each list L_1 do
(9)     for each relation hyperedge do
(10)        enumerate evaluation methods for the corresponding operation;
(11) the result is lists {L_2};
(12) for each list L_2 do
(13)     for each hyperedge do
(14)         enumerate access paths;
(15) the result is lists {L_3};
(16) for each list L_3 do
(17)     for each hyperedge do
(18)         enumerate join clusters;
(19) the result is lists {L_4}; /* {L_4} is the search space for QEPs */
Fig. 5. An algorithm to enumerate QEPs

4.3 Queries using views

For an SQL query using views, we first draw one CHG for each view and for the query.
If a relation is given by one and only one view, then we merge the view CHG with the query CHG, i.e., merge the hyperedge of the view with the corresponding hyperedge in the query CHG. If a relation is defined by several views, then we need to specify the semantics. We adopt the semantics used in Starburst (Mumick et al. 1990b), i.e., a derived relation is the union of all the relation definitions. This coincides with Prolog. In a CHG, we use an association hyperedge labeled UNION to connect these CHGs. After a CHG has been constructed, we may apply the usual query optimization strategy (Fig. 5).

Example 4.3. The following example is from Pirahesh et al. (1992). The view keeps the item number and vendor names for an item that vendors have supplied since the year 85.

    CREATE VIEW itpv AS
      (SELECT DISTINCT itp.itemn, pur.vendn
       FROM itp, pur
       WHERE itp.ponum = pur.ponum AND pur.odate > 85)

    SELECT itm.itemn, itpv.vendn
    FROM itm, itpv
    WHERE itm.itemn = itpv.itemn AND itm.itemn >= 01 AND itm.itemn < 20

[Fig. 6. a The CHG for the original query in Example 4.3. b The CHG merged from a]

The merged hypergraph is shown in Fig. 6. According to the experimental results in Pirahesh et al. (1992), current database systems do not evaluate such queries efficiently. The main causes of inefficiency are the same as for nested queries: (1) tuple-oriented value-passing between the query and views; (2) restricted reordering of operations among the query and views. Magic sets methods (Mumick et al. 1990b; Pirahesh et al. 1992) have been proposed to rewrite queries using views. In our approach, we first obtain one CHG by merging the CHGs of the query and the views, then use the same strategy as for nested queries. The CHG approach removes the causes of inefficiency. As with nested queries, this method integrates cost estimates and evaluation methods and can find the most efficient QEP after an exhaustive search.

5 Binding propagations and magic sets methods

In Sect. 5.1, we simulate magic sets methods in a CHG. The maximum SIPS strategy is proposed in Sect. 5.2. In Sect. 5.3, we enumerate QEPs in conjunction with the maximum SIPS strategy.

5.1 Simulate magic sets methods

Magic sets methods were proposed to optimize recursive datalog programs (Bancilhon et al. 1986; Beeri and Ramakrishnan 1991; Ullman 1989). Magic sets methods use binding information in a query to rewrite the program and query so that irrelevant tuples are not generated during query evaluation. Note that the relevance here, which refers to a subset of a relation, is different from the relevance for projection in Sect. 3, which refers to a subset of attributes in a relation.
One key idea of magic sets methods is SIPS, which means passing binding information in a query or relation to other relations without computing the relation fully. Mumick et al. (1990a,b,c) observed that Kim's transformations, semijoin methods, and magic sets methods share a common heuristic: filtering out irrelevant tuples in query evaluation. They applied magic sets methods to relational queries and showed that magic sets methods are often more powerful and more efficient than other program transformations.

SIPS and magic sets methods may be simulated elegantly in the CHG representation. Consider Example 2.1 again. Suppose we first evaluate emp. As discussed in Sect. 3, we may use the binding Job = 'Sr Programmer' to restrict tuples in emp so that only tuples of senior programmers are constructed for the transient relation. However, even among these tuples, many perhaps belong to employees who work at a department not located in San Jose. These tuples are also irrelevant to the final answer, and it is more efficient if they are removed. To do so, we may first find those departments that are located in San Jose. This can be achieved by a look-up of the dept relation that finds the corresponding values of Dno as a set {Dno}. Now there are two conditions on the emp relation, Job = 'Sr Programmer' and {Dno}. One condition may be used to retrieve relevant tuples in emp and the other to restrict the tuples. Which choice is more efficient depends on the indices and on the values. This may be incorporated into the search dimension on access paths to emp.

Definition 5.1. Suppose R1 has some bindings and intersects with other hyperedges. R1 may be evaluated partially as follows. (1) Apply the bindings to relation R1; this results in a temporary relation T. (2) Perform semijoins of T onto the intersecting hyperedges, which results in subtransient relations, known as magic relations, or magic sets if their arity is one. (We borrow the term magic from magic sets methods.)
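The two steps of Definition 5.1 can be sketched in Python for the emp and dept example. This is a minimal illustration, not the paper's implementation: the helper names select, project, and semijoin, the attribute name Loc for the department location, and all sample tuples are assumptions made for this sketch.

```python
def select(rows, **bindings):
    """Keep the rows that satisfy every binding of the form attribute = value."""
    return [r for r in rows if all(r[a] == v for a, v in bindings.items())]


def project(rows, attrs):
    """Project rows onto attrs, returning a set of tuples (duplicates removed)."""
    return {tuple(r[a] for a in attrs) for r in rows}


def semijoin(rows, magic, attrs):
    """Keep the rows whose values on attrs appear in the magic relation."""
    return [r for r in rows if tuple(r[a] for a in attrs) in magic]


# Hypothetical instances; the attribute Loc and all values are invented.
DEPT = [
    {"Dno": 1, "Mgr": "Kim", "Loc": "San Jose"},
    {"Dno": 2, "Mgr": "Lee", "Loc": "Austin"},
    {"Dno": 3, "Mgr": "Roy", "Loc": "San Jose"},
]
EMP = [
    {"Ename": "Ann", "Job": "Sr Programmer", "Dno": 1},
    {"Ename": "Bob", "Job": "Programmer", "Dno": 1},
    {"Ename": "Cy", "Job": "Sr Programmer", "Dno": 2},
    {"Ename": "Ed", "Job": "Sr Programmer", "Dno": 3},
]

# Step (1): apply the binding to dept, giving the temporary relation T.
T = select(DEPT, Loc="San Jose")
# Step (2): keep only the attribute shared with emp, giving the magic set {Dno}.
magic_dno = project(T, ["Dno"])
# The magic set then restricts emp together with the binding on Job.
relevant_emp = semijoin(select(EMP, Job="Sr Programmer"), magic_dno, ["Dno"])
```

Here the magic set carries only Dno, so Mgr is not passed sideways. In this sketch the binding on Job is applied first and the magic set second; the text leaves that choice to the search over access paths.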
In the example above (Example 2.1 with emp and dept), the first step evaluates the dept relation partially, which results in a set (the magic set) {Dno}. If the first step evaluated dept fully, then both Dno and Mgr would be relevant to the query and would be kept in the transient relation (Dno, Mgr), which is a procedure already considered in the search space in Sect. 3. The magic set {Dno} has the minimum size to pass sideways to the emp relation. Hyperedge dept is evaluated twice, and only partially the first time. Partial evaluation separates magic sets methods from the other heuristics in Sect. 3.

Partial evaluation of a relation hyperedge is similar to the partial evaluation of an association hyperedge described earlier; both evaluate a hyperedge partially and project the result onto a relation scheme for further evaluation. The difference is that, for an association hyperedge, partial evaluation may have to be used, because the hyperedge may be unevaluable, while here partial evaluation is used for the purpose of optimization.

The above query processing strategy is quite complex. Let us understand why it may improve efficiency. If all bindings in a query are in the same base relation, then the push-selections heuristic together with a search on the access paths (discussed in Sect. 3) may be sufficient. If bindings appear in different relations, then it might be beneficial to perform a semijoin to pass the binding information in a relation, say, R1, to another relation, R2. The result of the semijoin may then be used to restrict R2, which reduces the size of R2. (Note that this in fact introduces an additional join. However, the cost of such a join is on the order of a selection, since one relation scheme encloses the other scheme completely.) The reduced R2 is then joined with R1. The cost of the final join is usually less than that of the original join.

We may also view the above strategy in a different way. Computation cost in query evaluation may be classified into three levels. Level 0 is at the scheme level, which costs almost nothing when compared with other operations. Level 1 includes selections and projections, which involve one relation and are not expensive. Level 2 includes joins and products and is the most expensive. In general, it pays to perform operations at a lower level as much as possible to reduce costs at a higher level. Magic sets methods require more low-level operations, but save the costs of some joins.

5.2 Maximum SIPS

For a query that is not too simple, there may be a few bindings and many ways to perform SIPS. We introduce a new graph named cograph to illustrate SIPS paths.

Definition 5.2. A cograph is an ordinary undirected graph for a CHG. Each relation or association hyperedge corresponds to one node in the cograph. An edge is drawn in the cograph between two nodes if their corresponding hyperedges intersect. We indicate a binding or bindings for a relation by adding a prime to the relation.

Definition 5.3. By a path in a CHG we refer to a sequence of relation hyperedges, association hyperedges, and sets of nodes (intersections of hyperedges) which maps one-to-one to a path in its cograph.

Definition 5.4.
A SIPS path is a path in a cograph that starts from a relation (node) with some bindings, known as the initial relation, and ends at a relation (node), known as the destination relation. For example, in Example 2.1, there are two SIPS paths: one from emp to dept and another from dept to emp.

Definition 5.5. A SIPS operation is a sequence of semijoins along a SIPS path (partial evaluations) that results in a subrelation (known as the magic relation, or a magic set if its arity is one) of the destination relation.

Definition 5.6. The maximum SIPS strategy maximizes the effects of SIPS on a relation by performing SIPS operations for all bindings in the query and over all SIPS paths for this relation.

Let us focus on one destination relation R. Bindings of other relations may be passed sideways to R. There can be many SIPS paths. SIPS from different bindings and paths usually have different effects on R. If a cograph contains cycles, there can be an infinite number of SIPS paths. To solve this problem, we note that a cyclic SIPS path is not useful as far as SIPS are concerned. A magic relation generated by an acyclic SIPS path contains the magic relations generated by those cyclic SIPS paths that correspond to the acyclic one. To preserve the completeness of the answer, only acyclic SIPS paths need to be used for SIPS operations.

Example 5.1. To illustrate various SIPS paths, consider a query on abstract relations: R1(A, B, C, D), R2(E, F, G, H), R3(J, K, L).

    SELECT C
    FROM R1, R2, R3
    WHERE R1.A = a AND R1.C = R2.E AND R1.D = R2.F
      AND R1.A = R3.J AND R2.H = R3.L AND R3.K = k

Its CHG is drawn in Fig. 7 and its cograph in Fig. 8. There are two bindings in the query: R1.A = a in R1 and R3, and R3.K = k in R3. The potential SIPS paths for R2 are R1 → R2, R3 → R2, and R3 → R1 → R2 (R1 → R3 → R2 is ignored, because the binding on R1 is contained by the bindings on R3). We need not consider cyclic SIPS paths, e.g., R1 → R2 → R3 → R1 → R2.
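The enumeration of acyclic SIPS paths for Example 5.1 can be sketched with a depth-first search over the cograph. This is an illustrative sketch under stated assumptions: the adjacency-dictionary representation, the set BOUND of relations carrying bindings, and the function name acyclic_sips_paths are all hypothetical. The sketch also does not prune paths whose bindings are contained by other bindings, so it lists R1 → R3 → R2, which the text ignores.

```python
# Cograph of Example 5.1: R1-R2 share (C, D), R1-R3 share A, R2-R3 share H.
COGRAPH = {
    "R1": {"R2", "R3"},
    "R2": {"R1", "R3"},
    "R3": {"R1", "R2"},
}
BOUND = {"R1", "R3"}  # relations with bindings (A = a and K = k)


def acyclic_sips_paths(cograph, bound, destination):
    """List every cycle-free path from a bound relation to the destination."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == destination:
            paths.append(tuple(path))
            return
        for neighbor in sorted(cograph[node]):
            if neighbor not in path:  # never revisit a node: path stays acyclic
                extend(path + [neighbor])

    for start in sorted(bound):
        if start != destination:
            extend([start])
    return paths


paths_to_r2 = acyclic_sips_paths(COGRAPH, BOUND, "R2")
paths_to_r1 = acyclic_sips_paths(COGRAPH, BOUND, "R1")
```

Because a node is never revisited, the search terminates even though the cograph is a triangle. A fuller version would add the containment test on bindings before performing the semijoins along each path.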
Consider the SIPS path R1 → R2. Since R1 and R2 intersect on two attributes, C and D, the SIPS operation gives a magic relation (C, D) for R2. The SIPS path R3 → R2 determines a magic set (H) for R2. Note that both (H) and (C, D) restrict R2. In fact, we may take the benefit of all bindings if the SIPS operations keep the magic relation (C, D, H) for R2.

The above example has an interesting feature. Different SIPS paths for the same destination relation may give magic relations of different relational schemes, all being subrelations of the destination relation. A natural join of these magic relations results in a large magic relation. For the above example, this is in fact a cartesian product. However, we cannot claim that this simple approach finds the most restrictive magic relation, since information could be lost when semijoins are performed on SIPS paths. Further research is required on finding the most restrictive magic relation efficiently. The term maximum SIPS strategy does not mean that the obtained magic relation is the most restrictive. It simply means that SIPS are carried out for all bindings and all SIPS paths.

Example 5.2. Let us see how the maximum SIPS strategy will process the query in Example 5.1. Consider the event sequence S = {R1, R2, R3}. For R1, there is one SIPS path R3 → R2 → R1 (another SIPS path R3 → R1 is ignored since the constraint is contained by the binding on R1). The magic relation is (C, D). For R2, the SIPS paths have been given in Example 5.1. For R3, there is one SIPS path R1 → R2 → R3. Other possible event sequences are {R1, R3, R2}, {R2, R1, R3}, {R2, R3, R1}, {R3, R1, R2}, {R3, R2, R1}.

It is not difficult to implement the maximum SIPS strategy. Let the event sequence be S. Every relation R in S can be a destination relation. For a node corresponding to a destination relation, we first find all acyclic SIPS paths in the cograph. Various graph algorithms, including depth-first search, may be used for this purpose. SIPS paths in the CHG are easy to obtain, since there is a one-to-one mapping between paths in the CHG and paths in the cograph. Clearly, given a CHG with bindings, we may carry out the maximum SIPS strategy.

[Fig. 7. The CHG for Example 5.1]
[Fig. 8. The cograph for Fig. 7]

Example 5.3. As a practical example on SIPS, consider the query in Example 4.2. Let us first pass the binding on emp1 to emp2. A partial evaluation of emp1 with binding Job = 'Sr Programmer' gives the magic set {Dno}. {Dno} is used to evaluate emp2, which results in a transient relation (Avg(Sal), Dno) after projecting out irrelevant attributes. The transient relation is then joined with emp1, followed by evaluation of the association hyperedge. Finally, a projection is used to obtain the answer. This gives the same procedure as program (M) in Mumick et al. (1990b). This example has only one binding and one SIPS path.

Algorithm 5.1. (search space with maximum SIPS)
(1) order relation and association hyperedges into lists {L_1};
(2) for each list L_1 do
(3)     for each hyperedge do
(4)         if it is an unevaluable association hyperedge A_i
(5)         then add A_i to L_1 after the last hyperedge below its stratum;
(6) the result is lists {L_1'};
(7) for each list L_1' do
(8)     for each relation hyperedge H in L_1' do begin
(9)         find all acyclic SIPS paths; /* for maximum SIPS */
(10)        perform semijoins for these SIPS paths; /* SIPS */
(11)    end
(12)    pipeline conditions and condition hyperedges;
(13) the result is lists {L_1}; /* go to line 8 of Fig. 5 */
Fig. 9. An algorithm to perform maximum SIPS

5.3 A method to enumerate QEPs

Not only are SIPS simple in the CHG representation, they can also be integrated easily with other query optimization strategies.
In particular, they may be combined with the processing strategy for nested queries and for queries using views, and with the search space. To enumerate QEPs, only minor changes are needed for Fig. 5. After the order of relation and association hyperedges has been decided, we may perform the maximum SIPS strategy for each relation. Such an algorithm is given in Fig. 9. It should be used in conjunction with Fig. 5. In Fig. 9, lines 1-5 are the same as in Fig. 5. Line 9 finds all acyclic SIPS paths (as discussed before, depth-first searches may be used for this purpose). Line 10 performs SIPS for all SIPS paths. The output of Fig. 9 should be connected to line 8 of Fig. 5.

We might consider SIPS of various degrees, ranging from none to the maximum SIPS strategy, for all relation hyperedges and add another search dimension to the search space. However, this is unlikely to be useful in practice. Our view is that, for simplicity, an implementation will likely use either the maximum SIPS strategy or none at all.

A word of caution on SIPS. SIPS might not offer great savings if pipelining is used for magic relations. The reason is similar to that for projections (Sect. 3 and Han 1995). SIPS are based on two ideas: push-selections and semijoins. The benefit of semijoins is greatly reduced if pipelining is used. Without semijoins, the other idea of SIPS is fully accounted for by the methods discussed in Sect. 3. A further study on cost estimation with memory management is required to determine when semijoins and SIPS are beneficial.

In the above discussion, bindings of type A = a are implied. It is also possible to apply SIPS using a restriction of type A > a (or A < a, and other comparisons), similar to Mumick et al. (1990a). Such SIPS often gain less in efficiency. This is because a condition like B > 500 often restricts a relation by a small factor, while a selection A = a may restrict a relation by a larger factor.
For example, if only 5% of all employees are senior programmers, then the selection Job = 'Sr Programmer' restricts the relation emp by a factor of 20.

In summary, the method proposed here is a simple, elegant alternative to magic sets methods for relational queries. This new method has the following advantages over the magic sets methods proposed earlier. (1) The new method is relatively intuitive and easier to implement. (2) Unlike magic sets methods, which may introduce many rules, the new method is simpler, and more efficient implementations are possible. (3) It uses the maximum SIPS strategy as discussed above. The earlier magic sets methods, in fact, choose one SIPS path if there is more than one. The new method gives smaller magic sets and is thus more efficient. (4) It incorporates SIPS with the search space, thus solving the interplay problem of cost estimates and program transformations mentioned in Mumick et al. (1990b). (5) The undesirable effect of generating recursions from nonrecursive programs (Mumick et al. 1990b) does not occur here.


More information

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W.

Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, MAINTENANCE OF RECURSIVE VIEWS. Suzanne W. Encyclopedia of Database Systems, Editors-in-chief: Özsu, M. Tamer; Liu, Ling, Springer, 2009. MAINTENANCE OF RECURSIVE VIEWS Suzanne W. Dietrich Arizona State University http://www.public.asu.edu/~dietrich

More information

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is

More information

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart Physical Database Design and Tuning R&G - Chapter 20 Although the whole of this life were said to be nothing but a dream and the physical world nothing but a phantasm, I should call this dream or phantasm

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization References Access path selection in a relational database management system. Selinger. et.

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Friday Nights with Databases!

Friday Nights with Databases! Introduction to Data Management Lecture #22 (Physical DB Design) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 It s time again for... Friday

More information

Physical Database Design and Tuning. Chapter 20

Physical Database Design and Tuning. Chapter 20 Physical Database Design and Tuning Chapter 20 Introduction We will be talking at length about database design Conceptual Schema: info to capture, tables, columns, views, etc. Physical Schema: indexes,

More information

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques 376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list

More information
