BlossomTree: Evaluating XPaths in FLWOR Expressions

Size: px
Start display at page:

Download "BlossomTree: Evaluating XPaths in FLWOR Expressions"

Transcription

1 BlossomTree: Evaluating XPaths in FLWOR Expressions Ning Zhang University of Waterloo School of Computer Science Shishir K. Agrawal Indian Institute of Technology, Bombay Department of Computer Science M. Tamer Özsu University of Waterloo School of Computer Science Technical Report CS November 2004 Research supported in part by grants from Communications and Information Technology Ontario and from the Natural Sciences and Engineering Research Council (NSERC) of Canada. This research was done while the author was visiting the University of Waterloo.

2 Abstract Efficient evaluation of path expressions has been studied extensively. However, evaluating more complex FLWOR expressions that contain multiple path expressions has not been well studied. In this paper, we propose a novel pattern matching approach, called BlossomTree, to evaluate a FLWOR expression that contains correlated path expressions. BlossomTree is a formalism to capture the semantics of the path expressions and their correlations. These correlations include variable referencing, structural relationship, value-based relationship, or a mixture of structural and value-based relationship. We propose a general algebraic framework (abstract data types and logical operators) to evaluate BlossomTree pattern matching that facilitates efficient evaluation and experimentation. We design efficient data structures and algorithms to implement the abstract data types and logical operators. Our experimental studies demonstrate that the BlossomTree pattern matching approach can generate highly efficient query plans. 1 Introduction XML data management, especially XML query processing, has attracted considerable research and development activity in the past few years. In XML query languages, in particular XQuery [6], a path expression is arguably the most natural way to retrieve nodes from tree structured data, such as XML documents. Path expressions allow us to traverse XML trees in different directions (termed axes ), to filter tree nodes by specifying predicates on tag names and values, and to return tree nodes. By analogy, path expressions have a similar role in XML query processing as selection operations have in relational query processing. In a more general setting, e.g., XQuery FLWOR (for-let-where-order-by-return) expressions, path expressions are usually used as building blocks for retrieving relevant nodes from XML documents. The intermediate results of these path expressions can be bound to variables, which are further referenced by other expressions. Example 1 Consider the following query, excerpted from XQuery Use Cases [8] (with minor modifications), that extracts, from a bibliography document, all pairs of distinct books that are written by the same list of authors: <bib> { for $book1 in doc("bib.xml")//book, $book2 in doc("bib.xml")//book let $aut1 := $book1/author let $aut2 := $book2/author where $book1 << $book2 and not($book1/title = $book2/title) and deep-equal($aut1, $aut2) return <book-pair> { $book1/title } { $book2/title } </book-pair> } </bib> There are 18 path expressions in this FLWOR expression, where two of them (both doc()//book s) operate on the same XML document, while others operate on the variable bindings of these two path expressions (and the variable bindings thereof). The node comparison operator << compares the document order of two nodes. The deep-equal() is a function that compares two subtrees. Since all the path expressions in the above FLWOR expression are related to each other, we call them correlated path expressions. More formally, two path expressions are correlated if they are related by variable references, structural relationships (e.g., <<), value-based relationships (e.g., not =), or value and structural mixed relationships (e.g., deep-equal). A straightforward approach to evaluating correlated path expressions in such a complex FLWOR expression is to follow the semantics of FLWOR expression and evaluate the path

3 bib.xml // ~ ~ << // ($book1) book book ($book2) not = ($aut1) author title ($t1) ($t2) title author ($aut2) deep equal Figure 1: A BlossomTree for the query in Example 1 expressions for each iteration in the for-loop. However, this approach may be very inefficient, due to the redundancy during the loop and the costs of evaluating path expressions. To tackle this problem, we first formalize this FLWOR expression into a BlossomTree 1 as shown in Figure 1. The BlossomTree is an annotated directed graph that captures all the semantics specified in the FLWOR expression. A vertex in the graph represents a tag-name and value constraints in a path expression. Variables can be bound to any vertex, in which case the vertex is called a blossom. An edge represents the (structural, value-based, or mixed) relationship between tree nodes that matches the two vertices. Among these edges, solid lines represent tree edges that are specified by path expressions, while dashed lines represent crossing edges that are generated by operators on variables bindings in the where-clause. To avoid clogging the graph, tree edges are not shown as directed, but their directions follow the hierarchical structure. We shall give a formal definition and semantics for BlossomTree pattern matching in Section 3.1. Given such a BlossomTree, the problem of evaluating FLWOR expressions can be reduced to matching the BlossomTree against an XML document. The focus of this paper is, therefore, to define a general algebraic framework for evaluating the BlossomTree and to develop optimization techniques based on this framework. Since BlossomTree is a generalization of pattern tree for path expressions, the evaluation of BlossomTree can be extended from previous research on pattern trees [17, 4, 20, 2, 7, 22]. There are basically three approaches to evaluating tree pattern matching: navigational approach, join-based approach and hybrid approach 2. In this paper, we extend the hybrid approach [22] to a broader environment including FLWOR expressions. The objective of this research is to find out the pros and cons between the navigational and join-based approaches and try to tradeoff between them. In summary, our contributions are as follows: 1. We propose a general algebraic framework (abstract data types and operators) for a substantial subset of FLWOR expressions. 2. We propose physical implementations of the abstract data type and its operators. In particular, data structures are proposed to support efficient implementation of operators. 3. The implementation of two major operators generalized NoK pattern matching and structural joins are discussed in detail. We propose a suite of join algorithms that exploit the properties of the input data. 4. We implement the techniques and conduct experiments based on our framework and join algorithms. The remaining of the paper is organized as follows: we first introduce the background and related work in Section 2. The algebraic framework is presented in Section 3, where abstract data types are introduced in Section 3.2, and operators on these data types are presented in Section 3.3. The design of data structures and implementation of physical operators are introduced in Section 4. Implementation and experimental results are presented in Section 5. At last, we conclude in Section 6. 1 The name BlossomTree is inspired by FLWOR and multiple returning nodes. 2 We shall introduce these three approaches in detail in Section 2.

4 2 Background and Related Work In this section, we briefly introduce the existing approaches to evaluating path expressions. Previous work on evaluating FLWOR expressions are also presented and compared with our approach. 2.1 Evaluating Path Expressions The problem of evaluating path expressions against XML trees can be formalized as the tree pattern matching (TPM) problem (a.k.a. twig query): a path expression can be modeled as a pattern tree that specifies a set of constraints. The solution is to find all nodes in the XML tree that satisfy all the constraints, and return all those that match a returning node specified in the pattern tree. Previous research on the evaluation and optimization of path expressions fall into three classes. A navigational approach traverses the tree structure and tests whether a tree node satisfies the constraints specified by the path expression [17, 4]. For example, the algorithm in [4] traverses the XML tree in pre-order (document order). This traversal is either through SAX event callbacks or is supported by the underlying storage system. A data structure called work array is established and maintained to keep track of which pattern tree nodes have been matched and which need to be matched for the incoming SAX events or XML tree nodes. This algorithm is very efficient when the document is non-recursive (i.e., an element could not be the descendant of itself). However, for recursive documents, the presence of // could result in very large memory requirement (in the order of the maximum number of recursive degree) [3]. Another approach is to first select, for each pattern tree node, a list of XML tree nodes that satisfy the constraints associated with it, and then pairwise join the lists based on their structural relationships (e.g., parent-child, ancestor-descendant, etc.) [20, 2, 7, 16]. Using proper labeling techniques [11, 9, 18], TPM can be evaluated reasonably efficiently by various join techniques (merge join [20], stack-based structural join [2], and holistic twig joins [7]). We call this the join-based approach. Join-based approach is scalable to large document and query size, by relying on the secondary storage to store intermediate results. Advanced algorithms (e.g., [2, 7]), which take advantage of the existence of tag-name indexes, can reduce the need to store intermediate results onto secondary storage, but the memory requirement is proportional to the maximum depth of the input document tree. A major disadvantage of the join-based approache is the update problem. Since it does not rely on navigation to discover the structural relationships between nodes, encodings of the structural relationship must be pre-computed and inserted into the indexes. This precomputation can be thought of as a materialization process of the structural relationship. Thus, the joinbased approach inherits the update problem associated with materialized views [5] (e.g., if a single element is inserted or deleted, the encodings of its subtree or all following nodes in the document may need to be recomputed). We proposed a hybrid approach [22] to combine the advantages and avoid the disadvantages of the navigational and join-based approaches. The key idea is to, first, navigate through the XML documents to obtain intermediate results that contain structural information, and then join the intermediate results based on the structural information. In this way, we avoid the update problem since the structural information is obtained dynamically. To avoid the complexity of matching recursive documents in the navigational approach, we only navigate the tree using a subset of the path expression that will not result in a recursive matching. This methodology is implemented by first decomposing a general pattern tree into interconnected next-of-kin (NoK) pattern trees, which are special cases of a pattern tree that only consists of / and following-sibling axes. For each NoK pattern tree, a navigational NoK pattern matching operator can be evaluated highly efficiently using a single scan of the input, or index lookups. The efficiency is due to the fact that the recursive (or global ) axis // is excluded from the NoK pattern tree. The final result of the interconnected NoK pattern tree is then obtained by joining, based on their structural relationships, the intermediate results from the multiple NoK pattern matching. We had shown previously that NoK pattern matching operator itself can be evaluated very efficiently [22], but optimizing the evaluation of more general expression containing multiple NoK expressions interconnected by global operations (e.g., //, <<) has not been studied. These cases occur when multiple NoK expressions exist in the same path expression or in different for-clauses. For example, the path expression doc("bib.xml")/book[//author="smith"]/title retrieves the titles of the books that are written by Smith. These two NoK patterns are interconnected by // relationship on book and author nodes. According to the

5 definition of NoK patterns, this path expression should be partitioned into two NoK pattern trees, corresponding to doc("bib.xml")/book/title and author[.="smith"]. Assuming that both NoK pattern matching operators are evaluated using sequential scan (i.e., for each XML tree node in document order, try to match the subtree starting from that node to the NoK pattern tree), we need to scan the input XML file twice, which is not I/O optimal. The FLWOR expression given in Example 1 is another example where multiple path expressions are connected by for- and let-clauses. Each of these path expressions can be decomposed into NoK pattern trees, which operate on the same document. Combining them as a whole tree to apply the hybrid approach may save significant I/O. This is the first issue we study in this paper. The above examples motivate our first optimization technique: combining multiple NoK pattern matching operators into one scan whenever possible. Suppose a pattern tree is decomposed into two NoK pattern trees, each of which needs a scan of the (same) input XML tree. An intuitive observation is that the total number of scans of the input data can be reduced to one, if the intermediate results can be pipelined to a structural join operator without materializing them to secondary storage. We call this optimization technique pipelined NoK (corresponding to pipelined join technique). One of our objectives in this paper is, therefore, to analyze all types of structural joins 3 and determine what kinds of properties are required of the input streams in order to use the pipelined NoK technique. In fact, we will show that only some types of structural joins can use the pipelined NoK approach, while others cannot, no matter what kind of information is associated with the input nodes. However, pipelined NoK technique may not always be desirable since it requires more memory to cache intermediate results. This could be a serious problem when the input document and the path query is recursive. Therefore, we need to tradeoff between saving I/O and meeting the memory requirement. 2.2 Evaluating Path Expressions in the FLWOR Expression Path expressions as the building blocks are often embedded in FLWOR expressions and constructor expressions (cf. Example 1). Evaluating them multiple times in a for-loop would greatly degrade the overall performance. Instead, if all the path expressions can be considered as one unit, and decomposed whenever needed, it could be much more efficient. Previous research on evaluating path expressions in FLWOR expression can also be classified as navigational [12] and join-based approaches [10, 15]. These are natural extensions of the approaches to evaluating path expressions. In the navigational approach [12], all path expressions are extracted from the FLWOR expression to form a single pattern tree. The pattern tree allows multiple nodes to be marked as extraction nodes (a.k.a., returning nodes), which correspond to variable bindings. The output of the pattern matching operator is a set of tuples, each column in the tuple corresponds to a different variable binding. Since the output is a relational table, this approach is particularly desirable when the XML query processor is built based upon relational database systems. However, since, in general, the result of the tree pattern matching is another tree whose nodes are mapped from the extracting nodes, relational tables (or flat tuples) are not space efficient in representing the tree structure. For example, consider a pattern tree with three extraction nodes a, b, and c, where a is the parent of b and c. If the XML tree has a subtree a 1 with children b 1,c 1,b 2,c 2, where each a i matches a, b i matches b, and so on (throughout the paper, we use a i to represent the i-th occurrence of elementa), the resulting tuples are (a 1,b 1,c 1 ), (a 1,b 1,c 2 ), (a 1,b 2,c 1 ), and (a 1,b 2,c 2 ). This implies that the table contains redundancies due to the fact that its projection on b and c columns is actually a Cartesian product of the matches of all b and c. On the other hand, the join-based approach [10, 15] treats every output of the pattern match as a tree. Each pattern tree node could be marked as a tree logical class (TLC), such that all matches to this pattern tree node can be retrieved by this TLC. Therefore, these TLC s can be thought of as references or indexes to the matched values of the extraction nodes. In this sense, it is in the same role as the column names (or positions) in the tuple format in the navigational approach. Therefore, no structural relationships can be derived from the TLC s. Furthermore, the join-based approach also performs a structural join on each structural constraint, even though they can be answered more efficiently using navigation. When multiple path expressions are combined together, the number of structural joins will quickly become unmanageable for the optimizer to figure out an optimal join order. 3 The types are defined based on the axes in path expressions [6].

6 In this paper, we extend the hybrid approach [22] for evaluating path expressions in a broader environment with FLWOR expression. The basic idea is twofold: (1) the path expressions are first combined together to form a BlossomTree to capture the semantics of the FLWOR expression. Then the BlossomTree is divided into NoK pattern trees as in [22]. The difference is that edge-cutting may happen not only on the edge labeled with global axes, but also value-based joins. (2) When a NoK pattern tree is constructed from multiple path expressions in a FLWOR expression, it could contain multiple returning nodes (blossoms). We define an abstract data type (NestedList) to represent the results of matching NoK Pattern Tree (which may contain multiple returning nodes) against XML document. NestedList is a generalization of flat list that allows an item be another list. Different from flat tuples or TLC references, each item in the NestedList has an ID (Dewey ID) that contains the structural relationships between matches from different pattern tree nodes. Furthermore, since NestedList can be thought of as trees, it also avoids the redundancy problem associated with the flat table. In the following sections, we first introduce the abstract data type NestedList, and discuss how NestedList can be constructed from NoK Pattern Tree pattern matching. Then various operators on NestedList are presented. 3 An Algebraic Framework In this section, we will introduce the formalism to capture the semantics of FLWOR expression. The abstract data type is presented in Section 3.2. The operators on the abstract data type are introduced in Section BlossomTree The FLWOR expressions that we consider in this paper are a subset of that defined in XQuery [6], in that we only allow path expressions in the for-, let-, where-, order by- and return-clauses: FLWOR ::= ( for var in Path let var := Path ) + ( where Boolean)? ( order by Path)? return Path To capture the semantics of the restricted FLWOR expression, we introduce the formalism of BlossomTree. Definition 1 (BlossomTree) A BlossomTree is an annotated directed graph that consists of interconnected pattern trees, at least one of which has a root node. Each vertex is annotated with a tag name and optional value constraints. In addition, each vertex can be associated with a variable, called blossom. Each edge in the graph is annotated with a 2-tuple r, m, where r is a relationship between the incident vertices, and m is a matching mode ( f for mandatory and l for optional ). Based on the definition, edges can be classified into tree edges and crossing edges. The relationships r associated with tree edges are always structural relationships (/, //, etc.), and the mode m could be either f or l, representing whether the edges is contributed by a for-clause or let-clause, respectively. The relationships r associated with crossing edges can be either structural, value-based, or a mixture of structural and value-based relationships. In this case, the mode m could be f only. For instance, the BlossomTree illustrated in Figure 1 has two l edges, represented by regular solid lines. The other tree edges are annotated with f, represented by bold solid lines. The variables in the parentheses beside the nodes are bound to those nodes. Unless otherwise labeled, the default tree edge r relationship is /. Taking a BlossomTree and an XML document, the semantics of the BlossomTree pattern matching can be defined using a mapping f from the BlossomTree nodes to XML tree nodes: The root of each pattern tree is mapped to the root element of the corresponding document. For any node u in the BlossomTree, f(u) must satisfy the tag-name and value constraints associated with u.

7 If two nodes u and v in the BlossomTree are connected by a directed edge (u, v) with annotation r, m, then f(u) and f(v) must satisfy the relationship specified by r, i.e., r(f(u), f(v)) = True. If two nodes u and v are connected by an f annotated edge, f is a valid mapping iff for each non-empty f(u), there is a non-empty f(v) satisfying the above conditions. If two nodes u and v are connected by an l annotated edge, f is a valid mapping iff f(u) is non-empty, but f(v) may be empty. In this case, f(v) is defined to be an empty sequence if no XML tree nodes satisfy the above conditions. Example 2 Consider matching BlossomTree in Figure 1 against the following XML document: <bib> <book> <title> Maximum Security </title> </book> <book> <title> The Art of Computer Programming </title> <author> <last> Knuth </last> <first> Donald </first> </author> </book> <book> <title> Terrorist Hunter </title> </book> <book> <title> TeX Book </title> <author> <last> Knuth </last> <first> Donald </first> </author> </book> </bib> The output is: <bib> <book-pair> <title> Maximum Security </title> <title> Terrorist Hunger </title> </book-pair> <book-pair> <title> The Art of Computer Programming </title> <title> TeX Book </title> </book-pair> </bib> The second book-pair being returned is obvious, since all the BlossomTree nodes are matched and all constraints are satisfied. The first book-pair is in the output, because neither of these books have an author element, but the matching of them is optional (due to the l annotated edge connecting them), so both variables $aut1 and $aut2 are bound to empty sequences, which matches to each other by the definition of deep-equal() function (define in XQuery Functions and Operators [14]). Given such a BlossomTree, the hybrid approach first decomposes it into interconnected NoK Pattern Tree s. Each NoK Pattern Tree is evaluated by a navigational pattern matching operator. The intermediate results are then joined together using the join-based approach. It is straightforward to decompose a BlossomTree into interconnected NoK Pattern Trees. Algorithm 1 gives the pseudo-code to decompose a BlossomTree based on depth-first traversal. In the parameters, pt is a BlossomTree, S is a set containing the roots of the NoK Pattern Tree, and T is a set containing the non-root nodes in the current NoK Pattern Tree. Initially, S is set to be the set of all roots in pt, and T is set to be.

8 Algorithm 1 Decomposing BlossomTree into NoK s Decompose(pt : BlossomTree, S, T : Set) 1 while S 2 do extract an item u from S; 3 T ; 4 initialize t as empty NoK tree; 5 for each out-edge (u, v) s.t. v has not been visited 6 do if the label of (u, v) is a local axis 7 then set v as visited; 8 put v in T; 9 add v and edge (u, v) in t; 10 else put v in S; DFS(t,S,T); 13 output t; DFS(t : NoKBlossomTree, S, T : Set) 1 while T 2 do extract an item u from T; 3 delete u from pt; 4 for each out-edge (u, v) s.t. v has not been visited 5 do if the label of (u, v) is a local axis 6 then set v as visited; 7 put v in T; 8 add v and edge (u, v) in t; 9 else put v in S; 10 DFS(t,S,T);

9 3.2 Abstract Data Type Before introducing how the pattern matching is performed on the NoK Pattern Tree, we first introduce the abstract data type (a.k.a. sort in algebra) that represents the output of pattern matching. Note that the abstract data type introduced here is at the conceptual level. It could be implemented by different concrete data structures (introduced in Section 4) to ensure efficiency. Since the XML tree data model is very irregular and flexible, our basic idea of the design of the abstract data types is to define more regular data structures (thus operators based on it could be more efficiently implemented) that isolate the irregular tree structures. This separation allows further operations to operate on the regular data structures only. In our algebraic framework, the more regular data structure is NestedList, which is the output of NoK pattern matchings. Many operations, such as selection, projection and joins, can be applied to NestedList. Another abstract data type Env is generated when variables are bound to some specific values in the NestedList. The final XML document result is constructed from Env. The relationship between these abstract data types is illustrated in Figure 2. construction XMLTree NestedList Env NoK variable binding selection projection join Figure 2: The data flow from one abstract data type to another In this paper, we concentrate on the NestedList and operations on it. The Env abstract data type and how variables are bound are outside the scope of this paper. Interested readers are referred to [21] for a brief overview. Before giving the formal definition, we shall give an example of how the NestedList is constructed. Example 3 Consider the NoK Pattern Tree illustrated in Figure 3(a), where all nodes in the tree are returning nodes. Under the same convention, the bold edges are f annotated (mandatory matching) and regular edges are l annotated (optional matching). To be able to reference the tree node, we first (arbitrarily) fix an order on the children (since normal pattern tree is unordered), and assign each tree node a Dewey ID shown in the parenthesis. This artificial ordering does not affect the semantics of the pattern matching since it is never used to match any order (e.g., document order), but just used to conveniently reference nodes. a (1) a1 a1 (1.1) b c (1.2) b1 c1 b2 c2 b3 b1 b2 b3 c1 c2 (1.1.1) d (a) NoK tree d1 d2 d3 (b) XML tree d1 d2 d3 (c) Matchings Figure 3: NoK Pattern Tree Pattern Matching against an XML tree Figure 3(b) shows an example XML tree, where each t i represents the i-th occurrence of tag t. The resulting tree is shown in Figure 3(c). The result of the match can be thought of as a special tree structure that conforms to the NoK pattern tree. For instance, all XML tree nodes (b1, b2, and b3) that are matched with pattern tree node 1.1 are grouped together, indicated by the box around the nodes. By grouping such nodes together, it is efficient to retrieve all nodes that are matched with a pattern tree node, given its Dewey ID. The tree edges in Figure 3(c) indicate which pairs of nodes satisfy the structural relationships specified

10 by the NoK pattern tree. If x i has an edge to y m and, x i+1 has an edge to y m+k, this implies that x i can pair any of y m up to y m+k 1 to form a match to the edge (x, y) in the NoK pattern tree. For instance, the tree edges between the b s and d s in Figure 3(c) indicate that b1 has no children in the match, b2 has two children d1 and d2, and b3 has one child d3 in the match. The result tree is a compact representation of all mappings f from NoK Pattern Tree nodes to XML tree nodes (recall the semantics definition in Section 3.1). To derive a mapping from the result tree, simply choose one node from each array associated with each Dewey ID, and check whether any pair of nodes conform to the structural relationship specified by the NoK pattern tree. Conceptually, the above resulting tree can be defined as an instance of an abstract data type, which we term NestedList. The NestedList abstract data type is a more restricted form and has some special features than the general nested list defined in languages such as Lisp. A NestedList can be constructed from a BlossomTree (or specially NoK Pattern Tree), where each returning node is given a Dewey ID. For instance, the initial NestedList constructed from the NoK Pattern Tree in Figure 3(a) is (a(b(d))(c)), where each ( indicates a nesting, each ) indicates a unnesting, and the characters (actually Dewey ID s of these characters) are placeholders for the matchings result to these characters. It is well-known that any tree can be represented by such a nested list form [13]. When the NoK pattern tree is matched with XML tree nodes, the placeholders are filled out with the actual nodes. To represent the grouping of multiple matchings with the same Dewey ID, a new notation [] is introduced. For example, the final resulting tree in Figure 3(c) is represented by the NestedList shown in Figure 4. (a1,[(b1,()),(b2,[(d1),(d2)]),(b3,(d3))],[(c1),(c2)]) b subtrees c subtrees Figure 4: An example NestedList In this notation, [] represents the grouping of subtrees under the same parent (e.g., grouping b1, b2, and b3 subtrees, but not d3 with d1 and d2), and () represents nesting. Note that an empty sequence () is introduced to b1 s child in order to conform to the structure of NoK pattern tree. Definition 2 (NestedList) The NestedList abstract data type is a nested list representation of an ordered tree structure that is leveraged by the grouping notation []. Note that the NestedList abstract data type is conceptual; one can use any data structure to implement it in order to guarantee good performance for certain operations on it. In Section 4, we shall introduce the physical data structures for NestedList to support efficient operations. 3.3 Operators on NestedList In this subsection, we define logical operators on NestedList. These operators are very similar to relational operators, but working on a sequence of nested tuples instead of a set of flat tuples. Instead of taking column names or positions as parameters in the flat tuple case, the operators on NestedList take Dewey ID s as parameter. As shown in Figure 2, the signature of each of these operations is from a NestedList (in the case of Join, two NestedList s) to a NestedList. The semantics of each operator is as follows: Projection (π ID ): Unnest the NestedList according to the Dewey ID to a list of subtrees. The result is the concatenation of all the roots of the subtrees. For example, π 1.1 (t) = [b1,b2,b3], where t is the NestedList in Figure 4. Selection (σ ϕ(id) ): First project on the Dewey ID, then evaluate the predicate ϕ on the projected list. If the predicate returns false, remove the corresponding item from the NestedList. If the resulting NestedList is not a valid match anymore (e.g., a mandatory node matches to empty sequence after the removal), return empty sequence; otherwise return the modified NestedList. For example, σ position(1.1)=2 = [b2], where position is the list position function in path expression (e.g., //book[2]).

11 Join ( ϕ(id1,id 2)): The join of two NestedList s is a combination of projection and selection. It first projects the Dewey ID s on the corresponding two NestedList s, and then apply join predicate ϕ on the projected lists. If the predicate ϕ returns true, the two NestedList s are combined into a single NestedList (see below for details); otherwise, return empty sequence. This semantics can be easily extended to a sequence of NestedList s: simply concatenate the results on each NestedList in order. The semantics for projection and selection are quite straightforward, however, the join semantics needs more elaboration. The problem is how to construct the resulting NestedList from two NestedList s. Consider the BlossomTree in Figure 1. Since matching to the whole BlossomTree should contain all the returning nodes (those with variable bindings), we should assign a Dewey ID to each returning node before decomposing it into interconnected NoK pattern trees. Since there are two root nodes in the BlossomTree, we introduce an artificial (super-)root that is the parent of both root nodes. The Dewey ID s can be assigned according to any ordering of the roots. Suppose that $b1 is assigned 1.1, $b2 is assigned 1.2, $aut1 is assigned 1.1.1, $t1 is assigned 1.1.2, $t2 is assigned 1.2.1, and $aut2 is assigned The initial NestedList for each of the NoK pattern tree is of the form: (($book1,($aut1),($t1)),($book2,($t2),($aut2))) Each NoK pattern tree outputs a sequence of NestedList s that conform to the above format, although some results contain placeholders. When two sequences of NestedList s are joined, the placeholders are filled out in the result. Example 4 Consider the BlossomTree in Figure 1 and the XML document in Example 2 as input, and suppose the execution plan is shown in Figure 5, where NoK 1 and NoK 2 are both (book,(author,title)) with different variable bindings. ϕ NoK 1 NoK 2 doc(bib.xml) doc(bib.xml) Figure 5: A query plan, where the join predicate ϕ = ( ) deep-equal(1.1.1, 1.2.2) The result of NoK 1 is of the form (assuming b i represent the i-th occurrence of book, a i represent the i-th occurrence of author, and t i represent the i-th occurrence of title): [((b1,(),(t1)), ((),())), ((b2,(a1),(t2)), ((),())), ((b3,(),(t3)), ((),())), ((b4,(a2),(t4)), ((),()))] Each line above is a matching against NoK 1, and the empty sequences are placeholders. The result of NoK 2 is of the same form, but with placeholders in the position of NoK 1 : [(((),()), ((b1,(),(t1)))), (((),()), (b2,(a1),(t2))), (((),()), (b3,(),(t3))), (((),()), (b4,(a2),(t4)))] The join operator takes the two sequences of NestedList s and the join predicate, and evaluates the predicate for each combination of the NestedList s from the two sequences. The final result is:

12 [((b1,(),(t1)),((b3,(),(t3)))), ((b2,(a1),(t2)),(b4,(a2),(t4)))] This is the same result returned by Example 2. Given a NoK Pattern Tree and an XML tree, a sequential scan of the XML tree against the blossom tree is defined to be the process of scanning the XML tree node in document order. At each node, a NoK pattern matching operator is applied. The results of all matchings are concatenated into a sequence of NestedList s. 4 Physical Operators In this section, we introduce how the operators defined in the previous section are implemented. Since selection and projection are rather straightforward, we concentrate on the NoK Pattern Tree pattern matching operator and various join operators. 4.1 NoK Pattern Tree Tree Pattern Matching A NoK pattern tree could have multiple returning nodes, and optional edges ( l annotated edges). As introduced in Section 3, we should give each returning node a Dewey ID indicating their structural relationships. This is achieved by extracting all returning nodes to form a returning tree. In the returning tree, two nodes are connected by an edge if and only if they are the closest ancestor-descendant in the NoK tree. For example, in the NoK pattern tree, a has a child b and b has a child c, and only a and c are returning nodes, in the returning tree a and b are directly connected by an edge. Before introducing the NoK Pattern Tree pattern matching algorithm, we first introduce the data structure that implements NestedList abstract data type. Given a Dewey ID, the data structure should support the following operations: Unnesting a NestedList is one of the most common operations. The data structure should support efficient navigation (e.g., through pointer) between nesting levels. Retrieving child by position index is another common operation. This operation usually follows unnesting and retrieve the i-th item in a sequence. For example, projection on Dewey ID can be translated into the operations unnesting on the root, then retrieving the fifth child, then unnesting again, and last retrieving the third child. Inserting a child to a list is frequently used in constructing the NestedList. Since the NestedList is dynamically constructed from depth-first traversal of the XML tree, it requires dynamically inserting an item into the sequence. Based on these requirements, we design the data structure as shown in Figure 6. Each returning node is associated with a linked list (a is associated with list a1, etc.), where new element is inserted at the end. We say that this linked list is connected by sibling pointers (indicated by arrows in Figure 6). If a returning node has k children in the BlossomTree, it includes an array of size k, containing pointers to the list associated with its children. We call these pointers child pointers (indicated by arrows in Figure 6). In addition to these pointers, a pointer to the last child, a pointer to its parent, a pointer to the next node in document order are also added to facilitate efficient navigation. To avoid clogging the graph, we omit them in Figure 6, but readers should keep in mind that they are associated with each node in the data structure. Given a returning tree, the resulting NestedList can be constructed as shown in Algorithm 2. Parameters p and q of the algorithm are NoK pattern tree nodes that are both returning nodes and p is a child of q. x is an XML tree node. Before invoking the algorithm, a prerequisite is that p matches with x. This algorithm is an extension of the NoK pattern matching algorithm [22]: instead of returning only one list, it keeps a list of resulting nodes on each returning node and maintains the hierarchical structure between them. It traverses the XML tree in depth-first order (through First-Child and Following-Sibling in lines 8 and 19). During the traversal, it also recursively matches the NoK pattern tree node and builds up a list of matchings for each returning node. Each recursive call (lines 13 and 14) represents a matching of a NoK subtree against an XML subtree. In the algorithm, the sibling pointers and children pointers are established in lines 2 to 5. The frontier children mentioned in line 7 are the children of a NoK pattern tree

13 a1 b1 b2 b3 c1 c2 d1 d2 d3 Figure 6: The concrete data structure for NestedList Algorithm 2 NoK Pattern Tree Pattern Matching NoK-Blossom(p, q : NoKNode, x : XMLTreeNode) 1 if p is a returning node 2 then append x to the sibling list associated with p; 3 t last element of q; 4 if t has no child pointer 5 then add a child pointer from t to x; 6 else add x to the last child of t; 7 S frontier children of p; 8 u First-Child(x); 9 repeat 10 for each s S that matches u with both tag name and value constraints 11 do 12 if p is a returning node 13 then b NoK-Blossom(s, p, u); 14 else b NoK-Blossom(s, q, u); 15 if b = true or s is l annotated 16 then S S \ {s}; 17 delete s and its incident arcs from the pattern tree; 18 insert new frontiers caused by deleting s; 19 u Following-Sibling(u); 20 until u = nil or S = 21 if S and p is a returning node 22 then remove all matches to p associated with its parent 23 return false; 24 return true;

14 node that are candidate matches for the incoming XML tree node [22]. If after matching the whole subtree rooted at p, there are still unmatched pattern tree nodes (line 21), the partial results built up during the recursive call are removed (line 22). Based on this data structure and algorithm, we shall prove that when we project on any Dewey ID, the resulting list associated with that DeweyID is in document order. We call any operator that has this property order preserving. This property is important for developing the join algorithms introduced in Section 4.2. Theorem 1 (projection is order-preserving) Suppose the sequential scan of an XML tree against a NoK Pattern Tree is s. Given any Dewey ID on the NoK pattern tree, the projection that simply returning the resulting list associated with that ID is in document order. Proof The proof has two steps: first, we prove that for any NestedList l that is the result of a non-recursive NoK pattern matching (i.e., no // connecting the NoK with the root), the projection on l is in document order. Secondly, we prove that for recursive matching, the list is still in document order. The first step can be proved by the construction of the NestedList. Since the construction is by traversing the XML tree in document order, and a node is inserted into the result as soon as it is first discovered. Therefore, if two nodes are matched with the same pattern tree nodes, the node with lower document order must be inserted before the one with higher document order. Therefore, the projection on the NestedList on any Dewey ID is in document order. The second step is also straightforward. Since a NoK pattern tree can have multiple matches during a recursive traversal of a subtree (in the case of recursive document), these matches can be interleaved together among the lists associated with each pattern tree node. By child pointers, different matches can be separated. When the NoK is recursively matched against a subtree, each matching node (in document order) is inserted into the list associated with its pattern tree node. These matches must in document order because they are inserted at the first time discover it in depth-first traversal (document order). 4.2 Merging NoK operators and pipelined joins As we have seen in Figure 5, there are situations where multiple NoK operators operate on the same XML file and the results are then joined together. There are two situations that an optimizer can exploit: (1) if both NoK operators use a sequential scan access method (e.g., when no tag-name or value-based indexes are available), we can save I/O by merging multiple NoK operators into one combined operator and using one scan only. (2) If the join connecting these two NoK operators has some special properties (e.g., the input is ordered by document order), we can use pipelined joins (merge-join style) to save the I/O for intermediate results. In this subsection, we shall present the two optimization techniques. The merging of two NoK pattern trees proceeds as follows. For each NoK pattern tree, maintain a set of pattern tree nodes that are eligible for matching the incoming XML tree nodes (following the same notation in [22], we call these pattern tree nodes frontier nodes). When a new XML tree node arrives, it is matched to both sets of frontier nodes. This is in the same way that multiple Deterministic Finite Automata (DFA) are merged to a Nondeterministic Finite Automaton (NFA). This technique can be extended to merge arbitrary number of NoK pattern trees. Given two sequence of NestedList s resulting from NoK operators, the join on DeweyID d 1 and d 2 is performed on the projections of d 1 and d 2 on the two NoK s. By Theorem 1, the results of the projections will preserve document order. Therefore, we can use any structural join algorithms (e.g., TwigStack [7]) that rely on the inputs being in document order. However, some joins are not order-preserving. For example, the structural join on << operator is not order preserving. We shall give a counter example showing why it is not. Example 5 Consider the example shown in Figure 1. A sub-query execution plan could be shown in Figure 7. Given the XML document as shown in Example 2, the resulting NestedList s are as follows: [((b1,(),(t1)), (b2,(a1),(t2))), ((b1,(),(t1)), (b3,(),(t3))), ((b1,(),(t1)), (b4,(a2),(t4))), ((b2,(a1),(t2)), (b3,(),(t3))),

15 ϕ NoK 1 NoK 2 Figure 7: A Query Plan, where the join condition ϕ = $book1 << $book2 ((b2,(a1),(t2)), (b4,(a2),(t4))), ((b3,(),(t3)), (b4,(a2),(t4)))] The projection on the Dewey ID 1.2 over the above sequence of NestedList s is [b2,b3,b4,b3,b4,b4], which is not in document order. In fact, it is easy to show that some other structural joins, such as preceding, following, isnot, and all value-based joins are not document-order-preserving. Nevertheless, the most commonly used //-joins are indeed document-order-preserving on non-recursive documents, which gives the nice property of being able to compose //-joins in pipelined fashion. Theorem 2 (order-preserving on non-recursive documents) If the NoK pattern trees connected by //-join operators are assigned Dewey ID globally (i.e., according to the BlossomTree, rather than individual NoK pattern tree), the result of //-joins preserves document order on non-recursive documents. Proof From Theorem 1, we know that the projections on each of the NoK pattern trees are order-preserving. We need to prove two cases: first, suppose l is a NestedList that matches the first NoK, and m 1, m 2 are the two consecutive NestedList s that join with l, since projection on NoK is order-preserving, m 1 and m 2 must be in document order, therefore, the join of l and m 1, m 2 must also be in document order. Secondly, suppose l 1 and l 2 are two consecutive NestedList s that match the first NoK, and m 1 and m 2 are two consecutive NestedList s that matches the second NoK. Since projection on both NoK s are documentpreserving, i.e., l 1 < l 2 and m 1 < m 2 in document order, l 1 and l 2 cannot both join with m 1 and m 2, i.e., the situation that (l 1, m 1 ), (l 1, m 2 ), (l 2, m 1 ), and (l 2, m 2 ) or (l 1, m 2 ) and (l 2, m 1 ) will never happen. This is because (1) the document is non-recursive, so l i (or m i ) cannot match to the same pattern tree node, and (2) the tree structure prevents a node u being a descendant of a node v and v s sibling at the same time. Therefore, the projection of the result of the join is still in document order. For //-join on non-recursive documents, we propose a pipelined join algorithm. As in the relational pipelined joins, the pipelined //-join and NoK pattern matching are defined using iterators: each time a getnext() function is called, a matching of the NoK pattern tree or join result is returned. The getnext() function of NoK pattern matching is based on Algorithm 2. Here, we only introduce the algorithm for the //-join operator. GetNext(M, N : NoKTree; k, l : ID) 1 m M.GetNext(); 2 n N.GetNext(); 3 if m = null n = null 4 then return null; 5 if Desc-or-Self(proj(m, k), proj(n, l)) = true 6 then return fill(m,n); 7 else if m << n 8 then m M.GetNext(); 9 else n N.GetNext(); 10 goto 3;

16 Of these parameters, M and N are two pipelined NoK pattern matching operators, and k and l are two Dewey IDs on which the join is performed. This algorithm is in a merge-join fashion. In this algorithm, the //-join GetNext function calls the NoK pattern matching operator s GetNext function (line 1 and 2). If one of them reaches the end (returning null), the join finishes as well. If m is not the ancestor of n or n itself, depending on which one has lower document order, another GetNext function call is invoked to the NoK pattern matching operator with lower document order (lines 7 to 9). If there are some nodes in the projection of m on DeweyID k that matches the projection of n on l, the resulting NestedList s are constructed by filling out the placeholders in one of the NestedList s and return it as a result. The pipelined //-joins can save intermediate results being stored on secondary storage. However, if the input XML document is recursive, the order preserving property will not hold. Even if we modify the pipelined algorithm to cache more results to guarantee no results will be lost, the memory requirement for caching the intermediate results or candidates thereof could be large [3]. Therefore, in order to tradeoff between memory requirement and secondary storage access, the optimizer needs to have the knowledge of how recursive the input XML document is. For non-recursive or low degree of recursive document, the pipelined join algorithm (or its modification with caching capability) can be used. For highly recursive document, we propose a nested loop style join algorithm. 4.3 Nested Loop Joins For those joins that are not order-preserving (e.g., <<-join, following-join, and isnot-join), a nested-loop join is in order. Since this join destroys the document order, we cannot compose other pipelined joins on top of the nested-loop join. For the joins that potentially return all pairs of combinations from two lists (essentially a Cartesian product), a straightforward naive nested loop join algorithm is unavoidable, so we omit it here. For the nested loop joins on //-relationships, a special optimization technique can be applied. The basic idea is to exploit the fact if node u is an ancestor of v, then u s following sibling is not an ancestor of v. In the nested loop join algorithm, if NoK pattern tree T 1 connects another NoK pattern tree T 2 via a //-edge, T 1 is always on the left (outer) and T 2 is always on the right (inner). When the join operator get a nest matching from T 1, it will piggyback the range (p 1, p 2 ), where p 1 is the position of the XML tree node that matches with the joining node, and p 2 is the position of its nest sibling. The inner NoK T 2 can then only scan within this (p 1, p 2 ) range. We call this bounded nested loop join (BNLJ) algorithm. 5 Experimental Evaluation categories recursive? data set size #nodes avg. dep. max dep. tags tree Y d1 69 MB 1, 212, MB Synthetic d2 17 MB 403, MB N d3 30 MB 620, MB Real Y d4 82 MB 2, 437, MB N d5 133 MB 3, 332, MB Table 1: Categories of testing data sets The framework based on BlossomTree can be thought of as an algebra for the FLWOR expressions. As we have seen in the previous sections, a BlossomTree can be broken down into logical operators, each of which has different physical implementations (e.g., a structural join can be implemented with pipelined joins, nested loop joins, or TwigStack joins [7]). Each operator may have different performance on different data sets and queries. An optimizer is responsible for choosing an appropriate physical operator based on its knowledge of the system environment. Therefore, the purpose of our experiments is to find out in which situation a particular physical operator performs better than the others. In this paper, we focus on the join operator implementations. To assess the effectiveness and efficiency of the proposed join operators, we conducted extensive experiments and compared them with the performance of existing systems or prototypes. Both the data and

XML Query Processing and Optimization

XML Query Processing and Optimization XML Query Processing and Optimization Ning Zhang School of Computer Science University of Waterloo nzhang@uwaterloo.ca Abstract. In this paper, I summarize my research on optimizing XML queries. This work

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today XML Query Processing CPS 216 Advanced Database Systems Announcements (March 31) 2 Course project milestone 2 due today Hardcopy in class or otherwise email please I will be out of town next week No class

More information

Announcements (March 31) XML Query Processing. Overview. Navigational processing in Lore. Navigational plans in Lore

Announcements (March 31) XML Query Processing. Overview. Navigational processing in Lore. Navigational plans in Lore Announcements (March 31) 2 XML Query Processing PS 216 Advanced Database Systems ourse project milestone 2 due today Hardcopy in class or otherwise email please I will be out of town next week No class

More information

Navigation- vs. Index-Based XML Multi-Query Processing

Navigation- vs. Index-Based XML Multi-Query Processing Navigation- vs. Index-Based XML Multi-Query Processing Nicolas Bruno, Luis Gravano Columbia University {nicolas,gravano}@cs.columbia.edu Nick Koudas, Divesh Srivastava AT&T Labs Research {koudas,divesh}@research.att.com

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Supporting Positional Predicates in Efficient XPath Axis Evaluation for DOM Data Structures

Supporting Positional Predicates in Efficient XPath Axis Evaluation for DOM Data Structures Supporting Positional Predicates in Efficient XPath Axis Evaluation for DOM Data Structures Torsten Grust Jan Hidders Philippe Michiels Roel Vercammen 1 July 7, 2004 Maurice Van Keulen 1 Philippe Michiels

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends

More information

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014 Outline Gerênciade Dados daweb -DCC922 - XML Query Processing ( Apresentação basedaem material do livro-texto [Abiteboul et al., 2012]) 2014 Motivation Deep-first Tree Traversal Naïve Page-based Storage

More information

Index-Trees for Descendant Tree Queries on XML documents

Index-Trees for Descendant Tree Queries on XML documents Index-Trees for Descendant Tree Queries on XML documents (long version) Jérémy arbay University of Waterloo, School of Computer Science, 200 University Ave West, Waterloo, Ontario, Canada, N2L 3G1 Phone

More information

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while 1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling

More information

University of Waterloo Midterm Examination Solution

University of Waterloo Midterm Examination Solution University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry

More information

yqgm_std_rules documentation (Version 1)

yqgm_std_rules documentation (Version 1) yqgm_std_rules documentation (Version 1) Feng Shao Warren Wong Tony Novak Computer Science Department Cornell University Copyright (C) 2003-2005 Cornell University. All Rights Reserved. 1. Introduction

More information

SFilter: A Simple and Scalable Filter for XML Streams

SFilter: A Simple and Scalable Filter for XML Streams SFilter: A Simple and Scalable Filter for XML Streams Abdul Nizar M., G. Suresh Babu, P. Sreenivasa Kumar Indian Institute of Technology Madras Chennai - 600 036 INDIA nizar@cse.iitm.ac.in, sureshbabuau@gmail.com,

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Twig Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li, Junichi Tatemura Wang-Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan NEC Laboratories

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views 1. (4 points) Of the following statements, identify all that hold about architecture. A. DoDAF specifies a number of views to capture different aspects of a system being modeled Solution: A is true: B.

More information

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms

More information

W4231: Analysis of Algorithms

W4231: Analysis of Algorithms W4231: Analysis of Algorithms 10/21/1999 Definitions for graphs Breadth First Search and Depth First Search Topological Sort. Graphs AgraphG is given by a set of vertices V and a set of edges E. Normally

More information

Query Containment for XML-QL

Query Containment for XML-QL Query Containment for XML-QL Deepak Jindal, Sambavi Muthukrishnan, Omer Zaki {jindal, sambavi, ozaki}@cs.wisc.edu University of Wisconsin-Madison April 28, 2000 Abstract The ability to answer an incoming

More information

15 150: Principles of Functional Programming Some Notes on Regular Expression Matching

15 150: Principles of Functional Programming Some Notes on Regular Expression Matching 15 150: Principles of Functional Programming Some Notes on Regular Expression Matching Michael Erdmann Spring 2018 1 Introduction Regular expression matching is a very useful technique for describing commonly-occurring

More information

XML and Relational Databases

XML and Relational Databases XML and Relational Databases Leonidas Fegaras University of Texas at Arlington Web Data Management and XML L8: XML and Relational Databases 1 Two Approaches XML Publishing treats existing relational data

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

A new generation of tools for SGML

A new generation of tools for SGML Article A new generation of tools for SGML R. W. Matzen Oklahoma State University Department of Computer Science EMAIL rmatzen@acm.org Exceptions are used in many standard DTDs, including HTML, because

More information

CHAPTER 3 LITERATURE REVIEW

CHAPTER 3 LITERATURE REVIEW 20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

On Label Stream Partition for Efficient Holistic Twig Join

On Label Stream Partition for Efficient Holistic Twig Join On Label Stream Partition for Efficient Holistic Twig Join Bo Chen 1, Tok Wang Ling 1,M.TamerÖzsu2, and Zhenzhou Zhu 1 1 School of Computing, National University of Singapore {chenbo, lingtw, zhuzhenz}@comp.nus.edu.sg

More information

Compact Access Control Labeling for Efficient Secure XML Query Evaluation

Compact Access Control Labeling for Efficient Secure XML Query Evaluation Compact Access Control Labeling for Efficient Secure XML Query Evaluation Huaxin Zhang Ning Zhang Kenneth Salem Donghui Zhuo University of Waterloo, {h7zhang,nzhang,kmsalem,dhzhuo}@cs.uwaterloo.ca Abstract

More information

Subscription Aggregation for Scalability and Efficiency in XPath/XML-Based Publish/Subscribe Systems

Subscription Aggregation for Scalability and Efficiency in XPath/XML-Based Publish/Subscribe Systems Subscription Aggregation for Scalability and Efficiency in XPath/XML-Based Publish/Subscribe Systems Abdulbaset Gaddah, Thomas Kunz Department of Systems and Computer Engineering Carleton University, 1125

More information

FIX: Feature-based Indexing Technique for XML Documents

FIX: Feature-based Indexing Technique for XML Documents FIX: Feature-based Indexing Technique for XML Documents Ning Zhang M. Tamer Özsu Ihab F. Ilyas Ashraf Aboulnaga David R. Cheriton School of Computer Science University of Waterloo {nzhang,tozsu,ilyas,ashraf}@uwaterloo.ca

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 4 Graphs Definitions Traversals Adam Smith 9/8/10 Exercise How can you simulate an array with two unbounded stacks and a small amount of memory? (Hint: think of a

More information

CMSC th Lecture: Graph Theory: Trees.

CMSC th Lecture: Graph Theory: Trees. CMSC 27100 26th Lecture: Graph Theory: Trees. Lecturer: Janos Simon December 2, 2018 1 Trees Definition 1. A tree is an acyclic connected graph. Trees have many nice properties. Theorem 2. The following

More information

Optimizing distributed XML queries through localization and pruning

Optimizing distributed XML queries through localization and pruning Optimizing distributed XML queries through localization and pruning Patrick Kling pkling@cs.uwaterloo.ca M. Tamer Özsu tozsu@cs.uwaterloo.ca University of Waterloo David R. Cheriton School of Computer

More information

UNIT-IV BASIC BEHAVIORAL MODELING-I

UNIT-IV BASIC BEHAVIORAL MODELING-I UNIT-IV BASIC BEHAVIORAL MODELING-I CONTENTS 1. Interactions Terms and Concepts Modeling Techniques 2. Interaction Diagrams Terms and Concepts Modeling Techniques Interactions: Terms and Concepts: An interaction

More information

Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases Yangjun Chen Department of Applied Computer Science University of Winnipeg Winnipeg, Manitoba, Canada R3B 2E9 y.chen@uwinnipeg.ca

More information

Lecture 1. 1 Notation

Lecture 1. 1 Notation Lecture 1 (The material on mathematical logic is covered in the textbook starting with Chapter 5; however, for the first few lectures, I will be providing some required background topics and will not be

More information

Functional programming with Common Lisp

Functional programming with Common Lisp Functional programming with Common Lisp Dr. C. Constantinides Department of Computer Science and Software Engineering Concordia University Montreal, Canada August 11, 2016 1 / 81 Expressions and functions

More information

Indexing Keys in Hierarchical Data

Indexing Keys in Hierarchical Data University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan

More information

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018 XPath and XQuery Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue. Oct. 23) Homework #3 due in two weeks Project milestone #1 feedback : we are a bit behind, but will definitely release

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery

Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information

More information

Accelerating XML Structural Matching Using Suffix Bitmaps

Accelerating XML Structural Matching Using Suffix Bitmaps Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,

More information

Algebraic XQuery Decorrelation with Order Sensitive Operations

Algebraic XQuery Decorrelation with Order Sensitive Operations Worcester Polytechnic Institute Digital WPI Computer Science Faculty Publications Department of Computer Science 2-2005 Algebraic XQuery Decorrelation with Order Sensitive Operations Song Wang Worcester

More information

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:

More information

The NEXT Framework for Logical XQuery Optimization

The NEXT Framework for Logical XQuery Optimization The NEXT Framework for Logical XQuery Optimization Alin Deutsch Yannis Papakonstantinou Yu Xu University of California, San Diego deutsch,yannis,yxu @cs.ucsd.edu Abstract Classical logical optimization

More information

Ecient XPath Axis Evaluation for DOM Data Structures

Ecient XPath Axis Evaluation for DOM Data Structures Ecient XPath Axis Evaluation for DOM Data Structures Jan Hidders Philippe Michiels University of Antwerp Dept. of Math. and Comp. Science Middelheimlaan 1, BE-2020 Antwerp, Belgium, fjan.hidders,philippe.michielsg@ua.ac.be

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

Query Processing for High-Volume XML Message Brokering

Query Processing for High-Volume XML Message Brokering Query Processing for High-Volume XML Message Brokering Yanlei Diao University of California, Berkeley diaoyl@cs.berkeley.edu Michael Franklin University of California, Berkeley franklin@cs.berkeley.edu

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Tree-Pattern Queries on a Lightweight XML Processor

Tree-Pattern Queries on a Lightweight XML Processor Tree-Pattern Queries on a Lightweight XML Processor MIRELLA M. MORO Zografoula Vagena Vassilis J. Tsotras Research partially supported by CAPES, NSF grant IIS 0339032, UC Micro, and Lotus Interworks Outline

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Introduction to XQuery. Overview. Basic Principles. .. Fall 2007 CSC 560: Management of XML Data Alexander Dekhtyar..

Introduction to XQuery. Overview. Basic Principles. .. Fall 2007 CSC 560: Management of XML Data Alexander Dekhtyar.. .. Fall 2007 CSC 560: Management of XML Data Alexander Dekhtyar.. Overview Introduction to XQuery XQuery is an XML query language. It became a World Wide Web Consortium Recommendation in January 2007,

More information

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1) Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Query Processing and Optimization in Native XML Databases

Query Processing and Optimization in Native XML Databases Query Processing and Optimization in Native XML Databases Ning Zhang David R. Cheriton School of Computer Science University of Waterloo nzhang@uwaterloo.ca Technical Report CS-2006-29 August 2006 Abstract

More information

12 Abstract Data Types

12 Abstract Data Types 12 Abstract Data Types 12.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: Define the concept of an abstract data type (ADT). Define

More information

Web Data Management. XML query evaluation. Philippe Rigaux CNAM Paris & INRIA Saclay

Web Data Management. XML query evaluation. Philippe Rigaux CNAM Paris & INRIA Saclay http://webdam.inria.fr/ Web Data Management XML query evaluation Serge Abiteboul INRIA Saclay & ENS Cachan Ioana Manolescu INRIA Saclay & Paris-Sud University Philippe Rigaux CNAM Paris & INRIA Saclay

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Index-Driven XQuery Processing in the exist XML Database

Index-Driven XQuery Processing in the exist XML Database Index-Driven XQuery Processing in the exist XML Database Wolfgang Meier wolfgang@exist-db.org The exist Project XML Prague, June 17, 2006 Outline 1 Introducing exist 2 Node Identification Schemes and Indexing

More information

XML Filtering Technologies

XML Filtering Technologies XML Filtering Technologies Introduction Data exchange between applications: use XML Messages processed by an XML Message Broker Examples Publish/subscribe systems [Altinel 00] XML message routing [Snoeren

More information

Answering Aggregate Queries Over Large RDF Graphs

Answering Aggregate Queries Over Large RDF Graphs 1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Querying Spatiotemporal Data Based on XML Twig Pattern

Querying Spatiotemporal Data Based on XML Twig Pattern Querying Spatiotemporal Data Based on XML Twig Pattern Luyi Bai Yin Li Jiemin Liu* College of Information Science and Engineering Northeastern University Shenyang 110819 China * Corresponding author Tel:

More information

An Algebraic Approach to XQuery View Maintenance

An Algebraic Approach to XQuery View Maintenance Translation An Algebraic Approach to XQuery View Maintenance J. Nathan Foster (Penn) Ravi Konuru (IBM) Jérôme Siméon (IBM) Lionel Villard (IBM) Source Source Query View View PLAN-X 08 Quick! 1 + 2 + +

More information

Course: The XPath Language

Course: The XPath Language 1 / 30 Course: The XPath Language Pierre Genevès CNRS University of Grenoble Alpes, 2017 2018 2 / 30 Why XPath? Search, selection and extraction of information from XML documents are essential for any

More information

<=chapter>... XML. book. allauthors (1,5:60,2) title (1,2:4,2) XML author author author. <=author> jane. Origins (1,1:150,1) (1,61:63,2) (1,64:93,2)

<=chapter>... XML. book. allauthors (1,5:60,2) title (1,2:4,2) XML author author author. <=author> jane. Origins (1,1:150,1) (1,61:63,2) (1,64:93,2) Holistic Twig Joins: Optimal XML Pattern Matching Nicolas Bruno Columbia University nicolas@cscolumbiaedu Nick Koudas AT&T Labs Research koudas@researchattcom Divesh Srivastava AT&T Labs Research divesh@researchattcom

More information

Structural Joins, Twig Joins and Path Stack

Structural Joins, Twig Joins and Path Stack Structural Joins, Twig Joins and Path Stack Seminar: XML & Datenbanken Student: Irina ANDREI Konstanz, 11.07.2006 Outline 1. Structural Joins Tree-Merge Stack-Tree 2. Path-Join Algorithms PathStack PathMPMJ

More information

Lecture 2 Finite Automata

Lecture 2 Finite Automata Lecture 2 Finite Automata August 31, 2007 This lecture is intended as a kind of road map to Chapter 1 of the text just the informal examples that I ll present to motivate the ideas. 1 Expressions without

More information

2.2 Syntax Definition

2.2 Syntax Definition 42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2014-2015 2 Index Op)miza)ons So far, only discussed implementing relational algebra operations to directly access heap Biles Indexes present an alternate access path for

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Lecture Notes on Binary Decision Diagrams

Lecture Notes on Binary Decision Diagrams Lecture Notes on Binary Decision Diagrams 15-122: Principles of Imperative Computation William Lovas Notes by Frank Pfenning Lecture 25 April 21, 2011 1 Introduction In this lecture we revisit the important

More information

SDMX self-learning package No. 5 Student book. Metadata Structure Definition

SDMX self-learning package No. 5 Student book. Metadata Structure Definition No. 5 Student book Metadata Structure Definition Produced by Eurostat, Directorate B: Statistical Methodologies and Tools Unit B-5: Statistical Information Technologies Last update of content December

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement. COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations

More information

XML has become the de facto standard for data exchange.

XML has become the de facto standard for data exchange. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 20, NO. 12, DECEMBER 2008 1627 Scalable Filtering of Multiple Generalized-Tree-Pattern Queries over XML Streams Songting Chen, Hua-Gang Li, Jun

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

Lecture 32. No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions?

Lecture 32. No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions? Lecture 32 No computer use today. Reminders: Homework 11 is due today. Project 6 is due next Friday. Questions? Friday, April 1 CS 215 Fundamentals of Programming II - Lecture 32 1 Outline Introduction

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

XML Storage and Indexing

XML Storage and Indexing XML Storage and Indexing Web Data Management and Distribution Serge Abiteboul Ioana Manolescu Philippe Rigaux Marie-Christine Rousset Pierre Senellart Web Data Management and Distribution http://webdam.inria.fr/textbook

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Trees. Chapter 6. strings. 3 Both position and Enumerator are similar in concept to C++ iterators, although the details are quite different.

Trees. Chapter 6. strings. 3 Both position and Enumerator are similar in concept to C++ iterators, although the details are quite different. Chapter 6 Trees In a hash table, the items are not stored in any particular order in the table. This is fine for implementing Sets and Maps, since for those abstract data types, the only thing that matters

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Query Processing. Solutions to Practice Exercises Query:

Query Processing. Solutions to Practice Exercises Query: C H A P T E R 1 3 Query Processing Solutions to Practice Exercises 13.1 Query: Π T.branch name ((Π branch name, assets (ρ T (branch))) T.assets>S.assets (Π assets (σ (branch city = Brooklyn )(ρ S (branch)))))

More information

Basic Properties The Definition of Catalan Numbers

Basic Properties The Definition of Catalan Numbers 1 Basic Properties 1.1. The Definition of Catalan Numbers There are many equivalent ways to define Catalan numbers. In fact, the main focus of this monograph is the myriad combinatorial interpretations

More information

Lecture 17: Solid Modeling.... a cubit on the one side, and a cubit on the other side Exodus 26:13

Lecture 17: Solid Modeling.... a cubit on the one side, and a cubit on the other side Exodus 26:13 Lecture 17: Solid Modeling... a cubit on the one side, and a cubit on the other side Exodus 26:13 Who is on the LORD's side? Exodus 32:26 1. Solid Representations A solid is a 3-dimensional shape with

More information