Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases

Size: px
Start display at page:

Download "Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases"

Transcription

1 Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Boon-Siew Seah 1, Klarinda G. Widjanarko 1, Sourav S. Bhowmick 1, Byron Choi 1 Erwin Leonardi 1, 1 School of Computer Engineering, Nanyang Technological University, Singapore Singapore-MIT Alliance, Nanyang Technological University, Singapore { ,klarinda,assourav,kkchoi,lerwin}@ntu.edu.sg May 3, 007 Abstract In this paper, we present a novel ordered XPATH evaluation in tree-unaware RDBMS. The novelties of our approach lies in the followings. (a) We propose a novel XML storage scheme which comprises only leaf nodes, their corresponding data values, order encodings and their root-to-leaf paths. (b) We propose an algorithm for mapping ordered XPATH queries into SQL queries over the storage scheme. (c) We propose an optimization technique that enforces all mapped SQL queries to be evaluated in a left-to-right join order. By employing these techniques, we show, through a comprehensive experiment, that our approach not only scales well but also performs better than some representative tree-unaware approaches on more than 65% of our benchmark queries with the highest observed gain factor being In addition, our approach reduces significantly the performance gap between tree-aware and tree-unaware approaches and even outperforms a state-of-the-art tree-aware approach for certain benchmark queries. 1 Introduction With the rapid emergence of XML as the de facto standard for exchanging data on the Web, the interest in efficiently querying growing XML data sources has increased. One of the salient features of XML data is that it is order-sensitive. Supporting an ordered data model of XML as well as ordered XML queries, ordered XPATH axes and position predicates in particular, have been the key to successful XML applications, e.g., [1]. In this paper, we present a novel approach to efficiently evaluate ordered XPATH queries in a relational database. Current approaches for evaluating XPATH expressions in relational databases can be arguably categorized into two representative types. They either resort to encoding XML data as tables and translating XML queries into relational queries [3, 4, 5, 6, 10, 11, 15] or store XML data as a rich data type and process XML queries by enhancing the relational infrastructure [9]. The former approach can further be classified into two representative types. Firstly, a host of work on processing XPATH queries on tree-unaware relational databases has been reported [5, 10, 11] these approaches do not modify the database kernels. Secondly, there have been several efforts on enabling relational databases to be tree-aware by invading the database kernel to implement XML support [3, 4, 6, 15]. It has been shown that the latter approaches appear scalable and, in particular, perform orders of magnitude faster than some tree-unaware approaches [3, 6]. In this paper, we focus on supporting ordered XPATH evaluation in a tree-unaware relational environment. There is a considerable benefit in such an approach with respect to portability and ease of implementation on top of an off-the-shelf RDBMS. Although a diverse set of strategies for evaluating XML queries in tree-unaware relational environment have been recently proposed, few have undertaken a comprehensive study on evaluating ordered XPATH queries. Tatarinov et al. [1] is the first to show that it is indeed possible to support ordered

2 XPATH queries in relational databases. However, this approach does not scale well with large XML documents. In fact, as we shall show in Section 7, the GLOBAL-ORDER approach in [1] failed to return results for 0% of our benchmark queries on 1GB dataset in 60 minutes. Furthermore, this approach resorts to manual tuning of the relational optimizer when it failed to produce good query plans. Although such a manual tuning approach works, it is a cumbersome solution. In this paper, we address the above limitations by proposing a novel scheme for ordered XPATH query processing. Our storage strategy is built on top of SUCXENT++ [10], by extending it to support efficient processing of ordered axes and predicates. SUCXENT++ is designed primarily for query-mostly workloads. We exploit SUCXENT++ s strategy to store leaf nodes, their corresponding data values, auxiliary encodings and root-to-leaf paths. In contrast, some approaches, e.g., [6, 15], explicitly store information for all nodes of an XML document. Specifically, the followings remark the novelties of our storage scheme. (1) For each level of an XML document, we store an attribute called RValue which is an enhancement of the original RValue, proposed in [10], for processing recursive XPATH queries. () For each leaf node we store three additional attributes namely BranchOrder, DeweyOrderSum and SiblingSum. These attributes are the foundation for our ordered XPATH processing. The key features of these attributes are that they enable us (a) to compare the order between non-leaf nodes by comparing the order between their first descendant leaf nodes only; and (b) to determine the nearest common ancestor of two leaf nodes efficiently. As a result, it is not necessary to store the order information of non-leaf nodes. Furthermore, given any pair of nodes, these attributes enable us to evaluate position-based predicates efficiently. As highlighted in [1], relational optimizers may sometimes produce poor query plans for processing XPATH queries. In this paper, we undertake a novel strategy to address this issue. As opposed to manual tuning efforts, we propose an automatic approach to enforce the optimizer to replace previously generated poor plans with probably better query plans, as verified by our experiments. Unlike tree-aware schemes, our technique is non-invasive in nature. That is, it can easily be incorporated without modifying the internals of relational optimizers. Specifically, we enforce a relational optimizer to follow a left-to-right join order and enforce the relational engine to evaluate the mapped SQL queries according to the XPATH steps specified in the query. The good news is that this technique can select better plans for the majority of our benchmark queries across all benchmark datasets. As we shall see in Section 7, the performance of previously-inefficient queries in SUCXENT++ is significantly improved. The highest observed gain factor is 59. Furthermore, queries that failed to finish in 60 minutes were able to do so now, in the presence of such a join-order enforcement. This is indeed stimulating as it shows that some sophisticated internals of relational optimizers not only are irrelevant to XPATH processing but also often confuse XPATH query optimization in relational databases. Overall a joinorder-conscious SUCXENT++ significantly outperforms both GLOBAL-ORDER and SHARED-INLINING[11] in at least 65% of the benchmark queries with the highest observed gain factors being 1939 and 880, respectively. To the best of our knowledge, this is the first effort on exploiting a non-invasive automatic technique to improve query performance in the context of XPATH evaluation in relational environment. Recently, [3] showed that MONETDB is among the most efficient and scalable tree-aware relational-based XQuery processor and outperforms the current generation of XQuery systems significantly. Consequently, we investigated how our proposed technique compared to MONETDB. Our study revealed some interesting results. First, although MONETDB is and 3-74 times faster than GLOBAL-ORDER and SHARED-INLINING, respectively, for the majority of the benchmark queries, this performance gap is significantly reduced when MONETDB is compared to SUCXENT++. Our results show that not only MONETDB is now times faster than SUCXENT++ with join-order enforcement but surprisingly our approach is faster than MONETDB for 33% of benchmark queries! Additionally, MONETDB (Win3 builds) failed to shred 1GB dataset as it is vulnerable to the virtual memory fragmentation in Windows environment. Note that, this is in contrary to the results in [3] where MONETDB was built on top of Linux.6.11 operating system (8GB RAM), using a 64-bit address space, and was able to efficiently shred 11GB dataset. In summary, the main contributions of this paper are as follows. In Section 4, we describe our novel schemaoblivious relational storage scheme for XML. In Section 5, we present how ordered XPATH queries are supported in our proposed storage scheme. In Section 6, we proposed a novel left-to-right join order-based technique to

3 Document (DocId, Name) Path (PathId, PathExp) PathValue(DocId, PathId, LeafOrder, BranchOrder, BranchOrderSum, LeafValue) DocumentRValue (DocId, Level, RValue) (a) Original Schema of SUCXENT++ Document (DocId, Name) Path (PathId, PathExp) PathValue(DocId, DeweyOrderSum, PathId, BranchOrder, LeafOrder, SiblingSum, LeafValue) Attribute (DocId, LeafOrder, PathId, LeafValue) DocumentRValue (DocId, Level, RValue) (b) Modified Schema of SUCXENT++ Mod Level M RVal RVal Book Catalog 1 3 Title Chapter Chapter D1 = 0 * 1 Para Para D = 3 D3 = 4 preceding * - number representing local order of the node Di = DeweyOrderSum 1 D4 = 6 A The context node A The leaf node that represents the context node (c) XML Data Para 1 D5 = 19 Book descendant-or-self D6 = 3 Book D7 = 38 following Figure 1: SUCXENT++ schema and example of XML data improve query plan selection of relational query optimizers. To the best of our knowledge, this is the first effort on exploiting such non-invasive automatic technique to improve query performance in the context of XPATH processing in relational environment. Through an extensive experimental study in Section 7, we show that our approach significantly outperforms existing tree-unaware approaches for ordered XPATH queries. Additionally, our approach reduces significantly the performance gap between tree-aware and tree-unaware approaches and even outperform a state-of-the-art tree-aware approach for certain benchmark queries. Related Work Most of the previous tree-unaware approaches, except [1], focused on proposing efficient evaluation for children and descendant-or-self axes and positional predicates in XPATH queries. In this paper, the main focus is on the evaluation for following, preceding, following-sibling, and preceding-sibling axes as well as position-based and range predicates. All previous approaches, reported query performance on small/medium XML documents smaller than 500 MB. We investigate query performance on large synthetic and real datasets. This gives insights on the scalability of the state-of-the-art tree-unaware approaches for ordered XML processing. Compared to the tree-aware schemes [3, 4, 6, 15], our technique is tree-unaware in the sense that it can be built on top of any commercial RDBMS without modifying the database kernel. The approaches in [4, 15] do not provide a systematic and comprehensive effort for processing ordered XPATH queries. Although the scheme presented in [3, 4, 6] can support ordered axes, no comprehensive performance study has demonstrated with a variety of ordered XPATH queries. Furthermore, these approaches did not exploit the left-to-right join order technique to improve query plan selection. In [1], Tatarinov et al. proposed the first solution for supporting ordered XML query processing in a relational database. A modified EDGE table [5] was the underlying storage scheme. They described three order encoding methods: global, local, and dewey encodings. The best query performance was achieved with the global encoding for query-mostly workloads and with dewey encoding for a mix of queries and updates. Our focus differs from the above approach in the following ways. First, we focus on query-mostly workloads. Second, we consider a novel order-conscious storage scheme that is more space- and query-efficient and scalable when compared to the global encoding. 3 Background on SUCXENT++ Our approach for ordered XPATH processing relies on the SUCXENT++ approach [10]. We begin our discussion by briefly reviewing the storage scheme of SUCXENT++. Foremost, in the rest of the paper, we always assume 3

4 Path PathId PathExp 3.catalog#.book#.chapter#.para# 4.catalog#.book#.chapter# 5.catalog#.book# PathValue DocId Leaf Order Branch Order PathId Dewey OrderSum Sibling Sum Leaf Value D D D D D D D7 DocRValue DocId Level RValue Attribute DocId LeafOrder PathId LeafValue book book book 03 Figure : XML data in RDBMS document order in our discussions. The SUCXENT++ schema is shown in Figure 1(a). Document stores the document identifier DocId and the name Name of a given input XML document T. We associate each distinct (root-to-leaf) path appearing in T, namely PathExp, with an identifier PathId and store this information in Path table. For each leaf node n in T, we shall create a tuple in the PathValue table. We now elaborate the meaning of the attributes of this relation. Given two leaf nodes n 1 and n, n 1.LeafOrder < n.leaforder iff n 1 precedes n. LeafOrder of the first leaf node in T is 1 and n.leaforder = n 1.LeafOrder+1 iff n 1 is a leaf node immediately preceding n. Given two leaf nodes n 1 and n where n 1.LeafOrder+1 = n.leaforder, n.branchorder is the level of the nearest common ancestor of n 1 and n. That is, n 1 and n intersect at the BranchOrder level. The data value of n is stored in n.leafvalue. To discuss BranchOrderSum and RValue, we introduce some auxiliary definitions. Consider a sequence of leaf nodes C: n 1, n, n 3,..., n r in T. Then, C is a k-consecutive leaf nodes of T iff (a) n i.branchorder k for all i [1,r]; (b) If n 1.LeafOrder > 1, then n 0.BranchOrder < k where n 0.LeafOrder+1 = n 1.LeafOrder; and (c) If n r is not the last leaf node in T, then n r+1.branchorder < k where n r.leaforder+1 = n r+1.leaforder. A sequence C is called a maximal k-consecutive leaf nodes of T, denoted as M k, if there does not exist a k-consecutive leaf nodes C and C < C. Let L max be the largest level of T. Then, RValue of level l, denoted as R l, is 1 if l = L max. Otherwise, R l = R l+1 M l Now we are ready to define the BranchOrderSum attribute. Let N to be the set of leaf nodes preceding a leaf node n. n.branchordersum is 0 if n.leaforder = 1 and m N R m.branchorder otherwise. Based on the definitions above, Prakash et al. [10] defined Property 1 (below) which is essential to determine ancestor-descendant relationships efficiently. Property 1 Given two leaf nodes n 1 and n, n 1.BranchOrderSum - n.branchordersum < R l implies the nearest common ancestor of n 1 and n is at a level greater than l. 4 Extensions of SUCXENT++ To support ordered XML queries, the order information of nodes must be captured in the XML storage scheme. Unfortunately the LeafOrder and BranchOrderSum attributes only encode the global order of all leaf nodes. Since (order) information of non-leaf nodes is not explicitly stored, it must be derived from the attributes of leaf nodes. We now present how the original SUCXENT++ schema is extended to process ordered XPath queries efficiently. First, we move information related to attribute nodes from PathValue table to a new Attribute table. Second, we introduce new attributes to encode the relative order between (both non-leaf and leaf) nodes. Specifically, DeweyOrderSum and SiblingSum are introduced to replace BranchOrderSum. Second, the definition of RValue is modified such that RValue and DeweyOrderSum preserve the properties presented [10]. The modified schema is shown in Figure 1(b) and Figure shows the shredded version of the example XML document. 4

5 Algorithm to Compute DeweyOrderSum 01 for each non-attribute leaf node n i arranged in document order { 0 if (n i.branchorder > 0) { 03 n i.deweyordersum = dsum[ n i.branchorder + 1 ] + ModRValue( n i.branchorder ); 04 for ( level = n i.branchorder + 1; level <= maxdepthofxmldoc; level++ ) 05 dsum[level] = n i.deweyordersum; 06 } 07 else { //initialization for the first leaf node 08 n i.deweyordersum = 0; 09 for ( level = 1; level <= maxdepthofxmldoc; level++ ) 10 dsum[level] = 0; 11 } 1 } Figure 3: Algorithm to compute DeweyOrderSum 4.1 Attribute Table The PathValue table originally stored information related to both element and attribute nodes. However, to avoid mixing the order of element and attribute nodes, we separate the attribute nodes into Attribute table. The Attribute table consists of the following columns: DocId, LeafOrder, PathId, LeafValue. As we shall see later, a non-leaf node can be represented by the first descendant leaf nodes. Therefore, an attribute node is identified by DocId and LeafOrder of its parent node and its PathId. 4. Modified RValue Attribute Conceptually, RValue is used to encode the level of the nearest common ancestor of any pairs of leaf nodes. To ensure a property like Property 1 holds after modifications, intuitively, we magnify the gap between RValues, as shown in Definition 1. Relative order information is then captured in these gaps. Definition 1 [ModifiedRValue] Let L max be the largest level of an XML tree T. ModifiedRValue of level l, denoted as R l, is defined as follows: (i) If l = L max 1 then R l = 1; (ii) If 0 < l < L max 1 then R l = R l+1 M l For example, consider the XML tree shown in Figure 1(c). L max = 4. The values of M 1, M, and M 3 are 6, 3, and 1, respectively. Then, R 3 = 1, R = 1 M = 3, and R 1 = 3 M + 1 = 19. To ensure the evaluation of queries other than ordered XPATH queries is not affected by the above modifications, the RValue attribute in DocumentRValue stores R l instead of R l. 4.3 DeweyOrderSum and SiblingSum Attributes Next, we define the first attribute related to ordered XPATH processing. Consider the path query /catalog/book[1]/chapter[1] and Figure 1(c). Since only leaf nodes are stored in the PathValue table, the new attribute DeweyOrderSum of leaf nodes captures order information of the non-leaf nodes. At first glance, a simple representation of the order information could be a Dewey path. For instance, the Dewey path of the first chapter node of the first book node is However, using such Dewey paths has two major drawbacks. Firstly, string matching of Dewey paths can be computationally expensive. Secondly, simple lexicographical comparisons of two Dewey paths may not always be accurate [1]. Comparing 1. and 10. in lexicographical order will indicate that 10. appears before 1. [1]. Hence, we define DeweyOrderSum for this purpose: Definition [DeweyOrderSum] Consider an XML document T and a leaf node n at level l in T. Ord(n, k) = i iff a is either an ancestor of n or n itself; k is the level of a; and a is the i-th child of its parent. DeweyOrderSum of n, n.deweyordersum, is defined as l j= Φ(j) where Φ(j)=[Ord(n, j)-1] R j 1. 5

6 Algorithm to Compute SiblingSum 01 for each non-attribute leaf node n i arranged in document order { 0 if( n i.branchorder > 0 ) { 03 sord = siblingorder.order( n i.branchorder + 1, elementname[n i.branchorder + 1] ); //siblingorder.order(level, elementname) keep tracks of the same-sibling order //for a node, given the level and the element name for that level 04 n i.siblingsum = ssum[ n i.branchorder ] + (sord-1) * ModRValue( n i.branchorder ); 05 for ( level = n i.branchorder + 1; level <= maxdepthofxmldoc; level++) 06 ssum[level] = n i.siblingsum; 07 for ( level = n i.branchorder + ; level <= n i.depth; level++) 08 siblingorder.order(level, elementname[level]); 09 } 10 else { //initialization for the first leaf node 11 n i.siblingsum = 0; 1 for ( level = 1; level <= maxdepthofxmldoc; level++ ) 13 ssum[level] = 0; 14 for ( level = 1; level <= n i.depth; level++ ) 15 siblingorder.order(level, elementname[level]); 16 } 17 } Figure 4: Algorithm to compute SiblingSum For example, consider the rightmost chapter node in Figure 1(c) which has a Dewey path 1... Using the ModifiedRValue values derived previously, the DeweyOrderSum of this node can then be calculated as follows: n.deweyordersum = (Ord(n, ) 1) R 1 + (Ord(n, 3) 1) R = =. Figure 3 shows the algorithm to derive DeweyOrderSum during document shredding. dsum is an array to store the DeweyOrderSum for each level with respect to the current node n i. Since n i.branchorder is the level of the nearest common ancestor between the current node n i and the previous node, it implies that the local order of current node n i at level BranchOrder + 1 is increased by 1 and the local order of n i at level BranchOrder remains unchanged. Therefore, n i.deweyordersum equals to dsum[branchorder + 1] + ModifiedRValue(BranchOrder) (line 03). dsum is also updated accordingly (lines 04-05). Note that DeweyOrderSum is not sufficient to compute position-based predicates with QName name tests, e.g., chapter[]. Hence, the SiblingSum attribute is introduced to the PathValue table. Definition 3 [SiblingSum] Consider an XML document T and a leaf node n at level l in T. Sibling(n, k) = i iff a is either an ancestor of n or n itself; k is the level of a; and the i-th τ-child of its parent (τ is the tag name of a). SiblingSum of n, n.siblingsum, is l j= Ψ(j) where Ψ(j) = [Sibling(n, j)-1] R j 1. SiblingSum encodes the local order of nodes which are with the same tag name of n, namely same-tag-sibling order. For example, consider the children of the first book element in Figure 1(c). The local orders of title and the first and second chapter nodes are 1, and 3, respectively. On the other hand, the same-tag-sibling order of these nodes are 1, 1 and, respectively. The algorithm to compute SiblingSum is shown in Figure 3. SiblingOrder.Order(level, elementname) is used to calculate Sibling(n i, k) for the current node n i where k = level and τ = elementname. n i.siblingsum equals to ssum[branchorder] + [Sibling(n i, n i.branchorder+1) - 1] ModifiedRValue(BranchOrder) (line 04). 4.4 Preservation of SUCXENT++ s Features The above modifications do not adversely affect the document reconstruction process and efficient evaluation of non-ordered XPATH queries, as discussed in [10]. Recall that given a pair of leaf nodes, Property 1 was used in [10] to efficiently determine the nearest common ancestor of the nodes. Since we have modified the definition of RValue and replaced the BranchOrderSum attribute with the DeweyOrderSum attribute, this property is not applicable to the extended SUCXENT++ scheme. It is necessary to ensure that a corresponding property holds in the extended system. LEMMA 1 l j=k Φ(j) R k 1 where Φ(j) =[Ord(n, j)-1] R j 1, k (,l] and n is a leaf node in an XML document at level l. 6

7 Based on the above lemma, it is straightforward to show that l j=k Φ(j) < R k. Theorem 1 Let n 1 and n be two leaf nodes in an XML document. If R l n 1.DeweyOrderSum - n.deweyordersum < R l then the level of the nearest common ancestor of n 1 and n is l + 1. For example, consider the second leaf node in Figure 1(c). DeweyOrderSum of this node is 3. Let D 1 be the DeweyOrderSum of leaf nodes that have nearest common ancestor at level. Using the above theorem, D 1 falls within the following range: (R 1)/ + 1 D 1 3 < (R 1 1)/ + 1 D 1 3 < 10 which returns the first and fourth leaf nodes (DeweyOrderSum = 0 and 6, respectively). Let D be the DeweyOrderSum of leaf nodes that have nearest common ancestor at level 3. D falls within the following range: (R 3 1)/ + 1 D 3 < (R 1)/ D 3 < which returns the third leaf node (DeweyOrderSum = 4). Now let say we want to get the leaf nodes that have nearest common ancestor at level or deeper and let D 3 be the DeweyOrderSum of these nodes. D 3 falls within the following range: D 3 3 < (R 1 1)/ + 1 D 3 3 < 10 which returns the first four leaf nodes. We illustrate Theorem 1 further with XQUERY example. Consider the following XQUERY on the XML tree in Figure 1(c). FOR RETURN $b IN document( catalog )/catalog/book[1] $b/chapter/para Let D a be DeweyOrderSum of the first leaf node satisfying /catalog/book[1], which is the first title node. The RETURN clause implies the path /catalog/book/chapter/para. In this particular case, the nearest common ancestor between para nodes and book node is at level. This implies that the nearest common ancestor between para nodes and the first leaf node satisfying book node is at level or deeper. Let D b be DeweyOrderSum of the resulting nodes that satisfy both paths. Since the level of intersection is or deeper, D b D a < (R 1 1)/ + 1. From Figure 1(c), D a = 0 and R 1 = 19. Hence, nodes in the query result set must satisfy the inequality: D b 0 < (19 1)/ + 1. Note that DeweyOrderSum of the second and third leaf nodes (para nodes) are 3 and 4. Since 3 0 < 10 and 4 0 < 10, these nodes satisfies the above query. Whereas, the fifth leaf node whose DeweyOrderSum is 19 does not satisfy the above query. The proofs of the lemma, theorems and propositions are given in Appendix A. 5 Ordered XPath Processing This section describes how ordered XPATH queries are supported by the modified schema. First, we propose a method of node order comparison in the absence of non-leaf nodes. Next, we show how ordered XPATH queries are supported in detail. Finally, we present a translation algorithm of ordered XPATH queries and SQL. 5.1 Non-leaf Node Order Comparison Our strategy for comparing the order of non-leaf nodes is based on the following observation. If node n 0 precedes (resp. follows) another node n 1, then descendants of n 0 must also precede (resp. follow) the descendants of n 1. Therefore, instead of comparing the order between non-leaf nodes, we compare the order between their descendant leaf nodes. For this reason, we define a representative leaf node of a non-leaf node n to be its first descendant leaf node. Note that the BranchOrder attribute records the level of the nearest common ancestor of two consecutive leaf nodes. Let C be the sequence of descendant leaf nodes of n and n 1 be the first node in C. We know that the nearest common ancestor of any two consecutive nodes in C is also a descendant of node n. This implies (1) except n 1, BranchOrder of a node in C is at least the level of node n and () the nearest common ancestor of n 1 and its immediately preceding leaf node is not a descendant of node n. Therefore, BranchOrder of n 1 is always smaller than the level of n. We summarize this property in Property. 7

8 Property Let n be a non-leaf node at level l and C = n 1, n, n 3,..., n r be the sequence of descendant leaf nodes of n in document order. Then, n 1.BranchOrder < l and n i.branchorder l, where i (1,r]. Definition 4 [DeweyOrderSum of non-leaf nodes] Let S = i 1, i, i 3,..., i r1 be a sequence of non-leaf sibling nodes of a non-leaf node i 0 in document order. Let C = n 1, n,..., n r be the sequence of leaf nodes of S and n j is denoted as the first descendant leaf node of i j1. Then, i j1.deweyordersum = n j.deweyordersum. In the above definition, DeweyOrderSum of a leaf node is conceptually propagated to its ancestor nodes. Consequently, the following proposition holds. Proposition 1 Let C = n 1, n, n 3,..., n r be a sequence of sibling nodes. Consider n i where 1 < i r and the level of n i is l, where l > 1. Let m be n i or descendant of n i. Then, n 1.DeweyOrderSum+ [Ord(n i ) - Ord(n 1 )] R l 1 m.deweyordersum < n 1.DeweyOrderSum+ [(Ord(n i ) - Ord(n 1 ))+1] R l 1 where Ord(n i) and Ord(n 1 ) are the local order of n i and n 1, respectively. By using the above proposition, we can compare the order of two non-leaf nodes without evaluating every sibling nodes in the sequence. Also, since n 1 is the first sibling, Ord(n 1 ) = 1. Therefore, based on the above proposition, the following holds: n 1.DeweyOrderSum+ [Ord(n i )-1] R l 1 m.deweyordersum < n 1.DeweyOrderSum+ Ord(n i ) R l 1. Similar propositions for SiblingSum can be established in a straightforward manner. 5. Support for Ordered XPath Queries We now present how various types of ordered XPATH queries are supported by the modified SUCXENT++. Due to space constraints, we only focus on how DeweyOrderSum and ModifiedRValue are used for query processing. Similar technique can be applied to evaluations with SiblingSum. Position predicates. Position-based predicates, i.e., predicates of the form position()=i, select the node at the i-th position of the sequence of inner focus context nodes. We propose to compute the i-th node without evaluating every node in the sequence by applying Proposition 1. For example, suppose n 1 be the first book node of the sequence of book nodes (the context nodes) in Figure 1(c). Observe that n 1.DeweyOrderSum = 0 as its representative leaf node is the first leaf node of the XML tree. We now employ the inequality in Proposition 1 to select a sibling node, e.g., the second book node n. Here, Ord(n ) =, l =, R 1 = 19, and n 1.DeweyOrderSum = 0. Then, n.deweyordersum < n i.deweyordersum < 38. The nodes in this range are the descendant leaf nodes of n. Such simple arithmetic calculations can be efficiently implemented in a relational database. fn : last() can be computed by first determining all sibling nodes that satisfy the specific path and then finding the node with the largest DeweyOrderSum. Position predicate on child axes. This class of queries can be translated into a child axis followed by a position predicate, in which one must select the i-th child of the context node. Our strategy is to determine the first child of a context node and then the child s i-th sibling node as described above. First, by using Definition 4, we know that if n is the first child of n 1, then n 1.DeweyOrderSum = n.deweyordersum. Second, Proposition 1 provides us a method for selecting the i-th sibling node of a node. Reconsider Figure 1(c) and the XPATH query /catalog/*[]. The query result is the second child of catalog node. Suppose n 0 is the context node /catalog. Let n 1 be the first sibling node in the sequence returned by the expression /catalog/*. Then, n 0.DeweyOrderSum = n 1.DeweyOrderSum. Since the first sibling in that sequence is n 1 and all siblings of n 1 is in that sequence, we can now utilize Proposition 1 to select the leaf nodes of the second node in the context. The range operator, e.g., [position()= TO 10], can be easily handled in similar fashion. Following and preceding axes. following axis selects all nodes which follow the context node excluding the descendants of the context node. preceding axis, on the other hand, selects all nodes which precede 8

9 DeweyOrderSum D i - ((R L- - 1) / + 1) D i +R D L-1 i + ((R L- - 1) / + 1) D i preceding-sibling following-sibling preceding following D i : DeweyOrderSum of the context node L : level of the context node descendant-or-self Figure 5: Relationship between DeweyOrderSum and RValue. the context node excluding the ancestors of the context node. Similar to position predicates, we summarize a property of DeweyOrderSum to facilitate efficient processing of these axes. Proposition Let n a and n b be two nodes in the XML tree T and n b is a context node at level l b where l b > 1. Then, the following statements hold: 1. n a.deweyordersum n b.deweyordersum+r l b 1 if and only if n a follows n b and is not a descendant of n b ;. Similarly, n a.deweyordersum < n b.deweyordersum if and only if n a precedes n b and n a is neither a descendant nor an ancestor of n b. We illustrate this proposition in Figure 5. Now let us consider the following examples. Suppose that we evaluate the following axis on the first book node n b in Figure 1(c). Here, n b.deweyordersum = 0, l = and R 1 = 19. Let N be the nodes in the result of the evaluation of following axis. Then, by using Proposition, n N must satisfy this inequality: n.deweyordersum Similarly, suppose we evaluate the preceding axis on the last book node n b in Figure 1(c). n b.deweyordersum = 38. Denote the sequence of nodes satisfying the preceding axis to be N. Then n N must satisfy the following inequality: n.deweyordersum < 38. Another example can be found in Figure 1(c). Note that since SUCXENT++ only stores the leaf nodes, returning the internal nodes and the whole subtree as required by following::* and preceding::* axis require extra processing as the resulting XML document may have a very different structure or schema than the original XML document. This is achieved in SUCXENT++ during the result construction phase. Observe that, in our query, if we use QName as the name test (for example following::title), and the path from the root to QName element is unique then no extra processing is required. If the level of the QName is greater than the level of the context node (e.g. /catalog/book[]/preceding::title in Figure 1(c)), then we can use Proposition and PathId to return the resulting nodes. The same also applies if the level of the QName equals to the level of the context node (e.g. /catalog/*[1]/following::book in Figure 1(c)). However, if the level of the QName is less than the level of the context node (e.g. /catalog/*[1]/chapter/para/following::chapter in Figure 1(c)), we need to use Theorem 1 to exclude the nodes that have common ancestor at QName level or deeper. These three cases are illustrated in Figure 6. Following-sibling and preceding-sibling axes. following-sibling axis selects the children of the context node s parent that occur after the context node in document order whereas preceding-sibling axis selects the children of the context node s parent that occur before the context node in document order. Support for following-sibling (resp. preceding-sibling) axis can be achieved with an additional constraint on the following (resp. preceding) axis the selected nodes must be siblings of the context node. Proposition 3 Let n a and n b be two nodes in the XML tree T and n b is the context node at level l b where l b >. Then, the following statements hold: 9

10 L C QName QName L Q preceding following preceding::qname following::qname (a) Case 1: Qname level (L Q) > context node level (L C) QName QName L Q = L C preceding following preceding::qname following::qname (b) Case : Qname level (L Q) = context node level (L C) QName QName QName L Q L C preceding following preceding::qname following::qname (a) Case 3: Qname level (L Q) < context node level (L C) The context node The leaf node that represents the context node Figure 6: following::qname 1. n b.deweyordersum +R l b 1 n a.deweyordersum < n b.deweyordersum +(R l b 1)/ + 1 if and only if n a is a sibling of n b and n a follows n b.. n b.deweyordersum (R l b 1)/ 1 < n a.deweyordersum < n b.deweyordersum if and only if n a is a sibling of n b and n a precedes n b. The above proposition is illustrated in Figure 5. Now let us consider the following examples. Suppose we evaluate the following-sibling axis on the first title node n t in Figure 1(c). Here n t.deweyordersum = 0, l = 3, R 1 = 19, and R = 3. Denote N to be the nodes reachable via the following-sibling axis from n t. Using Proposition 3, n k.deweyordersum < 0 + (19 1)/ + 1 where n k N. That is, 3 n k.deweyordersum < 10. Hence, the second (DeweyOrderSum = 3) and the third (DeweyOrderSum = 6) chapter nodes are in this range. Now, suppose we evaluate the preceding-sibling axis at the last chapter node n c in Figure 1(c). Here n c.deweyordersum =. Let N be the nodes which satisfy the preceding-sibling axis. Therefore, (19 1)/ 1 < n r.deweyordersum < where n r N. That is, 1 < n r.deweyordersum <. Hence, the chapter node with DeweyOrderSum = 19 satisfies this bound. Another example can be found in Figure 1(c). 10

11 processpathexpr (XPath) 01 for every step in the XPath { 0 if (step.getaxis() == CHILD and step.haspredicate() == FALSE) 03 currentpath.add(nametest, step.getaxis()) 04 else { 05 from_sql.add("pathvalue as Vi") 06 if(currentpath.level() > 1) { where_sql.add("vi.pathid in currentpath.getpathid()") where_sql.add("vi.branchorder < currentpath.level()") 09 } 10 processaxis(step, currentpath) 11 processpredicate(step, currentpath) 1 } 13 if (step.islast() and currentpath.needupdate()) { from_sql.add("pathvalue as Vi") where_sql.add("vi.pathid in currentpath.getpathid()") 16 } 17 } 18 select_sql.add("vi.leafvalue, Vi.leafOrder,... ") 19 return select_sql + from_sql + where_sql + where_sql.unionwithattribute() (a)the processpathexpr Algorithm processaxis (step, currentpath) 01 switch (step.getaxis()){ 0 child: 03 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum - RValue(currentPath.level() - 1) + 1 AND Vi-1.DeweyOrderSum + RValue(currentPath.level() - 1) - 1 ") 04 following: 05 where_sql.add("vi.deweyordersum >= Vi-1.DeweyOrderSum + * RValue(currentPath.level()) - 1 ") 06 if (currentpath.qnamelevel(nametest) < currentpath.level() 07 where_sql.add("+ RValue(currentPath.QNameLevel(nametest) - 1) - 1") 08 preceding: 09 where_sql.add("vi.deweyordersum < Vi-1.DeweyOrderSum ") 10 if (currentpath.qnamelevel(nametest) < currentpath.level() 11 where_sql.add("- RValue(currentPath.QNameLevel(nametest) - 1) + 1") 1 following-sibling: 13 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum + * RValue(currentPath.level()) - 1 AND Vi-1.DeweyOrderSum + RValue(currentPath.level() - 1) - 1 ") 14 preceding-sibling: 15 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum - RValue(currentPath.level() - 1) + 1 AND Vi-1.DeweyOrderSum - 1 ") 16 } 17 currentpath.add(nametest, step.getaxis()) (b)the processaxis Algorithm Figure 7: Procedure processpathexpr and Procedure processaxis. processpredicate (step, currentpath) 01 switch (step.getaxis()) { 0 CHILD: 03 n_from = step.getpredicatefrom() n_to = step.getpredicateto() 05 FOLLOWING-SIBLING: 06 n_from = step.getpredicatefrom() 07 n_to = step.getpredicateto() PRECEDING-SIBLING: 09 n_from = - step.getpredicatefrom() 10 n_to = - step.getpredicateto() } 1 switch (step.getpredicatetype()){ 13 position based predicate without name test: 14 where_sql.add("v i.deweyordersum BETWEEN V i-1.deweyordersum + n_from * ( * RValue(currentPath.level()) - 1) AND V i-1.deweyordersum + n_to * ( * RValue(currentPath.level()) - 1) - 1 ") 15 position based predicate with name test: 16 where_sql.add("v i.siblingsum BETWEEN V i-1.siblingsum + n_from * ( * RValue(currentPath.level()) - 1) AND V i-1.siblingsum + n_to * ( * RValue(currentPath.level()) - 1) - 1 ") 17 } Figure 8: Procedure processpredicate. 5.3 Ordered XPath Query Translation Algorithm Based on the properties defined in the previous subsection, we present an algorithm, shown in Figures 7 and 8, for generating SQL from ordered XPATH queries. Our algorithm assumes an XPATH expression is represented as a sequence of steps where a step may be associated with predicates. A SQL statement consists of three clauses: select sql, from sql and where sql. We assume that a clause has an add() method which encapsulates some simple string manipulations and simple SUCXENT++ joins for constructing valid SQL statements. In addition to preprocessing PathId as mentioned in [10], for a single XML document, we also preprocess RValue to reduce the number of joins. The translation consists of three main procedures. processpathexpr (Figure 7(a)): It analyzes the steps of an input XPATH expression (Line 01) and outputs a SQL statement. If the step consists of a child axis only (Lines 0-03), then we simply maintain a global variable currentp ath which records the simple downward path from the root to the context nodes. 1 Otherwise, when the step involves ordered predicates/other axes, we add predicates which select a superset of the next context nodes (Lines 05-09) and then call processaxis and processpredicate (Lines 10-11) with currentp ath to obtain the next context nodes. We add predicates in Lines 08 to determine the representative nodes of the context nodes. Finally, we collect the final results (Line 19). processaxis (Figure 7(b)): This procedure translates a step, together with currentp ath, based on the step type (Line 01). Lines 0-03, and 1-15 encode Theorem 1, Proposition and Proposition 3, respectively. 1 The details for maintaining currentp ath is simple but lengthy. For simplicity, we omitted such discussions. 11

12 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.SiblingSum BETWEEN * ( * 10-1) AND * ( * 10-1) AND V1.pathid in (5,4,3,) 07 AND V1.branchOrder < 08 AND V.docId = V1.docId 09 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum AND V1.DeweyOrderSum AND V.pathid in (4,3,) 11 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum + 1 * ( * - 1) AND V1.DeweyOrderSum + 3 * ( * - 1) ) 13 SELECT V.*, 1 AS Attr 14 FROM V 15 UNION ALL 16 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 17 FROM Attribute A, V 18 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 19 AND A.PathId in (0) 0 ORDER BY DocId, DeweyOrderSum, Attr Figure 9: SQL example: /catalog/book[1]/*[position()= to 3] 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT DISTINCT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.pathid in (4,3) 06 AND V1.branchOrder < 3 07 AND V.docId = V1.docId 08 AND V.DeweyOrderSum >= V1.DeweyOrderSum + * AND V.pathid in (5,4,3,) 11 ) 1 SELECT V.*, 1 AS Attr 13 FROM V 14 UNION ALL 15 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 16 FROM Attribute A, V 17 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 18 AND A.PathId in (1) 19 ORDER BY DocId, DeweyOrderSum, Attr Figure 10: SQL example: /catalog/book/chapter/following::book processpredicate (Figure 8(a)): This procedure mainly translates position predicates. Lines determine the range of position specified by the predicate. Given these, Lines 1-17 implement Proposition 1. We now illustrate the details of the translation algorithms with five examples related to translation of position-based predicates. Example 1 [Position-based predicates] Consider the path expression /catalog/book[1]/ *[position()= to 3]. The translated SQL is shown in Figure 9. /catalog/book[1] is translated to Lines Theorem 1 is used to get the children of /catalog/book[1] (lines 08-10), and /*[] is translated to Lines 11. Lines are used to union the resulting element nodes with the attribute nodes and the last line is to order the result by document order. Example [SQL for following axis] Consider the path expression /catalog/book/chapter/ following::book. The translated SQL is shown in Figure 10. /catalog/book/chapter is translated to Lines Proposition is used to get the following nodes (line 08) and since the level of book is higher then the level of /catalog/book/chapter, then Theorem 1 is used to exclude the nodes that have common ancestor at level or deeper (line 09). PathId is used to return only the book nodes (line 10). Example 3 [SQL for preceding axis] Consider the path expression /catalog/book/chapter/ preceding::book. The translated SQL is similar to Figure 10 except that the predicate in line 08 is 1

13 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT DISTINCT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.SiblingSum BETWEEN * ( * 10-1) AND 0 + * ( * 10-1) AND V1.pathid in (5,4,3,) 07 AND V1.branchOrder < 08 AND V.docId = V1.docId 09 AND V.DeweyOrderSum >= V1.DeweyOrderSum + * AND V.pathid in (5,4,3,) 11 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum + 1 * ( * 10-1) AND V1.DeweyOrderSum + * ( * 10-1) ) 13 SELECT V.*, 1 AS Attr 14 FROM V 15 UNION ALL 16 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 17 FROM Attribute A, V 18 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 19 AND A.PathId in (1) 0 ORDER BY DocId, DeweyOrderSum, Attr Figure 11: SQL example: /catalog/book[]/following-sibling::*[1] replaced with V.DeweyOrderSum < V1.DeweyOrderSum due to Proposition and line 09 is replaced with due to Theorem 1. Example 4 [SQL for following-sibling axis] Consider the path expression /catalog/book[]/ following-sibling::*[1]. The translated SQL is shown in Figure 11. /catalog/book[] is translated to Lines /following-sibling::* is translated to Lines Since the level of /catalog/book is, the translated SQL for following-sibling is similar to following axis (line 9). *[1] is translated to line 11. Example 5 [SQL for preceding-sibling axis] Consider the path expression /catalog/book[]/ preceding-sibling::*[1]. The translated SQL is similar to Figure 11 except that the predicate in line 09 is replaced with V.DeweyOrderSum < V1.DeweyOrderSum. 6 Join Order Enforcement Due to the tree-unaware nature of the underlying relational storage scheme as well as the lack of appropriate XML statistics, relational optimizers may generate inefficient query plans. In order to address this problem, some approaches have resorted to manual tuning of query plans [1] while others invade the database kernel to make it tree-aware [3, 4]. The former approach has not been scalable as it requires significant human intervention whereas the later approach may require non-trivial modifications of the internals of a RDBMS. In this section, we propose a simple yet effective technique to generate better query plans automatically without invading the database kernel. As discussed in Section 5.3, in order to evaluate an (ordered) XPATH query in SUCXENT++, each XPATH axis is translated into a join between the PathValue table and intermediate results (i.e., the context nodes). For example, in Figures 9-11, PathValue V1 returns the representative nodes of the context nodes to calculate PathValue V. Due to the lack of tree awareness, the relational optimizer is not capable of transforming the order of joins intelligently. Consequently, it may generate poor join order that typically requires caching large intermediate results in the database bufferpool. This is particularly important to NL joins, where large and deep loops are prohibitive. For example, the first few joins of a right-to-left join order may easily yield a large number of context nodes. To respond to this, we propose to enforce a left-to-right join order on the translated SQL query. Also, this evaluation order naturally corresponds to the order of XPATH steps specified in the XPATH expression. By employing this technique, the relational optimizer does not explore the large number of permutations of join order. We apply join order if the translated SQL query involves more than one PathValue 13

14 Total Number ID Node Attribute Total DC10 5,34 15,000 40,34 DC100,4,00 150,000,39,00 DC1000,44,61 1,500,000 3,94,61 DBLP 8,,945 1,665,930 9,888,875 (a) Features of Dataset Size (MB) Max Depth ID Query Res. Card. D1 /dblp/*[100000]/author D /dblp/article/author[] 190,838 D3 /dblp/*[600000]/pages/preceding-sibling::* 6 D4 /dblp/*[600000]/pages/following-sibling::* 5 (c) Benchmark queries for DBLP ID Query Res. Card. (10MB) Res. Card. (100MB) Res. Card. (1000MB) Q1 /catalog/item[1000] Q /catalog/*[1000] /catalog/item[position()=1000 to 10000]/ Q3 *[position()= to 7] 104,7 66,81 67,00 /catalog/item[position()=1000 to 10000]/authors/ Q4 author 65,161 39, ,350 (b) Benchmark queries for DC10, DC100, and DC1000 ID Query Res. Card. (10MB) Res. Card. (100MB) Res. Card. (1000MB) Q5 /catalog/*[1500]/publisher/following-sibling::* Q6 /catalog/*[1500]/publisher/following-sibling::*[5] Q7 /catalog/*[1500]/publisher/preceding-sibling::* Q8 /catalog/*[1500]/publisher/preceding-sibling::*[] Q9 /catalog/*[x]/following::title 50,500 5,000 Q10 /catalog/*[y]/preceding::title 49,499 4,499 X = 50, 500, 5000 for DC10, DC100, DC1000 respectively; Y = 50, 500, 5000 for DC10, DC100, DC1000 respectively Figure 1: Dataset and Benchmark Queries. relation. In addition, if the PathValue table appears in the SQL query only once, we let the relational optimizer to decide the plan for the join between the PathValue table and the Attribute table. The above enforcement can easily be implemented by query hints in commercial databases. Regarding our implementation, we use OPTION(FORCE ORDER) to implement the above technique in SUCXENT++. The strength of this approach lies in its simplicity in implementing on any commercial RDBMS that supports query hints. 7 Performance Study In this section, we present the results of our performance evaluation on our proposed approach, a tree-unaware schema-oblivious approach (GLOBAL-ORDER [1]), a tree-unaware schema-conscious approach (SHARED- INLINING [11]), and a tree-aware approach (MonetDB [3]). Prototypes for modified SUCXENT++ (denoted as SX), SUCXENT++ with join order enforcement (denoted as SX-JO), GLOBAL-ORDER (denoted as GO) and SHARED-INLINING (denoted as SI) were implemented with JDK 1.5. We used the Windows version of MON- ETDB/XQuery (denoted as MXQ) downloaded from The experiments were conducted on an Intel Xeon GHz machine running on Windows XP with 1GB of RAM. The RDBMS used was Microsoft SQL Server 005 Developer Edition. Note that we did not study the performance of XML support of SQL Server 005 as it can only evaluate the first two ordered queries in Figure 1(b). 7.1 Experimental Setup Data and query sets. In our experiments, XBENCH [13] dataset was used for synthetic data. Data-centric (DC) documents were considered with data sizes ranging from 10MB to 1GB. In addition, we used a real dataset, namely DBLP XML [16]. Figure 1 (a) shows the characteristics of the datasets used. Two sets of queries were designed to cover different types of ordered XPATH queries. In additional, the cardinality of the results was varied. Figures 1 (b) and 1 (c) show the benchmark queries on XBENCH and DBLP, respectively. XPATH queries with descendant axes were not included as they had been studied in [10]. Test methodology. The XPATH queries were executed in the reconstruct mode where not only the non-leaf nodes, but also all their descendants, were selected. Appropriate indexes were constructed for all approaches (except for MONETDB) through a careful analysis on the benchmark queries. Prior to our experiments, we ensured that statistics on relations were collected. The bufferpool of the RDBMS was cleared before each run. Each query was executed 6 times and the results from the first run were always discarded. 14

Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases

Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Boon-Siew Seah 1,2, Klarinda G. Widjanarko 1,2, Sourav S. Bhowmick 1,2, Byron Choi 1, and Erwin Leonardi 1,2

More information

Efficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS

Efficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS Efficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS Klarinda G. Widjanarko Erwin Leonardi Sourav S Bhowmick School of Computer Engineering Nanyang Technological

More information

Non-Directional XPath Processing in a Tree-Unaware RDBMS

Non-Directional XPath Processing in a Tree-Unaware RDBMS Non-Directional XPath Processing in a Tree-Unaware RDBMS Sourav S Bhowmick Curtis Dyreson Erwin Leonardi Zhifeng Ng School of Computer Engineering Nanyang Technological University, Singapore Department

More information

Xandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases

Xandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases Xandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases Erwin Leonardi Sourav S. Bhowmick School of Computer Engineering, Nanyang Technological University, Singapore

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends

More information

CHAPTER 3 LITERATURE REVIEW

CHAPTER 3 LITERATURE REVIEW 20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

More information

Xandy: Detecting Changes on Large Unordered XML Documents Using Relational Databases

Xandy: Detecting Changes on Large Unordered XML Documents Using Relational Databases : Detecting Changes on Large Unordered XML Documents Using Relational Databases Erwin Leonardi Sourav S. Bhowmick Sanjay Madria School of Computer Engineering Department of Computer Science Nanyang Technological

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

A Dynamic Labeling Scheme using Vectors

A Dynamic Labeling Scheme using Vectors A Dynamic Labeling Scheme using Vectors Liang Xu, Zhifeng Bao, Tok Wang Ling School of Computing, National University of Singapore {xuliang, baozhife, lingtw}@comp.nus.edu.sg Abstract. The labeling problem

More information

Part XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440

Part XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Part XVII Staircase Join Tree-Aware Relational (X)Query Processing Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Outline of this part 1 XPath Accelerator Tree aware relational

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Labeling Dynamic XML Documents: An Order-Centric Approach

Labeling Dynamic XML Documents: An Order-Centric Approach 1 Labeling Dynamic XML Documents: An Order-Centric Approach Liang Xu, Tok Wang Ling, and Huayu Wu School of Computing National University of Singapore Abstract Dynamic XML labeling schemes have important

More information

Accelerating XML Structural Matching Using Suffix Bitmaps

Accelerating XML Structural Matching Using Suffix Bitmaps Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Index-Driven XQuery Processing in the exist XML Database

Index-Driven XQuery Processing in the exist XML Database Index-Driven XQuery Processing in the exist XML Database Wolfgang Meier wolfgang@exist-db.org The exist Project XML Prague, June 17, 2006 Outline 1 Introducing exist 2 Node Identification Schemes and Indexing

More information

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014

Outline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014 Outline Gerênciade Dados daweb -DCC922 - XML Query Processing ( Apresentação basedaem material do livro-texto [Abiteboul et al., 2012]) 2014 Motivation Deep-first Tree Traversal Naïve Page-based Storage

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while

One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while 1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

4 Fractional Dimension of Posets from Trees

4 Fractional Dimension of Posets from Trees 57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

Indexing XML documents for XPath query processing in external memory

Indexing XML documents for XPath query processing in external memory Data & Knowledge Engineering xxx (2005) xxx xxx www.elsevier.com/locate/datak Indexing XML documents for XPath query processing in external memory Qun Chen a, *, Andrew Lim a, Kian Win Ong b, Jiqing Tang

More information

TwigList: Make Twig Pattern Matching Fast

TwigList: Make Twig Pattern Matching Fast TwigList: Make Twig Pattern Matching Fast Lu Qin, Jeffrey Xu Yu, and Bolin Ding The Chinese University of Hong Kong, China {lqin,yu,blding}@se.cuhk.edu.hk Abstract. Twig pattern matching problem has been

More information

Full-Text and Structural XML Indexing on B + -Tree

Full-Text and Structural XML Indexing on B + -Tree Full-Text and Structural XML Indexing on B + -Tree Toshiyuki Shimizu 1 and Masatoshi Yoshikawa 2 1 Graduate School of Information Science, Nagoya University shimizu@dl.itc.nagoya-u.ac.jp 2 Information

More information

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful

More information

A Structural Numbering Scheme for XML Data

A Structural Numbering Scheme for XML Data A Structural Numbering Scheme for XML Data Alfred M. Martin WS2002/2003 February/March 2003 Based on workout made during the EDBT 2002 Workshops Dao Dinh Khal, Masatoshi Yoshikawa, and Shunsuke Uemura

More information

1 The size of the subtree rooted in node a is 5. 2 The leaf-to-root paths of nodes b, c meet in node d

1 The size of the subtree rooted in node a is 5. 2 The leaf-to-root paths of nodes b, c meet in node d Enhancing tree awareness 15. Staircase Join XPath Accelerator Tree aware relational XML resentation Tree awareness? 15. Staircase Join XPath Accelerator Tree aware relational XML resentation We now know

More information

BlossomTree: Evaluating XPaths in FLWOR Expressions

BlossomTree: Evaluating XPaths in FLWOR Expressions BlossomTree: Evaluating XPaths in FLWOR Expressions Ning Zhang University of Waterloo School of Computer Science nzhang@uwaterloo.ca Shishir K. Agrawal Indian Institute of Technology, Bombay Department

More information

XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013

XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013 Assured and security Deep-Secure XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013 This technical note describes the extensible Data

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Diversity Coloring for Distributed Storage in Mobile Networks

Diversity Coloring for Distributed Storage in Mobile Networks Diversity Coloring for Distributed Storage in Mobile Networks Anxiao (Andrew) Jiang and Jehoshua Bruck California Institute of Technology Abstract: Storing multiple copies of files is crucial for ensuring

More information

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

An Extended Byte Carry Labeling Scheme for Dynamic XML Data Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 5488 5492 An Extended Byte Carry Labeling Scheme for Dynamic XML Data YU Sheng a,b WU Minghui a,b, * LIU Lin a,b a School of Computer

More information

XML Data Stream Processing: Extensions to YFilter

XML Data Stream Processing: Extensions to YFilter XML Data Stream Processing: Extensions to YFilter Shaolei Feng and Giridhar Kumaran January 31, 2007 Abstract Running XPath queries on XML data steams is a challenge. Current approaches that store the

More information

An approach to the model-based fragmentation and relational storage of XML-documents

An approach to the model-based fragmentation and relational storage of XML-documents An approach to the model-based fragmentation and relational storage of XML-documents Christian Süß Fakultät für Mathematik und Informatik, Universität Passau, D-94030 Passau, Germany Abstract A flexible

More information

CSE 190D Spring 2017 Final Exam Answers

CSE 190D Spring 2017 Final Exam Answers CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join

More information

Navigation- vs. Index-Based XML Multi-Query Processing

Navigation- vs. Index-Based XML Multi-Query Processing Navigation- vs. Index-Based XML Multi-Query Processing Nicolas Bruno, Luis Gravano Columbia University {nicolas,gravano}@cs.columbia.edu Nick Koudas, Divesh Srivastava AT&T Labs Research {koudas,divesh}@research.att.com

More information

Mining Frequently Changing Substructures from Historical Unordered XML Documents

Mining Frequently Changing Substructures from Historical Unordered XML Documents 1 Mining Frequently Changing Substructures from Historical Unordered XML Documents Q Zhao S S. Bhowmick M Mohania Y Kambayashi Abstract Recently, there is an increasing research efforts in XML data mining.

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases

Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases Yangjun Chen Department of Applied Computer Science University of Winnipeg Winnipeg, Manitoba, Canada R3B 2E9 y.chen@uwinnipeg.ca

More information

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS

TRIE BASED METHODS FOR STRING SIMILARTIY JOINS TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH

More information

On k-dimensional Balanced Binary Trees*

On k-dimensional Balanced Binary Trees* journal of computer and system sciences 52, 328348 (1996) article no. 0025 On k-dimensional Balanced Binary Trees* Vijay K. Vaishnavi Department of Computer Information Systems, Georgia State University,

More information

Lecture 3 February 9, 2010

Lecture 3 February 9, 2010 6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees

More information

XML Index Recommendation with Tight Optimizer Coupling

XML Index Recommendation with Tight Optimizer Coupling XML Index Recommendation with Tight Optimizer Coupling Technical Report CS-2007-22 July 11, 2007 Iman Elghandour University of Waterloo Andrey Balmin IBM Almaden Research Center Ashraf Aboulnaga University

More information

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML

Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University

More information

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents

Twig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Twig Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li, Junichi Tatemura Wang-Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan NEC Laboratories

More information

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models. Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.

More information

An Efficient XML Index Structure with Bottom-Up Query Processing

An Efficient XML Index Structure with Bottom-Up Query Processing An Efficient XML Index Structure with Bottom-Up Query Processing Dong Min Seo, Jae Soo Yoo, and Ki Hyung Cho Department of Computer and Communication Engineering, Chungbuk National University, 48 Gaesin-dong,

More information

We assume uniform hashing (UH):

We assume uniform hashing (UH): We assume uniform hashing (UH): the probe sequence of each key is equally likely to be any of the! permutations of 0,1,, 1 UH generalizes the notion of SUH that produces not just a single number, but a

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

Avoiding Sorting and Grouping In Processing Queries

Avoiding Sorting and Grouping In Processing Queries Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

CBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents

CBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents CIT. Journal of Computing and Information Technology, Vol. 26, No. 2, June 2018, 99 114 doi: 10.20532/cit.2018.1003955 99 CBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Updating XML documents

Updating XML documents Grindei Manuela Lidia Updating XML documents XQuery Update XQuery is the a powerful functional language, which enables accessing different nodes of an XML document. However, updating could not be done

More information

An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering

An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering T. Ropinski, F. Steinicke, K. Hinrichs Institut für Informatik, Westfälische Wilhelms-Universität Münster

More information

HEAPS ON HEAPS* Downloaded 02/04/13 to Redistribution subject to SIAM license or copyright; see

HEAPS ON HEAPS* Downloaded 02/04/13 to Redistribution subject to SIAM license or copyright; see SIAM J. COMPUT. Vol. 15, No. 4, November 1986 (C) 1986 Society for Industrial and Applied Mathematics OO6 HEAPS ON HEAPS* GASTON H. GONNET" AND J. IAN MUNRO," Abstract. As part of a study of the general

More information

Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery

Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

Implementation Techniques

Implementation Techniques V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Query Solvers on Database Covers

Query Solvers on Database Covers Query Solvers on Database Covers Wouter Verlaek Kellogg College University of Oxford Supervised by Prof. Dan Olteanu A thesis submitted for the degree of Master of Science in Computer Science Trinity 2018

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

ADT 2009 Other Approaches to XQuery Processing

ADT 2009 Other Approaches to XQuery Processing Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ 12.11.2009: Schedule 2 RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath

More information

Benchmarking Database Representations of RDF/S Stores

Benchmarking Database Representations of RDF/S Stores Benchmarking Database Representations of RDF/S Stores Yannis Theoharis 1, Vassilis Christophides 1, Grigoris Karvounarakis 2 1 Computer Science Department, University of Crete and Institute of Computer

More information

Index-Trees for Descendant Tree Queries on XML documents

Index-Trees for Descendant Tree Queries on XML documents Index-Trees for Descendant Tree Queries on XML documents (long version) Jérémy arbay University of Waterloo, School of Computer Science, 200 University Ave West, Waterloo, Ontario, Canada, N2L 3G1 Phone

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Semi-structured Data. 8 - XPath

Semi-structured Data. 8 - XPath Semi-structured Data 8 - XPath Andreas Pieris and Wolfgang Fischl, Summer Term 2016 Outline XPath Terminology XPath at First Glance Location Paths (Axis, Node Test, Predicate) Abbreviated Syntax What is

More information

Foundations of Computer Science Spring Mathematical Preliminaries

Foundations of Computer Science Spring Mathematical Preliminaries Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College

More information

Schema-Based XML-to-SQL Query Translation Using Interval Encoding

Schema-Based XML-to-SQL Query Translation Using Interval Encoding 2011 Eighth International Conference on Information Technology: New Generations Schema-Based XML-to-SQL Query Translation Using Interval Encoding Mustafa Atay Department of Computer Science Winston-Salem

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

LABELING DYNAMIC XML DOCUMENTS: AN ORDER-CENTRIC APPROACH XU LIANG

LABELING DYNAMIC XML DOCUMENTS: AN ORDER-CENTRIC APPROACH XU LIANG LABELING DYNAMIC XML DOCUMENTS: AN ORDER-CENTRIC APPROACH XU LIANG A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE NATIONAL UNIVERSITY OF SINGAPORE 2010 1 Acknowledgments

More information

An efficient XML query pattern mining algorithm for ebxml applications in e-commerce

An efficient XML query pattern mining algorithm for ebxml applications in e-commerce Vol. 8(18), pp. 777-790, 28 September, 2014 DOI: 10.5897/AJBM2011.644 Article Number: 0350C7647676 ISSN 1993-8233 Copyright 2014 Author(s) retain the copyright of this article http://www.academicjournals.org/ajbm

More information

Discrete mathematics

Discrete mathematics Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many

More information

BLAS : An Efficient XPath Processing System

BLAS : An Efficient XPath Processing System BLAS : An Efficient XPath Processing System Yi Chen University of Pennsylvania yicn@cis.upenn.edu Susan B. Davidson University of Pennsylvania and INRIA-FUTURS (France) susan@cis.upenn.edu Yifeng Zheng

More information

Indexing Keys in Hierarchical Data

Indexing Keys in Hierarchical Data University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan

More information

Ecient XPath Axis Evaluation for DOM Data Structures

Ecient XPath Axis Evaluation for DOM Data Structures Ecient XPath Axis Evaluation for DOM Data Structures Jan Hidders Philippe Michiels University of Antwerp Dept. of Math. and Comp. Science Middelheimlaan 1, BE-2020 Antwerp, Belgium, fjan.hidders,philippe.michielsg@ua.ac.be

More information

COMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing

COMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing COMP9319 Web Data Compression & Search Cloud and data optimization XPath containment Distributed path expression processing DATA OPTIMIZATION ON CLOUD Cloud Virtualization Cloud layers Cloud computing

More information

CSCI2100B Data Structures Trees

CSCI2100B Data Structures Trees CSCI2100B Data Structures Trees Irwin King king@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~king Department of Computer Science & Engineering The Chinese University of Hong Kong Introduction General Tree

More information

Indexing XML Data Stored in a Relational Database

Indexing XML Data Stored in a Relational Database Indexing XML Data Stored in a Relational Database Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon Schaller, Leo Giakoumakis, Vasili Zolotov VLDB 200 Presentation: Alex Bradley Discussion: Cody Brown

More information

Data Centric Integrated Framework on Hotel Industry. Bridging XML to Relational Database

Data Centric Integrated Framework on Hotel Industry. Bridging XML to Relational Database Data Centric Integrated Framework on Hotel Industry Bridging XML to Relational Database Introduction extensible Markup Language (XML) is a promising Internet standard for data representation and data exchange

More information

Efficient pebbling for list traversal synopses

Efficient pebbling for list traversal synopses Efficient pebbling for list traversal synopses Yossi Matias Ely Porat Tel Aviv University Bar-Ilan University & Tel Aviv University Abstract 1 Introduction 1.1 Applications Consider a program P running

More information

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner

More information

NON-CENTRALIZED DISTINCT L-DIVERSITY

NON-CENTRALIZED DISTINCT L-DIVERSITY NON-CENTRALIZED DISTINCT L-DIVERSITY Chi Hong Cheong 1, Dan Wu 2, and Man Hon Wong 3 1,3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong {chcheong, mhwong}@cse.cuhk.edu.hk

More information

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion JSON Schema-less into RDBMS Most of the material was taken from the Internet and the paper JSON data management: sup- porting schema-less development in RDBMS, Liu, Z.H., B. Hammerschmidt, and D. McMahon,

More information

Answering Aggregate Queries Over Large RDF Graphs

Answering Aggregate Queries Over Large RDF Graphs 1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo

More information

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University Query Optimization Shuigeng Zhou December 9, 2009 School of Computer Science Fudan University Outline Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational

More information

CPS352 Lecture - Indexing

CPS352 Lecture - Indexing Objectives: CPS352 Lecture - Indexing Last revised 2/25/2019 1. To explain motivations and conflicting goals for indexing 2. To explain different types of indexes (ordered versus hashed; clustering versus

More information

Reduced branching-factor algorithms for constraint satisfaction problems

Reduced branching-factor algorithms for constraint satisfaction problems Reduced branching-factor algorithms for constraint satisfaction problems Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW Outline XML and Databases Lecture 10 XPath Evaluation using RDBMS 1. Recall / encoding 2. XPath with //,, @, and text() 3. XPath with / and -sibling: use / size / level encoding Sebastian Maneth NICTA

More information

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests 4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in

More information

Course: The XPath Language

Course: The XPath Language 1 / 27 Course: The XPath Language Pierre Genevès CNRS University of Grenoble, 2012 2013 2 / 27 Why XPath? Search, selection and extraction of information from XML documents are essential for any kind of

More information

THREE LECTURES ON BASIC TOPOLOGY. 1. Basic notions.

THREE LECTURES ON BASIC TOPOLOGY. 1. Basic notions. THREE LECTURES ON BASIC TOPOLOGY PHILIP FOTH 1. Basic notions. Let X be a set. To make a topological space out of X, one must specify a collection T of subsets of X, which are said to be open subsets of

More information

Adaptive Parallel Compressed Event Matching

Adaptive Parallel Compressed Event Matching Adaptive Parallel Compressed Event Matching Mohammad Sadoghi 1,2 Hans-Arno Jacobsen 2 1 IBM T.J. Watson Research Center 2 Middleware Systems Research Group, University of Toronto April 2014 Mohammad Sadoghi

More information