Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases
|
|
- Jocelyn Spencer
- 5 years ago
- Views:
Transcription
1 Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Boon-Siew Seah 1, Klarinda G. Widjanarko 1, Sourav S. Bhowmick 1, Byron Choi 1 Erwin Leonardi 1, 1 School of Computer Engineering, Nanyang Technological University, Singapore Singapore-MIT Alliance, Nanyang Technological University, Singapore { ,klarinda,assourav,kkchoi,lerwin}@ntu.edu.sg May 3, 007 Abstract In this paper, we present a novel ordered XPATH evaluation in tree-unaware RDBMS. The novelties of our approach lies in the followings. (a) We propose a novel XML storage scheme which comprises only leaf nodes, their corresponding data values, order encodings and their root-to-leaf paths. (b) We propose an algorithm for mapping ordered XPATH queries into SQL queries over the storage scheme. (c) We propose an optimization technique that enforces all mapped SQL queries to be evaluated in a left-to-right join order. By employing these techniques, we show, through a comprehensive experiment, that our approach not only scales well but also performs better than some representative tree-unaware approaches on more than 65% of our benchmark queries with the highest observed gain factor being In addition, our approach reduces significantly the performance gap between tree-aware and tree-unaware approaches and even outperforms a state-of-the-art tree-aware approach for certain benchmark queries. 1 Introduction With the rapid emergence of XML as the de facto standard for exchanging data on the Web, the interest in efficiently querying growing XML data sources has increased. One of the salient features of XML data is that it is order-sensitive. Supporting an ordered data model of XML as well as ordered XML queries, ordered XPATH axes and position predicates in particular, have been the key to successful XML applications, e.g., [1]. In this paper, we present a novel approach to efficiently evaluate ordered XPATH queries in a relational database. Current approaches for evaluating XPATH expressions in relational databases can be arguably categorized into two representative types. They either resort to encoding XML data as tables and translating XML queries into relational queries [3, 4, 5, 6, 10, 11, 15] or store XML data as a rich data type and process XML queries by enhancing the relational infrastructure [9]. The former approach can further be classified into two representative types. Firstly, a host of work on processing XPATH queries on tree-unaware relational databases has been reported [5, 10, 11] these approaches do not modify the database kernels. Secondly, there have been several efforts on enabling relational databases to be tree-aware by invading the database kernel to implement XML support [3, 4, 6, 15]. It has been shown that the latter approaches appear scalable and, in particular, perform orders of magnitude faster than some tree-unaware approaches [3, 6]. In this paper, we focus on supporting ordered XPATH evaluation in a tree-unaware relational environment. There is a considerable benefit in such an approach with respect to portability and ease of implementation on top of an off-the-shelf RDBMS. Although a diverse set of strategies for evaluating XML queries in tree-unaware relational environment have been recently proposed, few have undertaken a comprehensive study on evaluating ordered XPATH queries. Tatarinov et al. [1] is the first to show that it is indeed possible to support ordered
2 XPATH queries in relational databases. However, this approach does not scale well with large XML documents. In fact, as we shall show in Section 7, the GLOBAL-ORDER approach in [1] failed to return results for 0% of our benchmark queries on 1GB dataset in 60 minutes. Furthermore, this approach resorts to manual tuning of the relational optimizer when it failed to produce good query plans. Although such a manual tuning approach works, it is a cumbersome solution. In this paper, we address the above limitations by proposing a novel scheme for ordered XPATH query processing. Our storage strategy is built on top of SUCXENT++ [10], by extending it to support efficient processing of ordered axes and predicates. SUCXENT++ is designed primarily for query-mostly workloads. We exploit SUCXENT++ s strategy to store leaf nodes, their corresponding data values, auxiliary encodings and root-to-leaf paths. In contrast, some approaches, e.g., [6, 15], explicitly store information for all nodes of an XML document. Specifically, the followings remark the novelties of our storage scheme. (1) For each level of an XML document, we store an attribute called RValue which is an enhancement of the original RValue, proposed in [10], for processing recursive XPATH queries. () For each leaf node we store three additional attributes namely BranchOrder, DeweyOrderSum and SiblingSum. These attributes are the foundation for our ordered XPATH processing. The key features of these attributes are that they enable us (a) to compare the order between non-leaf nodes by comparing the order between their first descendant leaf nodes only; and (b) to determine the nearest common ancestor of two leaf nodes efficiently. As a result, it is not necessary to store the order information of non-leaf nodes. Furthermore, given any pair of nodes, these attributes enable us to evaluate position-based predicates efficiently. As highlighted in [1], relational optimizers may sometimes produce poor query plans for processing XPATH queries. In this paper, we undertake a novel strategy to address this issue. As opposed to manual tuning efforts, we propose an automatic approach to enforce the optimizer to replace previously generated poor plans with probably better query plans, as verified by our experiments. Unlike tree-aware schemes, our technique is non-invasive in nature. That is, it can easily be incorporated without modifying the internals of relational optimizers. Specifically, we enforce a relational optimizer to follow a left-to-right join order and enforce the relational engine to evaluate the mapped SQL queries according to the XPATH steps specified in the query. The good news is that this technique can select better plans for the majority of our benchmark queries across all benchmark datasets. As we shall see in Section 7, the performance of previously-inefficient queries in SUCXENT++ is significantly improved. The highest observed gain factor is 59. Furthermore, queries that failed to finish in 60 minutes were able to do so now, in the presence of such a join-order enforcement. This is indeed stimulating as it shows that some sophisticated internals of relational optimizers not only are irrelevant to XPATH processing but also often confuse XPATH query optimization in relational databases. Overall a joinorder-conscious SUCXENT++ significantly outperforms both GLOBAL-ORDER and SHARED-INLINING[11] in at least 65% of the benchmark queries with the highest observed gain factors being 1939 and 880, respectively. To the best of our knowledge, this is the first effort on exploiting a non-invasive automatic technique to improve query performance in the context of XPATH evaluation in relational environment. Recently, [3] showed that MONETDB is among the most efficient and scalable tree-aware relational-based XQuery processor and outperforms the current generation of XQuery systems significantly. Consequently, we investigated how our proposed technique compared to MONETDB. Our study revealed some interesting results. First, although MONETDB is and 3-74 times faster than GLOBAL-ORDER and SHARED-INLINING, respectively, for the majority of the benchmark queries, this performance gap is significantly reduced when MONETDB is compared to SUCXENT++. Our results show that not only MONETDB is now times faster than SUCXENT++ with join-order enforcement but surprisingly our approach is faster than MONETDB for 33% of benchmark queries! Additionally, MONETDB (Win3 builds) failed to shred 1GB dataset as it is vulnerable to the virtual memory fragmentation in Windows environment. Note that, this is in contrary to the results in [3] where MONETDB was built on top of Linux.6.11 operating system (8GB RAM), using a 64-bit address space, and was able to efficiently shred 11GB dataset. In summary, the main contributions of this paper are as follows. In Section 4, we describe our novel schemaoblivious relational storage scheme for XML. In Section 5, we present how ordered XPATH queries are supported in our proposed storage scheme. In Section 6, we proposed a novel left-to-right join order-based technique to
3 Document (DocId, Name) Path (PathId, PathExp) PathValue(DocId, PathId, LeafOrder, BranchOrder, BranchOrderSum, LeafValue) DocumentRValue (DocId, Level, RValue) (a) Original Schema of SUCXENT++ Document (DocId, Name) Path (PathId, PathExp) PathValue(DocId, DeweyOrderSum, PathId, BranchOrder, LeafOrder, SiblingSum, LeafValue) Attribute (DocId, LeafOrder, PathId, LeafValue) DocumentRValue (DocId, Level, RValue) (b) Modified Schema of SUCXENT++ Mod Level M RVal RVal Book Catalog 1 3 Title Chapter Chapter D1 = 0 * 1 Para Para D = 3 D3 = 4 preceding * - number representing local order of the node Di = DeweyOrderSum 1 D4 = 6 A The context node A The leaf node that represents the context node (c) XML Data Para 1 D5 = 19 Book descendant-or-self D6 = 3 Book D7 = 38 following Figure 1: SUCXENT++ schema and example of XML data improve query plan selection of relational query optimizers. To the best of our knowledge, this is the first effort on exploiting such non-invasive automatic technique to improve query performance in the context of XPATH processing in relational environment. Through an extensive experimental study in Section 7, we show that our approach significantly outperforms existing tree-unaware approaches for ordered XPATH queries. Additionally, our approach reduces significantly the performance gap between tree-aware and tree-unaware approaches and even outperform a state-of-the-art tree-aware approach for certain benchmark queries. Related Work Most of the previous tree-unaware approaches, except [1], focused on proposing efficient evaluation for children and descendant-or-self axes and positional predicates in XPATH queries. In this paper, the main focus is on the evaluation for following, preceding, following-sibling, and preceding-sibling axes as well as position-based and range predicates. All previous approaches, reported query performance on small/medium XML documents smaller than 500 MB. We investigate query performance on large synthetic and real datasets. This gives insights on the scalability of the state-of-the-art tree-unaware approaches for ordered XML processing. Compared to the tree-aware schemes [3, 4, 6, 15], our technique is tree-unaware in the sense that it can be built on top of any commercial RDBMS without modifying the database kernel. The approaches in [4, 15] do not provide a systematic and comprehensive effort for processing ordered XPATH queries. Although the scheme presented in [3, 4, 6] can support ordered axes, no comprehensive performance study has demonstrated with a variety of ordered XPATH queries. Furthermore, these approaches did not exploit the left-to-right join order technique to improve query plan selection. In [1], Tatarinov et al. proposed the first solution for supporting ordered XML query processing in a relational database. A modified EDGE table [5] was the underlying storage scheme. They described three order encoding methods: global, local, and dewey encodings. The best query performance was achieved with the global encoding for query-mostly workloads and with dewey encoding for a mix of queries and updates. Our focus differs from the above approach in the following ways. First, we focus on query-mostly workloads. Second, we consider a novel order-conscious storage scheme that is more space- and query-efficient and scalable when compared to the global encoding. 3 Background on SUCXENT++ Our approach for ordered XPATH processing relies on the SUCXENT++ approach [10]. We begin our discussion by briefly reviewing the storage scheme of SUCXENT++. Foremost, in the rest of the paper, we always assume 3
4 Path PathId PathExp 3.catalog#.book#.chapter#.para# 4.catalog#.book#.chapter# 5.catalog#.book# PathValue DocId Leaf Order Branch Order PathId Dewey OrderSum Sibling Sum Leaf Value D D D D D D D7 DocRValue DocId Level RValue Attribute DocId LeafOrder PathId LeafValue book book book 03 Figure : XML data in RDBMS document order in our discussions. The SUCXENT++ schema is shown in Figure 1(a). Document stores the document identifier DocId and the name Name of a given input XML document T. We associate each distinct (root-to-leaf) path appearing in T, namely PathExp, with an identifier PathId and store this information in Path table. For each leaf node n in T, we shall create a tuple in the PathValue table. We now elaborate the meaning of the attributes of this relation. Given two leaf nodes n 1 and n, n 1.LeafOrder < n.leaforder iff n 1 precedes n. LeafOrder of the first leaf node in T is 1 and n.leaforder = n 1.LeafOrder+1 iff n 1 is a leaf node immediately preceding n. Given two leaf nodes n 1 and n where n 1.LeafOrder+1 = n.leaforder, n.branchorder is the level of the nearest common ancestor of n 1 and n. That is, n 1 and n intersect at the BranchOrder level. The data value of n is stored in n.leafvalue. To discuss BranchOrderSum and RValue, we introduce some auxiliary definitions. Consider a sequence of leaf nodes C: n 1, n, n 3,..., n r in T. Then, C is a k-consecutive leaf nodes of T iff (a) n i.branchorder k for all i [1,r]; (b) If n 1.LeafOrder > 1, then n 0.BranchOrder < k where n 0.LeafOrder+1 = n 1.LeafOrder; and (c) If n r is not the last leaf node in T, then n r+1.branchorder < k where n r.leaforder+1 = n r+1.leaforder. A sequence C is called a maximal k-consecutive leaf nodes of T, denoted as M k, if there does not exist a k-consecutive leaf nodes C and C < C. Let L max be the largest level of T. Then, RValue of level l, denoted as R l, is 1 if l = L max. Otherwise, R l = R l+1 M l Now we are ready to define the BranchOrderSum attribute. Let N to be the set of leaf nodes preceding a leaf node n. n.branchordersum is 0 if n.leaforder = 1 and m N R m.branchorder otherwise. Based on the definitions above, Prakash et al. [10] defined Property 1 (below) which is essential to determine ancestor-descendant relationships efficiently. Property 1 Given two leaf nodes n 1 and n, n 1.BranchOrderSum - n.branchordersum < R l implies the nearest common ancestor of n 1 and n is at a level greater than l. 4 Extensions of SUCXENT++ To support ordered XML queries, the order information of nodes must be captured in the XML storage scheme. Unfortunately the LeafOrder and BranchOrderSum attributes only encode the global order of all leaf nodes. Since (order) information of non-leaf nodes is not explicitly stored, it must be derived from the attributes of leaf nodes. We now present how the original SUCXENT++ schema is extended to process ordered XPath queries efficiently. First, we move information related to attribute nodes from PathValue table to a new Attribute table. Second, we introduce new attributes to encode the relative order between (both non-leaf and leaf) nodes. Specifically, DeweyOrderSum and SiblingSum are introduced to replace BranchOrderSum. Second, the definition of RValue is modified such that RValue and DeweyOrderSum preserve the properties presented [10]. The modified schema is shown in Figure 1(b) and Figure shows the shredded version of the example XML document. 4
5 Algorithm to Compute DeweyOrderSum 01 for each non-attribute leaf node n i arranged in document order { 0 if (n i.branchorder > 0) { 03 n i.deweyordersum = dsum[ n i.branchorder + 1 ] + ModRValue( n i.branchorder ); 04 for ( level = n i.branchorder + 1; level <= maxdepthofxmldoc; level++ ) 05 dsum[level] = n i.deweyordersum; 06 } 07 else { //initialization for the first leaf node 08 n i.deweyordersum = 0; 09 for ( level = 1; level <= maxdepthofxmldoc; level++ ) 10 dsum[level] = 0; 11 } 1 } Figure 3: Algorithm to compute DeweyOrderSum 4.1 Attribute Table The PathValue table originally stored information related to both element and attribute nodes. However, to avoid mixing the order of element and attribute nodes, we separate the attribute nodes into Attribute table. The Attribute table consists of the following columns: DocId, LeafOrder, PathId, LeafValue. As we shall see later, a non-leaf node can be represented by the first descendant leaf nodes. Therefore, an attribute node is identified by DocId and LeafOrder of its parent node and its PathId. 4. Modified RValue Attribute Conceptually, RValue is used to encode the level of the nearest common ancestor of any pairs of leaf nodes. To ensure a property like Property 1 holds after modifications, intuitively, we magnify the gap between RValues, as shown in Definition 1. Relative order information is then captured in these gaps. Definition 1 [ModifiedRValue] Let L max be the largest level of an XML tree T. ModifiedRValue of level l, denoted as R l, is defined as follows: (i) If l = L max 1 then R l = 1; (ii) If 0 < l < L max 1 then R l = R l+1 M l For example, consider the XML tree shown in Figure 1(c). L max = 4. The values of M 1, M, and M 3 are 6, 3, and 1, respectively. Then, R 3 = 1, R = 1 M = 3, and R 1 = 3 M + 1 = 19. To ensure the evaluation of queries other than ordered XPATH queries is not affected by the above modifications, the RValue attribute in DocumentRValue stores R l instead of R l. 4.3 DeweyOrderSum and SiblingSum Attributes Next, we define the first attribute related to ordered XPATH processing. Consider the path query /catalog/book[1]/chapter[1] and Figure 1(c). Since only leaf nodes are stored in the PathValue table, the new attribute DeweyOrderSum of leaf nodes captures order information of the non-leaf nodes. At first glance, a simple representation of the order information could be a Dewey path. For instance, the Dewey path of the first chapter node of the first book node is However, using such Dewey paths has two major drawbacks. Firstly, string matching of Dewey paths can be computationally expensive. Secondly, simple lexicographical comparisons of two Dewey paths may not always be accurate [1]. Comparing 1. and 10. in lexicographical order will indicate that 10. appears before 1. [1]. Hence, we define DeweyOrderSum for this purpose: Definition [DeweyOrderSum] Consider an XML document T and a leaf node n at level l in T. Ord(n, k) = i iff a is either an ancestor of n or n itself; k is the level of a; and a is the i-th child of its parent. DeweyOrderSum of n, n.deweyordersum, is defined as l j= Φ(j) where Φ(j)=[Ord(n, j)-1] R j 1. 5
6 Algorithm to Compute SiblingSum 01 for each non-attribute leaf node n i arranged in document order { 0 if( n i.branchorder > 0 ) { 03 sord = siblingorder.order( n i.branchorder + 1, elementname[n i.branchorder + 1] ); //siblingorder.order(level, elementname) keep tracks of the same-sibling order //for a node, given the level and the element name for that level 04 n i.siblingsum = ssum[ n i.branchorder ] + (sord-1) * ModRValue( n i.branchorder ); 05 for ( level = n i.branchorder + 1; level <= maxdepthofxmldoc; level++) 06 ssum[level] = n i.siblingsum; 07 for ( level = n i.branchorder + ; level <= n i.depth; level++) 08 siblingorder.order(level, elementname[level]); 09 } 10 else { //initialization for the first leaf node 11 n i.siblingsum = 0; 1 for ( level = 1; level <= maxdepthofxmldoc; level++ ) 13 ssum[level] = 0; 14 for ( level = 1; level <= n i.depth; level++ ) 15 siblingorder.order(level, elementname[level]); 16 } 17 } Figure 4: Algorithm to compute SiblingSum For example, consider the rightmost chapter node in Figure 1(c) which has a Dewey path 1... Using the ModifiedRValue values derived previously, the DeweyOrderSum of this node can then be calculated as follows: n.deweyordersum = (Ord(n, ) 1) R 1 + (Ord(n, 3) 1) R = =. Figure 3 shows the algorithm to derive DeweyOrderSum during document shredding. dsum is an array to store the DeweyOrderSum for each level with respect to the current node n i. Since n i.branchorder is the level of the nearest common ancestor between the current node n i and the previous node, it implies that the local order of current node n i at level BranchOrder + 1 is increased by 1 and the local order of n i at level BranchOrder remains unchanged. Therefore, n i.deweyordersum equals to dsum[branchorder + 1] + ModifiedRValue(BranchOrder) (line 03). dsum is also updated accordingly (lines 04-05). Note that DeweyOrderSum is not sufficient to compute position-based predicates with QName name tests, e.g., chapter[]. Hence, the SiblingSum attribute is introduced to the PathValue table. Definition 3 [SiblingSum] Consider an XML document T and a leaf node n at level l in T. Sibling(n, k) = i iff a is either an ancestor of n or n itself; k is the level of a; and the i-th τ-child of its parent (τ is the tag name of a). SiblingSum of n, n.siblingsum, is l j= Ψ(j) where Ψ(j) = [Sibling(n, j)-1] R j 1. SiblingSum encodes the local order of nodes which are with the same tag name of n, namely same-tag-sibling order. For example, consider the children of the first book element in Figure 1(c). The local orders of title and the first and second chapter nodes are 1, and 3, respectively. On the other hand, the same-tag-sibling order of these nodes are 1, 1 and, respectively. The algorithm to compute SiblingSum is shown in Figure 3. SiblingOrder.Order(level, elementname) is used to calculate Sibling(n i, k) for the current node n i where k = level and τ = elementname. n i.siblingsum equals to ssum[branchorder] + [Sibling(n i, n i.branchorder+1) - 1] ModifiedRValue(BranchOrder) (line 04). 4.4 Preservation of SUCXENT++ s Features The above modifications do not adversely affect the document reconstruction process and efficient evaluation of non-ordered XPATH queries, as discussed in [10]. Recall that given a pair of leaf nodes, Property 1 was used in [10] to efficiently determine the nearest common ancestor of the nodes. Since we have modified the definition of RValue and replaced the BranchOrderSum attribute with the DeweyOrderSum attribute, this property is not applicable to the extended SUCXENT++ scheme. It is necessary to ensure that a corresponding property holds in the extended system. LEMMA 1 l j=k Φ(j) R k 1 where Φ(j) =[Ord(n, j)-1] R j 1, k (,l] and n is a leaf node in an XML document at level l. 6
7 Based on the above lemma, it is straightforward to show that l j=k Φ(j) < R k. Theorem 1 Let n 1 and n be two leaf nodes in an XML document. If R l n 1.DeweyOrderSum - n.deweyordersum < R l then the level of the nearest common ancestor of n 1 and n is l + 1. For example, consider the second leaf node in Figure 1(c). DeweyOrderSum of this node is 3. Let D 1 be the DeweyOrderSum of leaf nodes that have nearest common ancestor at level. Using the above theorem, D 1 falls within the following range: (R 1)/ + 1 D 1 3 < (R 1 1)/ + 1 D 1 3 < 10 which returns the first and fourth leaf nodes (DeweyOrderSum = 0 and 6, respectively). Let D be the DeweyOrderSum of leaf nodes that have nearest common ancestor at level 3. D falls within the following range: (R 3 1)/ + 1 D 3 < (R 1)/ D 3 < which returns the third leaf node (DeweyOrderSum = 4). Now let say we want to get the leaf nodes that have nearest common ancestor at level or deeper and let D 3 be the DeweyOrderSum of these nodes. D 3 falls within the following range: D 3 3 < (R 1 1)/ + 1 D 3 3 < 10 which returns the first four leaf nodes. We illustrate Theorem 1 further with XQUERY example. Consider the following XQUERY on the XML tree in Figure 1(c). FOR RETURN $b IN document( catalog )/catalog/book[1] $b/chapter/para Let D a be DeweyOrderSum of the first leaf node satisfying /catalog/book[1], which is the first title node. The RETURN clause implies the path /catalog/book/chapter/para. In this particular case, the nearest common ancestor between para nodes and book node is at level. This implies that the nearest common ancestor between para nodes and the first leaf node satisfying book node is at level or deeper. Let D b be DeweyOrderSum of the resulting nodes that satisfy both paths. Since the level of intersection is or deeper, D b D a < (R 1 1)/ + 1. From Figure 1(c), D a = 0 and R 1 = 19. Hence, nodes in the query result set must satisfy the inequality: D b 0 < (19 1)/ + 1. Note that DeweyOrderSum of the second and third leaf nodes (para nodes) are 3 and 4. Since 3 0 < 10 and 4 0 < 10, these nodes satisfies the above query. Whereas, the fifth leaf node whose DeweyOrderSum is 19 does not satisfy the above query. The proofs of the lemma, theorems and propositions are given in Appendix A. 5 Ordered XPath Processing This section describes how ordered XPATH queries are supported by the modified schema. First, we propose a method of node order comparison in the absence of non-leaf nodes. Next, we show how ordered XPATH queries are supported in detail. Finally, we present a translation algorithm of ordered XPATH queries and SQL. 5.1 Non-leaf Node Order Comparison Our strategy for comparing the order of non-leaf nodes is based on the following observation. If node n 0 precedes (resp. follows) another node n 1, then descendants of n 0 must also precede (resp. follow) the descendants of n 1. Therefore, instead of comparing the order between non-leaf nodes, we compare the order between their descendant leaf nodes. For this reason, we define a representative leaf node of a non-leaf node n to be its first descendant leaf node. Note that the BranchOrder attribute records the level of the nearest common ancestor of two consecutive leaf nodes. Let C be the sequence of descendant leaf nodes of n and n 1 be the first node in C. We know that the nearest common ancestor of any two consecutive nodes in C is also a descendant of node n. This implies (1) except n 1, BranchOrder of a node in C is at least the level of node n and () the nearest common ancestor of n 1 and its immediately preceding leaf node is not a descendant of node n. Therefore, BranchOrder of n 1 is always smaller than the level of n. We summarize this property in Property. 7
8 Property Let n be a non-leaf node at level l and C = n 1, n, n 3,..., n r be the sequence of descendant leaf nodes of n in document order. Then, n 1.BranchOrder < l and n i.branchorder l, where i (1,r]. Definition 4 [DeweyOrderSum of non-leaf nodes] Let S = i 1, i, i 3,..., i r1 be a sequence of non-leaf sibling nodes of a non-leaf node i 0 in document order. Let C = n 1, n,..., n r be the sequence of leaf nodes of S and n j is denoted as the first descendant leaf node of i j1. Then, i j1.deweyordersum = n j.deweyordersum. In the above definition, DeweyOrderSum of a leaf node is conceptually propagated to its ancestor nodes. Consequently, the following proposition holds. Proposition 1 Let C = n 1, n, n 3,..., n r be a sequence of sibling nodes. Consider n i where 1 < i r and the level of n i is l, where l > 1. Let m be n i or descendant of n i. Then, n 1.DeweyOrderSum+ [Ord(n i ) - Ord(n 1 )] R l 1 m.deweyordersum < n 1.DeweyOrderSum+ [(Ord(n i ) - Ord(n 1 ))+1] R l 1 where Ord(n i) and Ord(n 1 ) are the local order of n i and n 1, respectively. By using the above proposition, we can compare the order of two non-leaf nodes without evaluating every sibling nodes in the sequence. Also, since n 1 is the first sibling, Ord(n 1 ) = 1. Therefore, based on the above proposition, the following holds: n 1.DeweyOrderSum+ [Ord(n i )-1] R l 1 m.deweyordersum < n 1.DeweyOrderSum+ Ord(n i ) R l 1. Similar propositions for SiblingSum can be established in a straightforward manner. 5. Support for Ordered XPath Queries We now present how various types of ordered XPATH queries are supported by the modified SUCXENT++. Due to space constraints, we only focus on how DeweyOrderSum and ModifiedRValue are used for query processing. Similar technique can be applied to evaluations with SiblingSum. Position predicates. Position-based predicates, i.e., predicates of the form position()=i, select the node at the i-th position of the sequence of inner focus context nodes. We propose to compute the i-th node without evaluating every node in the sequence by applying Proposition 1. For example, suppose n 1 be the first book node of the sequence of book nodes (the context nodes) in Figure 1(c). Observe that n 1.DeweyOrderSum = 0 as its representative leaf node is the first leaf node of the XML tree. We now employ the inequality in Proposition 1 to select a sibling node, e.g., the second book node n. Here, Ord(n ) =, l =, R 1 = 19, and n 1.DeweyOrderSum = 0. Then, n.deweyordersum < n i.deweyordersum < 38. The nodes in this range are the descendant leaf nodes of n. Such simple arithmetic calculations can be efficiently implemented in a relational database. fn : last() can be computed by first determining all sibling nodes that satisfy the specific path and then finding the node with the largest DeweyOrderSum. Position predicate on child axes. This class of queries can be translated into a child axis followed by a position predicate, in which one must select the i-th child of the context node. Our strategy is to determine the first child of a context node and then the child s i-th sibling node as described above. First, by using Definition 4, we know that if n is the first child of n 1, then n 1.DeweyOrderSum = n.deweyordersum. Second, Proposition 1 provides us a method for selecting the i-th sibling node of a node. Reconsider Figure 1(c) and the XPATH query /catalog/*[]. The query result is the second child of catalog node. Suppose n 0 is the context node /catalog. Let n 1 be the first sibling node in the sequence returned by the expression /catalog/*. Then, n 0.DeweyOrderSum = n 1.DeweyOrderSum. Since the first sibling in that sequence is n 1 and all siblings of n 1 is in that sequence, we can now utilize Proposition 1 to select the leaf nodes of the second node in the context. The range operator, e.g., [position()= TO 10], can be easily handled in similar fashion. Following and preceding axes. following axis selects all nodes which follow the context node excluding the descendants of the context node. preceding axis, on the other hand, selects all nodes which precede 8
9 DeweyOrderSum D i - ((R L- - 1) / + 1) D i +R D L-1 i + ((R L- - 1) / + 1) D i preceding-sibling following-sibling preceding following D i : DeweyOrderSum of the context node L : level of the context node descendant-or-self Figure 5: Relationship between DeweyOrderSum and RValue. the context node excluding the ancestors of the context node. Similar to position predicates, we summarize a property of DeweyOrderSum to facilitate efficient processing of these axes. Proposition Let n a and n b be two nodes in the XML tree T and n b is a context node at level l b where l b > 1. Then, the following statements hold: 1. n a.deweyordersum n b.deweyordersum+r l b 1 if and only if n a follows n b and is not a descendant of n b ;. Similarly, n a.deweyordersum < n b.deweyordersum if and only if n a precedes n b and n a is neither a descendant nor an ancestor of n b. We illustrate this proposition in Figure 5. Now let us consider the following examples. Suppose that we evaluate the following axis on the first book node n b in Figure 1(c). Here, n b.deweyordersum = 0, l = and R 1 = 19. Let N be the nodes in the result of the evaluation of following axis. Then, by using Proposition, n N must satisfy this inequality: n.deweyordersum Similarly, suppose we evaluate the preceding axis on the last book node n b in Figure 1(c). n b.deweyordersum = 38. Denote the sequence of nodes satisfying the preceding axis to be N. Then n N must satisfy the following inequality: n.deweyordersum < 38. Another example can be found in Figure 1(c). Note that since SUCXENT++ only stores the leaf nodes, returning the internal nodes and the whole subtree as required by following::* and preceding::* axis require extra processing as the resulting XML document may have a very different structure or schema than the original XML document. This is achieved in SUCXENT++ during the result construction phase. Observe that, in our query, if we use QName as the name test (for example following::title), and the path from the root to QName element is unique then no extra processing is required. If the level of the QName is greater than the level of the context node (e.g. /catalog/book[]/preceding::title in Figure 1(c)), then we can use Proposition and PathId to return the resulting nodes. The same also applies if the level of the QName equals to the level of the context node (e.g. /catalog/*[1]/following::book in Figure 1(c)). However, if the level of the QName is less than the level of the context node (e.g. /catalog/*[1]/chapter/para/following::chapter in Figure 1(c)), we need to use Theorem 1 to exclude the nodes that have common ancestor at QName level or deeper. These three cases are illustrated in Figure 6. Following-sibling and preceding-sibling axes. following-sibling axis selects the children of the context node s parent that occur after the context node in document order whereas preceding-sibling axis selects the children of the context node s parent that occur before the context node in document order. Support for following-sibling (resp. preceding-sibling) axis can be achieved with an additional constraint on the following (resp. preceding) axis the selected nodes must be siblings of the context node. Proposition 3 Let n a and n b be two nodes in the XML tree T and n b is the context node at level l b where l b >. Then, the following statements hold: 9
10 L C QName QName L Q preceding following preceding::qname following::qname (a) Case 1: Qname level (L Q) > context node level (L C) QName QName L Q = L C preceding following preceding::qname following::qname (b) Case : Qname level (L Q) = context node level (L C) QName QName QName L Q L C preceding following preceding::qname following::qname (a) Case 3: Qname level (L Q) < context node level (L C) The context node The leaf node that represents the context node Figure 6: following::qname 1. n b.deweyordersum +R l b 1 n a.deweyordersum < n b.deweyordersum +(R l b 1)/ + 1 if and only if n a is a sibling of n b and n a follows n b.. n b.deweyordersum (R l b 1)/ 1 < n a.deweyordersum < n b.deweyordersum if and only if n a is a sibling of n b and n a precedes n b. The above proposition is illustrated in Figure 5. Now let us consider the following examples. Suppose we evaluate the following-sibling axis on the first title node n t in Figure 1(c). Here n t.deweyordersum = 0, l = 3, R 1 = 19, and R = 3. Denote N to be the nodes reachable via the following-sibling axis from n t. Using Proposition 3, n k.deweyordersum < 0 + (19 1)/ + 1 where n k N. That is, 3 n k.deweyordersum < 10. Hence, the second (DeweyOrderSum = 3) and the third (DeweyOrderSum = 6) chapter nodes are in this range. Now, suppose we evaluate the preceding-sibling axis at the last chapter node n c in Figure 1(c). Here n c.deweyordersum =. Let N be the nodes which satisfy the preceding-sibling axis. Therefore, (19 1)/ 1 < n r.deweyordersum < where n r N. That is, 1 < n r.deweyordersum <. Hence, the chapter node with DeweyOrderSum = 19 satisfies this bound. Another example can be found in Figure 1(c). 10
11 processpathexpr (XPath) 01 for every step in the XPath { 0 if (step.getaxis() == CHILD and step.haspredicate() == FALSE) 03 currentpath.add(nametest, step.getaxis()) 04 else { 05 from_sql.add("pathvalue as Vi") 06 if(currentpath.level() > 1) { where_sql.add("vi.pathid in currentpath.getpathid()") where_sql.add("vi.branchorder < currentpath.level()") 09 } 10 processaxis(step, currentpath) 11 processpredicate(step, currentpath) 1 } 13 if (step.islast() and currentpath.needupdate()) { from_sql.add("pathvalue as Vi") where_sql.add("vi.pathid in currentpath.getpathid()") 16 } 17 } 18 select_sql.add("vi.leafvalue, Vi.leafOrder,... ") 19 return select_sql + from_sql + where_sql + where_sql.unionwithattribute() (a)the processpathexpr Algorithm processaxis (step, currentpath) 01 switch (step.getaxis()){ 0 child: 03 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum - RValue(currentPath.level() - 1) + 1 AND Vi-1.DeweyOrderSum + RValue(currentPath.level() - 1) - 1 ") 04 following: 05 where_sql.add("vi.deweyordersum >= Vi-1.DeweyOrderSum + * RValue(currentPath.level()) - 1 ") 06 if (currentpath.qnamelevel(nametest) < currentpath.level() 07 where_sql.add("+ RValue(currentPath.QNameLevel(nametest) - 1) - 1") 08 preceding: 09 where_sql.add("vi.deweyordersum < Vi-1.DeweyOrderSum ") 10 if (currentpath.qnamelevel(nametest) < currentpath.level() 11 where_sql.add("- RValue(currentPath.QNameLevel(nametest) - 1) + 1") 1 following-sibling: 13 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum + * RValue(currentPath.level()) - 1 AND Vi-1.DeweyOrderSum + RValue(currentPath.level() - 1) - 1 ") 14 preceding-sibling: 15 where_sql.add("vi.deweyordersum BETWEEN Vi-1.DeweyOrderSum - RValue(currentPath.level() - 1) + 1 AND Vi-1.DeweyOrderSum - 1 ") 16 } 17 currentpath.add(nametest, step.getaxis()) (b)the processaxis Algorithm Figure 7: Procedure processpathexpr and Procedure processaxis. processpredicate (step, currentpath) 01 switch (step.getaxis()) { 0 CHILD: 03 n_from = step.getpredicatefrom() n_to = step.getpredicateto() 05 FOLLOWING-SIBLING: 06 n_from = step.getpredicatefrom() 07 n_to = step.getpredicateto() PRECEDING-SIBLING: 09 n_from = - step.getpredicatefrom() 10 n_to = - step.getpredicateto() } 1 switch (step.getpredicatetype()){ 13 position based predicate without name test: 14 where_sql.add("v i.deweyordersum BETWEEN V i-1.deweyordersum + n_from * ( * RValue(currentPath.level()) - 1) AND V i-1.deweyordersum + n_to * ( * RValue(currentPath.level()) - 1) - 1 ") 15 position based predicate with name test: 16 where_sql.add("v i.siblingsum BETWEEN V i-1.siblingsum + n_from * ( * RValue(currentPath.level()) - 1) AND V i-1.siblingsum + n_to * ( * RValue(currentPath.level()) - 1) - 1 ") 17 } Figure 8: Procedure processpredicate. 5.3 Ordered XPath Query Translation Algorithm Based on the properties defined in the previous subsection, we present an algorithm, shown in Figures 7 and 8, for generating SQL from ordered XPATH queries. Our algorithm assumes an XPATH expression is represented as a sequence of steps where a step may be associated with predicates. A SQL statement consists of three clauses: select sql, from sql and where sql. We assume that a clause has an add() method which encapsulates some simple string manipulations and simple SUCXENT++ joins for constructing valid SQL statements. In addition to preprocessing PathId as mentioned in [10], for a single XML document, we also preprocess RValue to reduce the number of joins. The translation consists of three main procedures. processpathexpr (Figure 7(a)): It analyzes the steps of an input XPATH expression (Line 01) and outputs a SQL statement. If the step consists of a child axis only (Lines 0-03), then we simply maintain a global variable currentp ath which records the simple downward path from the root to the context nodes. 1 Otherwise, when the step involves ordered predicates/other axes, we add predicates which select a superset of the next context nodes (Lines 05-09) and then call processaxis and processpredicate (Lines 10-11) with currentp ath to obtain the next context nodes. We add predicates in Lines 08 to determine the representative nodes of the context nodes. Finally, we collect the final results (Line 19). processaxis (Figure 7(b)): This procedure translates a step, together with currentp ath, based on the step type (Line 01). Lines 0-03, and 1-15 encode Theorem 1, Proposition and Proposition 3, respectively. 1 The details for maintaining currentp ath is simple but lengthy. For simplicity, we omitted such discussions. 11
12 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.SiblingSum BETWEEN * ( * 10-1) AND * ( * 10-1) AND V1.pathid in (5,4,3,) 07 AND V1.branchOrder < 08 AND V.docId = V1.docId 09 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum AND V1.DeweyOrderSum AND V.pathid in (4,3,) 11 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum + 1 * ( * - 1) AND V1.DeweyOrderSum + 3 * ( * - 1) ) 13 SELECT V.*, 1 AS Attr 14 FROM V 15 UNION ALL 16 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 17 FROM Attribute A, V 18 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 19 AND A.PathId in (0) 0 ORDER BY DocId, DeweyOrderSum, Attr Figure 9: SQL example: /catalog/book[1]/*[position()= to 3] 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT DISTINCT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.pathid in (4,3) 06 AND V1.branchOrder < 3 07 AND V.docId = V1.docId 08 AND V.DeweyOrderSum >= V1.DeweyOrderSum + * AND V.pathid in (5,4,3,) 11 ) 1 SELECT V.*, 1 AS Attr 13 FROM V 14 UNION ALL 15 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 16 FROM Attribute A, V 17 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 18 AND A.PathId in (1) 19 ORDER BY DocId, DeweyOrderSum, Attr Figure 10: SQL example: /catalog/book/chapter/following::book processpredicate (Figure 8(a)): This procedure mainly translates position predicates. Lines determine the range of position specified by the predicate. Given these, Lines 1-17 implement Proposition 1. We now illustrate the details of the translation algorithms with five examples related to translation of position-based predicates. Example 1 [Position-based predicates] Consider the path expression /catalog/book[1]/ *[position()= to 3]. The translated SQL is shown in Figure 9. /catalog/book[1] is translated to Lines Theorem 1 is used to get the children of /catalog/book[1] (lines 08-10), and /*[] is translated to Lines 11. Lines are used to union the resulting element nodes with the attribute nodes and the last line is to order the result by document order. Example [SQL for following axis] Consider the path expression /catalog/book/chapter/ following::book. The translated SQL is shown in Figure 10. /catalog/book/chapter is translated to Lines Proposition is used to get the following nodes (line 08) and since the level of book is higher then the level of /catalog/book/chapter, then Theorem 1 is used to exclude the nodes that have common ancestor at level or deeper (line 09). PathId is used to return only the book nodes (line 10). Example 3 [SQL for preceding axis] Consider the path expression /catalog/book/chapter/ preceding::book. The translated SQL is similar to Figure 10 except that the predicate in line 08 is 1
13 01 WITH V (leafvalue, pathid, branchorder, DeweyOrderSum, DocId, LeafOrder ) AS ( 0 SELECT DISTINCT V.leafValue, V.pathID, V.branchOrder, V.DeweyOrderSum, V.DocId, V.LeafOrder 03 FROM PathValue V1, PathValue V 04 WHERE V1.docId = 1 05 AND V1.SiblingSum BETWEEN * ( * 10-1) AND 0 + * ( * 10-1) AND V1.pathid in (5,4,3,) 07 AND V1.branchOrder < 08 AND V.docId = V1.docId 09 AND V.DeweyOrderSum >= V1.DeweyOrderSum + * AND V.pathid in (5,4,3,) 11 AND V.DeweyOrderSum BETWEEN V1.DeweyOrderSum + 1 * ( * 10-1) AND V1.DeweyOrderSum + * ( * 10-1) ) 13 SELECT V.*, 1 AS Attr 14 FROM V 15 UNION ALL 16 SELECT A.leafValue, A.pathID, V.branchOrder, V.DeweyOrderSum, A.DocId, A.LeafOrder, 0 AS Attr 17 FROM Attribute A, V 18 WHERE A.DocId = V.DocId AND A.LeafOrder = V.LeafOrder 19 AND A.PathId in (1) 0 ORDER BY DocId, DeweyOrderSum, Attr Figure 11: SQL example: /catalog/book[]/following-sibling::*[1] replaced with V.DeweyOrderSum < V1.DeweyOrderSum due to Proposition and line 09 is replaced with due to Theorem 1. Example 4 [SQL for following-sibling axis] Consider the path expression /catalog/book[]/ following-sibling::*[1]. The translated SQL is shown in Figure 11. /catalog/book[] is translated to Lines /following-sibling::* is translated to Lines Since the level of /catalog/book is, the translated SQL for following-sibling is similar to following axis (line 9). *[1] is translated to line 11. Example 5 [SQL for preceding-sibling axis] Consider the path expression /catalog/book[]/ preceding-sibling::*[1]. The translated SQL is similar to Figure 11 except that the predicate in line 09 is replaced with V.DeweyOrderSum < V1.DeweyOrderSum. 6 Join Order Enforcement Due to the tree-unaware nature of the underlying relational storage scheme as well as the lack of appropriate XML statistics, relational optimizers may generate inefficient query plans. In order to address this problem, some approaches have resorted to manual tuning of query plans [1] while others invade the database kernel to make it tree-aware [3, 4]. The former approach has not been scalable as it requires significant human intervention whereas the later approach may require non-trivial modifications of the internals of a RDBMS. In this section, we propose a simple yet effective technique to generate better query plans automatically without invading the database kernel. As discussed in Section 5.3, in order to evaluate an (ordered) XPATH query in SUCXENT++, each XPATH axis is translated into a join between the PathValue table and intermediate results (i.e., the context nodes). For example, in Figures 9-11, PathValue V1 returns the representative nodes of the context nodes to calculate PathValue V. Due to the lack of tree awareness, the relational optimizer is not capable of transforming the order of joins intelligently. Consequently, it may generate poor join order that typically requires caching large intermediate results in the database bufferpool. This is particularly important to NL joins, where large and deep loops are prohibitive. For example, the first few joins of a right-to-left join order may easily yield a large number of context nodes. To respond to this, we propose to enforce a left-to-right join order on the translated SQL query. Also, this evaluation order naturally corresponds to the order of XPATH steps specified in the XPATH expression. By employing this technique, the relational optimizer does not explore the large number of permutations of join order. We apply join order if the translated SQL query involves more than one PathValue 13
14 Total Number ID Node Attribute Total DC10 5,34 15,000 40,34 DC100,4,00 150,000,39,00 DC1000,44,61 1,500,000 3,94,61 DBLP 8,,945 1,665,930 9,888,875 (a) Features of Dataset Size (MB) Max Depth ID Query Res. Card. D1 /dblp/*[100000]/author D /dblp/article/author[] 190,838 D3 /dblp/*[600000]/pages/preceding-sibling::* 6 D4 /dblp/*[600000]/pages/following-sibling::* 5 (c) Benchmark queries for DBLP ID Query Res. Card. (10MB) Res. Card. (100MB) Res. Card. (1000MB) Q1 /catalog/item[1000] Q /catalog/*[1000] /catalog/item[position()=1000 to 10000]/ Q3 *[position()= to 7] 104,7 66,81 67,00 /catalog/item[position()=1000 to 10000]/authors/ Q4 author 65,161 39, ,350 (b) Benchmark queries for DC10, DC100, and DC1000 ID Query Res. Card. (10MB) Res. Card. (100MB) Res. Card. (1000MB) Q5 /catalog/*[1500]/publisher/following-sibling::* Q6 /catalog/*[1500]/publisher/following-sibling::*[5] Q7 /catalog/*[1500]/publisher/preceding-sibling::* Q8 /catalog/*[1500]/publisher/preceding-sibling::*[] Q9 /catalog/*[x]/following::title 50,500 5,000 Q10 /catalog/*[y]/preceding::title 49,499 4,499 X = 50, 500, 5000 for DC10, DC100, DC1000 respectively; Y = 50, 500, 5000 for DC10, DC100, DC1000 respectively Figure 1: Dataset and Benchmark Queries. relation. In addition, if the PathValue table appears in the SQL query only once, we let the relational optimizer to decide the plan for the join between the PathValue table and the Attribute table. The above enforcement can easily be implemented by query hints in commercial databases. Regarding our implementation, we use OPTION(FORCE ORDER) to implement the above technique in SUCXENT++. The strength of this approach lies in its simplicity in implementing on any commercial RDBMS that supports query hints. 7 Performance Study In this section, we present the results of our performance evaluation on our proposed approach, a tree-unaware schema-oblivious approach (GLOBAL-ORDER [1]), a tree-unaware schema-conscious approach (SHARED- INLINING [11]), and a tree-aware approach (MonetDB [3]). Prototypes for modified SUCXENT++ (denoted as SX), SUCXENT++ with join order enforcement (denoted as SX-JO), GLOBAL-ORDER (denoted as GO) and SHARED-INLINING (denoted as SI) were implemented with JDK 1.5. We used the Windows version of MON- ETDB/XQuery (denoted as MXQ) downloaded from The experiments were conducted on an Intel Xeon GHz machine running on Windows XP with 1GB of RAM. The RDBMS used was Microsoft SQL Server 005 Developer Edition. Note that we did not study the performance of XML support of SQL Server 005 as it can only evaluate the first two ordered queries in Figure 1(b). 7.1 Experimental Setup Data and query sets. In our experiments, XBENCH [13] dataset was used for synthetic data. Data-centric (DC) documents were considered with data sizes ranging from 10MB to 1GB. In addition, we used a real dataset, namely DBLP XML [16]. Figure 1 (a) shows the characteristics of the datasets used. Two sets of queries were designed to cover different types of ordered XPATH queries. In additional, the cardinality of the results was varied. Figures 1 (b) and 1 (c) show the benchmark queries on XBENCH and DBLP, respectively. XPATH queries with descendant axes were not included as they had been studied in [10]. Test methodology. The XPATH queries were executed in the reconstruct mode where not only the non-leaf nodes, but also all their descendants, were selected. Appropriate indexes were constructed for all approaches (except for MONETDB) through a careful analysis on the benchmark queries. Prior to our experiments, we ensured that statistics on relations were collected. The bufferpool of the RDBMS was cleared before each run. Each query was executed 6 times and the results from the first run were always discarded. 14
Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases
Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Boon-Siew Seah 1,2, Klarinda G. Widjanarko 1,2, Sourav S. Bhowmick 1,2, Byron Choi 1, and Erwin Leonardi 1,2
More informationEfficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS
Efficient Evaluation of Nearest Common Ancestor in XML Twig Queries Using Tree-Unaware RDBMS Klarinda G. Widjanarko Erwin Leonardi Sourav S Bhowmick School of Computer Engineering Nanyang Technological
More informationNon-Directional XPath Processing in a Tree-Unaware RDBMS
Non-Directional XPath Processing in a Tree-Unaware RDBMS Sourav S Bhowmick Curtis Dyreson Erwin Leonardi Zhifeng Ng School of Computer Engineering Nanyang Technological University, Singapore Department
More informationXandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases
Xandy: A Scalable Change Detection Technique for Ordered XML Documents Using Relational Databases Erwin Leonardi Sourav S. Bhowmick School of Computer Engineering, Nanyang Technological University, Singapore
More informationEvaluating XPath Queries
Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But
More informationPart XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321
Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends
More informationCHAPTER 3 LITERATURE REVIEW
20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations
More informationXandy: Detecting Changes on Large Unordered XML Documents Using Relational Databases
: Detecting Changes on Large Unordered XML Documents Using Relational Databases Erwin Leonardi Sourav S. Bhowmick Sanjay Madria School of Computer Engineering Department of Computer Science Nanyang Technological
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationA Dynamic Labeling Scheme using Vectors
A Dynamic Labeling Scheme using Vectors Liang Xu, Zhifeng Bao, Tok Wang Ling School of Computing, National University of Singapore {xuliang, baozhife, lingtw}@comp.nus.edu.sg Abstract. The labeling problem
More informationPart XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440
Part XVII Staircase Join Tree-Aware Relational (X)Query Processing Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Outline of this part 1 XPath Accelerator Tree aware relational
More informationPathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data
PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg
More informationXML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9
XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2
More informationLabeling Dynamic XML Documents: An Order-Centric Approach
1 Labeling Dynamic XML Documents: An Order-Centric Approach Liang Xu, Tok Wang Ling, and Huayu Wu School of Computing National University of Singapore Abstract Dynamic XML labeling schemes have important
More informationAccelerating XML Structural Matching Using Suffix Bitmaps
Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,
More informationTreewidth and graph minors
Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under
More informationIndex-Driven XQuery Processing in the exist XML Database
Index-Driven XQuery Processing in the exist XML Database Wolfgang Meier wolfgang@exist-db.org The exist Project XML Prague, June 17, 2006 Outline 1 Introducing exist 2 Node Identification Schemes and Indexing
More informationOutline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014
Outline Gerênciade Dados daweb -DCC922 - XML Query Processing ( Apresentação basedaem material do livro-texto [Abiteboul et al., 2012]) 2014 Motivation Deep-first Tree Traversal Naïve Page-based Storage
More information3 No-Wait Job Shops with Variable Processing Times
3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select
More informationOne of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while
1 One of the main selling points of a database engine is the ability to make declarative queries---like SQL---that specify what should be done while leaving the engine to choose the best way of fulfilling
More informationDistributed minimum spanning tree problem
Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with
More information4 Fractional Dimension of Posets from Trees
57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language
More informationFM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data
FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,
More informationIndexing XML documents for XPath query processing in external memory
Data & Knowledge Engineering xxx (2005) xxx xxx www.elsevier.com/locate/datak Indexing XML documents for XPath query processing in external memory Qun Chen a, *, Andrew Lim a, Kian Win Ong b, Jiqing Tang
More informationTwigList: Make Twig Pattern Matching Fast
TwigList: Make Twig Pattern Matching Fast Lu Qin, Jeffrey Xu Yu, and Bolin Ding The Chinese University of Hong Kong, China {lqin,yu,blding}@se.cuhk.edu.hk Abstract. Twig pattern matching problem has been
More informationFull-Text and Structural XML Indexing on B + -Tree
Full-Text and Structural XML Indexing on B + -Tree Toshiyuki Shimizu 1 and Masatoshi Yoshikawa 2 1 Graduate School of Information Science, Nagoya University shimizu@dl.itc.nagoya-u.ac.jp 2 Information
More informationDecision trees. Decision trees are useful to a large degree because of their simplicity and interpretability
Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful
More informationA Structural Numbering Scheme for XML Data
A Structural Numbering Scheme for XML Data Alfred M. Martin WS2002/2003 February/March 2003 Based on workout made during the EDBT 2002 Workshops Dao Dinh Khal, Masatoshi Yoshikawa, and Shunsuke Uemura
More information1 The size of the subtree rooted in node a is 5. 2 The leaf-to-root paths of nodes b, c meet in node d
Enhancing tree awareness 15. Staircase Join XPath Accelerator Tree aware relational XML resentation Tree awareness? 15. Staircase Join XPath Accelerator Tree aware relational XML resentation We now know
More informationBlossomTree: Evaluating XPaths in FLWOR Expressions
BlossomTree: Evaluating XPaths in FLWOR Expressions Ning Zhang University of Waterloo School of Computer Science nzhang@uwaterloo.ca Shishir K. Agrawal Indian Institute of Technology, Bombay Department
More informationXDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013
Assured and security Deep-Secure XDS An Extensible Structure for Trustworthy Document Content Verification Simon Wiseman CTO Deep- Secure 3 rd June 2013 This technical note describes the extensible Data
More informationFUTURE communication networks are expected to support
1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,
More informationDiversity Coloring for Distributed Storage in Mobile Networks
Diversity Coloring for Distributed Storage in Mobile Networks Anxiao (Andrew) Jiang and Jehoshua Bruck California Institute of Technology Abstract: Storing multiple copies of files is crucial for ensuring
More informationAn Extended Byte Carry Labeling Scheme for Dynamic XML Data
Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 5488 5492 An Extended Byte Carry Labeling Scheme for Dynamic XML Data YU Sheng a,b WU Minghui a,b, * LIU Lin a,b a School of Computer
More informationXML Data Stream Processing: Extensions to YFilter
XML Data Stream Processing: Extensions to YFilter Shaolei Feng and Giridhar Kumaran January 31, 2007 Abstract Running XPath queries on XML data steams is a challenge. Current approaches that store the
More informationAn approach to the model-based fragmentation and relational storage of XML-documents
An approach to the model-based fragmentation and relational storage of XML-documents Christian Süß Fakultät für Mathematik und Informatik, Universität Passau, D-94030 Passau, Germany Abstract A flexible
More informationCSE 190D Spring 2017 Final Exam Answers
CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join
More informationNavigation- vs. Index-Based XML Multi-Query Processing
Navigation- vs. Index-Based XML Multi-Query Processing Nicolas Bruno, Luis Gravano Columbia University {nicolas,gravano}@cs.columbia.edu Nick Koudas, Divesh Srivastava AT&T Labs Research {koudas,divesh}@research.att.com
More informationMining Frequently Changing Substructures from Historical Unordered XML Documents
1 Mining Frequently Changing Substructures from Historical Unordered XML Documents Q Zhao S S. Bhowmick M Mohania Y Kambayashi Abstract Recently, there is an increasing research efforts in XML data mining.
More informationInteger Programming ISE 418. Lecture 7. Dr. Ted Ralphs
Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint
More informationConsistency and Set Intersection
Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study
More informationBottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases
Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases Yangjun Chen Department of Applied Computer Science University of Winnipeg Winnipeg, Manitoba, Canada R3B 2E9 y.chen@uwinnipeg.ca
More informationTRIE BASED METHODS FOR STRING SIMILARTIY JOINS
TRIE BASED METHODS FOR STRING SIMILARTIY JOINS Venkat Charan Varma Buddharaju #10498995 Department of Computer and Information Science University of MIssissippi ENGR-654 INFORMATION SYSTEM PRINCIPLES RESEARCH
More informationOn k-dimensional Balanced Binary Trees*
journal of computer and system sciences 52, 328348 (1996) article no. 0025 On k-dimensional Balanced Binary Trees* Vijay K. Vaishnavi Department of Computer Information Systems, Georgia State University,
More informationLecture 3 February 9, 2010
6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees
More informationXML Index Recommendation with Tight Optimizer Coupling
XML Index Recommendation with Tight Optimizer Coupling Technical Report CS-2007-22 July 11, 2007 Iman Elghandour University of Waterloo Andrey Balmin IBM Almaden Research Center Ashraf Aboulnaga University
More informationKikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML
Kikori-KS An Effective and Efficient Keyword Search System for Digital Libraries in XML Toshiyuki Shimizu 1, Norimasa Terada 2, and Masatoshi Yoshikawa 1 1 Graduate School of Informatics, Kyoto University
More informationTwig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents
Twig Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li, Junichi Tatemura Wang-Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan NEC Laboratories
More informationThese notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.
Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationXML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW
XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.
More informationAn Efficient XML Index Structure with Bottom-Up Query Processing
An Efficient XML Index Structure with Bottom-Up Query Processing Dong Min Seo, Jae Soo Yoo, and Ki Hyung Cho Department of Computer and Communication Engineering, Chungbuk National University, 48 Gaesin-dong,
More informationWe assume uniform hashing (UH):
We assume uniform hashing (UH): the probe sequence of each key is equally likely to be any of the! permutations of 0,1,, 1 UH generalizes the notion of SUH that produces not just a single number, but a
More informationEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management
More informationAvoiding Sorting and Grouping In Processing Queries
Avoiding Sorting and Grouping In Processing Queries Outline Motivation Simple Example Order Properties Grouping followed by ordering Order Property Optimization Performance Results Conclusion Motivation
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationCBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents
CIT. Journal of Computing and Information Technology, Vol. 26, No. 2, June 2018, 99 114 doi: 10.20532/cit.2018.1003955 99 CBSL A Compressed Binary String Labeling Scheme for Dynamic Update of XML Documents
More information2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006
2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,
More informationUpdating XML documents
Grindei Manuela Lidia Updating XML documents XQuery Update XQuery is the a powerful functional language, which enables accessing different nodes of an XML document. However, updating could not be done
More informationAn Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering
An Efficient Approach for Emphasizing Regions of Interest in Ray-Casting based Volume Rendering T. Ropinski, F. Steinicke, K. Hinrichs Institut für Informatik, Westfälische Wilhelms-Universität Münster
More informationHEAPS ON HEAPS* Downloaded 02/04/13 to Redistribution subject to SIAM license or copyright; see
SIAM J. COMPUT. Vol. 15, No. 4, November 1986 (C) 1986 Society for Industrial and Applied Mathematics OO6 HEAPS ON HEAPS* GASTON H. GONNET" AND J. IAN MUNRO," Abstract. As part of a study of the general
More informationPathfinder/MonetDB: A High-Performance Relational Runtime for XQuery
Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information
More informationContents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...
Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing
More informationImplementation Techniques
V Implementation Techniques 34 Efficient Evaluation of the Valid-Time Natural Join 35 Efficient Differential Timeslice Computation 36 R-Tree Based Indexing of Now-Relative Bitemporal Data 37 Light-Weight
More informationData Structure. IBPS SO (IT- Officer) Exam 2017
Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationQuery Solvers on Database Covers
Query Solvers on Database Covers Wouter Verlaek Kellogg College University of Oxford Supervised by Prof. Dan Olteanu A thesis submitted for the degree of Master of Science in Computer Science Trinity 2018
More informationBinary Decision Diagrams
Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table
More informationADT 2009 Other Approaches to XQuery Processing
Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ 12.11.2009: Schedule 2 RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath
More informationBenchmarking Database Representations of RDF/S Stores
Benchmarking Database Representations of RDF/S Stores Yannis Theoharis 1, Vassilis Christophides 1, Grigoris Karvounarakis 2 1 Computer Science Department, University of Crete and Institute of Computer
More informationIndex-Trees for Descendant Tree Queries on XML documents
Index-Trees for Descendant Tree Queries on XML documents (long version) Jérémy arbay University of Waterloo, School of Computer Science, 200 University Ave West, Waterloo, Ontario, Canada, N2L 3G1 Phone
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationSemi-structured Data. 8 - XPath
Semi-structured Data 8 - XPath Andreas Pieris and Wolfgang Fischl, Summer Term 2016 Outline XPath Terminology XPath at First Glance Location Paths (Axis, Node Test, Predicate) Abbreviated Syntax What is
More informationFoundations of Computer Science Spring Mathematical Preliminaries
Foundations of Computer Science Spring 2017 Equivalence Relation, Recursive Definition, and Mathematical Induction Mathematical Preliminaries Mohammad Ashiqur Rahman Department of Computer Science College
More informationSchema-Based XML-to-SQL Query Translation Using Interval Encoding
2011 Eighth International Conference on Information Technology: New Generations Schema-Based XML-to-SQL Query Translation Using Interval Encoding Mustafa Atay Department of Computer Science Winston-Salem
More informationA Fast Algorithm for Optimal Alignment between Similar Ordered Trees
Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationLABELING DYNAMIC XML DOCUMENTS: AN ORDER-CENTRIC APPROACH XU LIANG
LABELING DYNAMIC XML DOCUMENTS: AN ORDER-CENTRIC APPROACH XU LIANG A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE NATIONAL UNIVERSITY OF SINGAPORE 2010 1 Acknowledgments
More informationAn efficient XML query pattern mining algorithm for ebxml applications in e-commerce
Vol. 8(18), pp. 777-790, 28 September, 2014 DOI: 10.5897/AJBM2011.644 Article Number: 0350C7647676 ISSN 1993-8233 Copyright 2014 Author(s) retain the copyright of this article http://www.academicjournals.org/ajbm
More informationDiscrete mathematics
Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many
More informationBLAS : An Efficient XPath Processing System
BLAS : An Efficient XPath Processing System Yi Chen University of Pennsylvania yicn@cis.upenn.edu Susan B. Davidson University of Pennsylvania and INRIA-FUTURS (France) susan@cis.upenn.edu Yifeng Zheng
More informationIndexing Keys in Hierarchical Data
University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan
More informationEcient XPath Axis Evaluation for DOM Data Structures
Ecient XPath Axis Evaluation for DOM Data Structures Jan Hidders Philippe Michiels University of Antwerp Dept. of Math. and Comp. Science Middelheimlaan 1, BE-2020 Antwerp, Belgium, fjan.hidders,philippe.michielsg@ua.ac.be
More informationCOMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing
COMP9319 Web Data Compression & Search Cloud and data optimization XPath containment Distributed path expression processing DATA OPTIMIZATION ON CLOUD Cloud Virtualization Cloud layers Cloud computing
More informationCSCI2100B Data Structures Trees
CSCI2100B Data Structures Trees Irwin King king@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~king Department of Computer Science & Engineering The Chinese University of Hong Kong Introduction General Tree
More informationIndexing XML Data Stored in a Relational Database
Indexing XML Data Stored in a Relational Database Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon Schaller, Leo Giakoumakis, Vasili Zolotov VLDB 200 Presentation: Alex Bradley Discussion: Cody Brown
More informationData Centric Integrated Framework on Hotel Industry. Bridging XML to Relational Database
Data Centric Integrated Framework on Hotel Industry Bridging XML to Relational Database Introduction extensible Markup Language (XML) is a promising Internet standard for data representation and data exchange
More informationEfficient pebbling for list traversal synopses
Efficient pebbling for list traversal synopses Yossi Matias Ely Porat Tel Aviv University Bar-Ilan University & Tel Aviv University Abstract 1 Introduction 1.1 Applications Consider a program P running
More informationOn The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches
On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner
More informationNON-CENTRALIZED DISTINCT L-DIVERSITY
NON-CENTRALIZED DISTINCT L-DIVERSITY Chi Hong Cheong 1, Dan Wu 2, and Man Hon Wong 3 1,3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong {chcheong, mhwong}@cse.cuhk.edu.hk
More informationMotivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion
JSON Schema-less into RDBMS Most of the material was taken from the Internet and the paper JSON data management: sup- porting schema-less development in RDBMS, Liu, Z.H., B. Hammerschmidt, and D. McMahon,
More informationAnswering Aggregate Queries Over Large RDF Graphs
1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo
More informationQuery Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University
Query Optimization Shuigeng Zhou December 9, 2009 School of Computer Science Fudan University Outline Introduction Catalog Information for Cost Estimation Estimation of Statistics Transformation of Relational
More informationCPS352 Lecture - Indexing
Objectives: CPS352 Lecture - Indexing Last revised 2/25/2019 1. To explain motivations and conflicting goals for indexing 2. To explain different types of indexes (ordered versus hashed; clustering versus
More informationReduced branching-factor algorithms for constraint satisfaction problems
Reduced branching-factor algorithms for constraint satisfaction problems Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il
More informationXML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW
Outline XML and Databases Lecture 10 XPath Evaluation using RDBMS 1. Recall / encoding 2. XPath with //,, @, and text() 3. XPath with / and -sibling: use / size / level encoding Sebastian Maneth NICTA
More information4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests
4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in
More informationCourse: The XPath Language
1 / 27 Course: The XPath Language Pierre Genevès CNRS University of Grenoble, 2012 2013 2 / 27 Why XPath? Search, selection and extraction of information from XML documents are essential for any kind of
More informationTHREE LECTURES ON BASIC TOPOLOGY. 1. Basic notions.
THREE LECTURES ON BASIC TOPOLOGY PHILIP FOTH 1. Basic notions. Let X be a set. To make a topological space out of X, one must specify a collection T of subsets of X, which are said to be open subsets of
More informationAdaptive Parallel Compressed Event Matching
Adaptive Parallel Compressed Event Matching Mohammad Sadoghi 1,2 Hans-Arno Jacobsen 2 1 IBM T.J. Watson Research Center 2 Middleware Systems Research Group, University of Toronto April 2014 Mohammad Sadoghi
More information