XSelMark: A Micro-Benchmark for Selectivity Estimation Approaches of XML Queries


Sherif Sakr
National ICT Australia (NICTA), Sydney, Australia
sherif.sakr@nicta.com.au

Abstract. Estimating the sizes of query results and intermediate results is a crucial part of any effective query optimization process. For several reasons, the selectivity estimation problem in the XML domain is more complicated than in the relational domain. Several research efforts have proposed selectivity estimation approaches in the XML domain. The lack of a suitable benchmark has been one of the main reasons that prevented a real assessment and comparison of these approaches from being conducted. This paper is a first step towards a comprehensive assessment of the available selectivity estimation approaches of XML queries along with their strengths and weaknesses. We propose a selectivity estimation benchmark for XML queries, XSelMark. It consists of a set of 25 queries organized into seven groups and covers the main aspects of selectivity estimation of XML queries. These queries have been designed with respect to an XML document instance of a popular benchmark for XML data management, XMark. In addition, we suggest some criteria for assessing the capability and quality of selectivity estimation approaches for XML queries. Finally, we use the proposed benchmark to assess the capabilities of the state-of-the-art selectivity estimation approaches.

1 Introduction

Modern query processors rely heavily on sophisticated optimizer components to make good optimization decisions, such as the selection of access paths, join orders and materialization strategies. Estimating the sizes of query results and intermediate results is a crucial part of any effective query optimization process. In fact, the selectivity estimation problem in the XML domain is more complicated than in the relational domain.
There are several reasons for this: 1) the absence of a strict schema notion in XML data; 2) the dualism between structural and value-based querying; 3) the high expressiveness of the XML query languages [8]; 4) the non-uniform distribution of tags and data; 5) the correlation and dependencies between the occurrences of the elements.

In the recent past, several research efforts have proposed different selectivity estimation approaches in the XML domain [9, 19, 20, 24]. However, these approaches have never been comprehensively assessed, evaluated and compared. One of the main reasons for this situation is the lack of a suitable benchmark that makes such assessments and comparisons possible. This implies that there is no clear view of the state-of-the-art in this domain, which in turn makes it difficult to decide what steps should be taken next. Although the XML research community has proposed several benchmarks [4, 5, 10, 16, 17, 21, 23] which are very useful for their intended targets and perspectives, none of these benchmarks is suited to assessing and evaluating the different selectivity estimation approaches of XML queries. The author of this paper faced this problem during his work in [20].

In general, XML benchmarks can be classified into two main categories: 1) Application (Macro) benchmarks [4, 5, 17, 21, 23], which are used to evaluate the overall performance of an XML management system. Hence, this kind of benchmark is not very useful for conducting a detailed assessment of specific aspects of an implementation that need improvement. 2) Micro-benchmarks [10, 16], which are designed to assess the performance of specific features of a system. In [16], Michiels et al. have motivated the crucial need for different micro-benchmarks in order to get a good understanding of the different aspects of implementing efficient query processors in the XML domain.

Therefore, the goal of this paper is to develop an XML micro-benchmark, XSelMark, which is mainly focussed on exercising the selectivity estimation aspects of XML queries. The proposed benchmark is a first step towards an overview of the state-of-the-art of the available approaches in the domain of selectivity estimation of XML queries along with their strengths and weaknesses. It aims to be a guide for researchers and implementors in benchmarking and improving their research efforts in this domain.
XSelMark consists of 25 queries organized into seven groups, where each group is intended to address the challenges posed by a different aspect of XML query result size estimation.

The remainder of this paper is organized as follows. Section 2 briefly gives an overview of the related benchmarks in the XML domain. Section 3 describes the main aspects of the selectivity estimation problem in the XML domain. Section 4 presents the set of queries of the XSelMark benchmark. An overview and an assessment of the supported features of the state-of-the-art selectivity estimation approaches of XML queries is presented in Section 5 before we conclude in Section 6.

2 Related Work

Several benchmarks for the evaluation of XML data management systems have been proposed by the XML research community [4, 5, 10, 16, 17, 21, 23]. Most of these benchmarks are application oriented [4, 5, 17, 21, 23], while a few others are designed as micro-benchmarks [10, 16]. In this section we give a brief overview of the state-of-the-art of XML benchmarks.

XMach-1 [4] is a scalable multi-user benchmark. It is based on a web application and considers text documents and catalog data. It only defines a small number of XML queries that cover multiple functions and update operations for which system performance is determined. The benchmark consists of 8 queries and 3 update operations. The goal of the benchmark is to test how many queries per second the query engine can execute and to stress the XML systems under a multi-user workload.

XOO7 [5] is considered to be the XML counterpart of the OO7 benchmark [6], which is geared towards object repositories. Besides mapping the database and original queries of OO7 into XML, XOO7 is enriched with document and navigational queries that are specific to XML databases. The goal of XOO7 is to evaluate the performance of XML management systems.

XBench [23] is a comprehensive XML database benchmark that covers a large number of XML database applications. These applications are characterized by whether they are data-centric or text-centric and whether they consist of a single document or multiple documents. The XBench workload covers the functionality of XQuery as captured in the Use Cases.

XMark [21] is a single-user benchmark. The database model is based on an internet auction site and consists of one big, regularly structured XML document with text and non-text data. It provides a concise and comprehensive set of queries which allows users and developers to assess the performance characteristics of the different XML engines.

The TPOX benchmark [17] is an application-level XML database benchmark based on a financial application scenario. It is used to evaluate the performance of XML database systems. It is mainly focussed on exercising all aspects of XML database management systems such as storage, indexing, logging, transaction processing and concurrency control. The workload of TPOX consists of insert, update and delete operations as well as query operations.

XPathMark [10] is a micro-benchmark for XPath 1.0 built on XMark. It presents a set of XPath queries which covers the major aspects of the XPath language including different axes, node tests, Boolean operators, references, and functions. The goal of XPathMark is to assess the functional completeness, correctness, efficiency and data scalability of XPath implementations.

MemBeR [16] is another micro-benchmark whose main focus is to benchmark XQuery engines with respect to the efficiency of their implementation of four important XQuery constructs: XPath navigation, XPath predicates, XQuery FLWORs and XQuery node construction. These four constructs form the foundation of the language and thus their efficient implementation greatly impacts the overall query engine performance.

3 Main Aspects of Selectivity Estimation in the XML Domain

When looking for an efficient, capable and accurate selectivity estimation approach for XML queries, there are several issues that need to be addressed. From the experience of our work in [20], the major issues of this problem include:

It should support structural and data value queries. In principle, all XML query languages can involve structural conditions in addition to value-based conditions. Therefore, any complete selectivity estimation system for XML queries requires maintaining statistical summary information about both the structure and the data values of the underlying XML documents. A recommended way of doing this is to apply the XMill approach [14], separating the structural part of the XML document from the data part and then grouping the related data values according to their path and data types into homogeneous sets. A suitable summary structure for each set can then be easily selected. For example, the most common approaches to summarizing numerical data values are histograms or wavelets, while several tree synopses could be used to summarize the structural part.

It must be practical. In general, one of the main usages of selectivity estimation approaches is to accelerate the query evaluation process. Thus, while theoretical guarantees are important for any proposed approach, practical considerations are much more important. The performance characteristics of the selectivity estimation process are a crucial aspect of any approach. The selectivity estimation of any query or sub-query must be much faster than its real evaluation. In other words, the cost savings on the query evaluation process obtained by using the selectivity information must be higher than the cost of performing the selectivity estimation. In addition, the summary structure(s) required for the selectivity estimation process must be efficient in terms of memory and space consumption.

It should be strongly capable. The standard query languages for XML, namely XPath and XQuery, are very rich languages. They provide a rich set of functions and features.
These features include structure and content-based search, path expressions, element construction, join, sort, duplicate elimination and aggregation operations. Thus, a good selectivity estimation approach should be able to provide accurate estimates for a wide range of these features. In addition, it should maintain a set of special summary information about the underlying source XML documents. For example, a universal assumption of a uniform distribution of the element structure and the data values may lead to many potential estimation errors because of the irregular nature of many XML documents.

It should be composable. The XML query languages, especially XQuery, are compositional in nature, as sub-expressions are combined with each other to form the final query. Hence, a good selectivity estimation approach should be able to estimate the selectivity of the final expression as well as of each sub-expression. This feature is crucial for any cost-based query optimizer, since it enables the selection of a cheap execution plan based on the selectivity information supplied for each sub-expression.

It must be accurate. On the one hand, providing the query optimizer with an accurate estimation can effectively accelerate the evaluation of any query. On the other hand, providing the query optimizer with incorrect selectivity information will lead it to incorrect decisions and consequently to inefficient execution plans.

It should be independent. It is recommended that the selectivity estimation process be independent of the actual evaluation process, so that it can be used with different query engines which apply different evaluation mechanisms.

4 XSelMark Benchmark Queries

XMark [21] is a well-known benchmark for XML data management. The XMark database models an internet auction web site. XMark comes with an XML generator that produces XML documents according to a numeric scaling factor proportional to the document size. We base the queries of our proposed benchmark on the structure of the XMark document auction.xml, which is described in detail in [21]. The set of queries of our proposed benchmark, XSelMark, represents a mix of XML queries which covers a wide set of the major selectivity estimation aspects in the domain of XML queries. They are designed to allow a realistic assessment of the advantages and shortcomings of the proposed XML selectivity estimation approaches and to identify their respective impact. The queries are expressed using the two standard XML query languages XPath and XQuery. They are concise, easy to read and understand, and available at the web page of the benchmark [1].

4.1 Group 1: Path Expressions

The path expression is a fundamental building block for querying XML data. This group of queries investigates the ability of the selectivity estimation approaches to deal with structural XML queries.

Q1) Path expression with non-recursive axes: Find the names of all persons.

    /site/people/person/name/text()

where non-recursive axes are child, parent, attribute, following-sibling and preceding-sibling.

Q2) Path expression with recursive axes: Find all description nodes descendant of all item nodes.

    /site//item//description

where recursive axes are descendant, descendant-or-self, ancestor and ancestor-or-self.

Q3) Path expression with wild cards: Return the item subtrees of all regions.

    /site/regions/*//item/*
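
Assessing an estimator on queries such as Q1 to Q3 requires the true result sizes as a reference. The following is a minimal sketch of how such reference cardinalities could be obtained, assuming Python's standard `xml.etree.ElementTree` module and a hypothetical toy document that only mimics the XMark element names. The names `TOY_DOC`, `QUERIES` and `true_cardinalities` are illustrative and not part of the benchmark. Because ElementTree supports only a subset of XPath and evaluates paths relative to the root element, the leading `/site` step and the `text()` step of Q1 are dropped here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document that only mimics the XMark element names
# used by Q1-Q3; it is NOT produced by the XMark generator.
TOY_DOC = """
<site>
  <regions>
    <africa>
      <item><name>drum</name><description><text>carved</text></description></item>
    </africa>
    <asia>
      <item><name>vase</name><description><text>blue</text></description></item>
      <item><name>fan</name><description><text>silk</text></description></item>
    </asia>
  </regions>
  <people>
    <person><name>Ann</name></person>
    <person><name>Bob</name></person>
  </people>
</site>
"""

# Paths are written relative to the root element <site>.
QUERIES = {
    "Q1": "people/person/name",    # adapted from /site/people/person/name/text()
    "Q2": ".//item//description",  # adapted from /site//item//description
    "Q3": "regions/*//item/*",     # adapted from /site/regions/*//item/*
}


def true_cardinalities(xml_text, queries):
    """Evaluate each path on the document and return the number of result nodes."""
    root = ET.fromstring(xml_text)
    return {qid: len(root.findall(path)) for qid, path in queries.items()}


if __name__ == "__main__":
    for qid, size in true_cardinalities(TOY_DOC, QUERIES).items():
        print(qid, size)
```

A full XPath or XQuery processor would be needed for the remaining queries of the benchmark, since order-based axes and path-valued predicates are outside the subset that ElementTree evaluates.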

Q4) Path expression with order-based axes: Return the description nodes which follow the elements named closed_auction.

    /site//closed_auction/following::description

where order-based axes are following, following-sibling, preceding and preceding-sibling. Supporting this type of query requires the selectivity estimation approach to capture specific statistical information about the order of the elements in the XML documents.

Q5) Branching XPath expressions: Return the names of all persons who have age information in their profiles.

    /site//person[profile/age]/name

4.2 Group 2: Twig Expressions

Q6) Simple twig expression: Return the names and descriptions of all items.

    for $b in //item
    return ($b/name,$b/description)

Q7) Twig expression with element construction: Return the restructured results of the names and descriptions of all items.

    for $b in //item
    return <Result>
             <name>{$b/name}</name>
             <price>{$b/price}</price>
           </Result>

4.3 Group 3: Predicates

The estimation of predicate selectivity is a well-known problem in database theory and practice. The most common solutions to this problem rely on histograms for capturing the distribution of data values, and on the assumption of a uniform distribution when nothing is known about the data involved in the predicate. In the context of XML, predicate selectivity estimation poses new challenges such as: 1) Predicates can be structure-based as well as value-based. 2) Positional predicates represent a special form of predicates over the order information of the elements in the XML document. 3) XML elements are usually distributed in a non-uniform way, hence assuming a simple uniform distribution of the element structure may lead to many potential estimation errors, especially when the operated sequence of nodes is constructed by merging nodes from different groups of data elements.

Q8) Positional predicates: Return the third bidder of each open auction.

    /site/open_auctions/open_auction/bidder[3]
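
The histogram-based estimation of value predicates mentioned in the introduction to this group can be illustrated with a small sketch. It assumes an equi-width histogram over hypothetical price values and the uniform-distribution assumption inside each bucket; the function names `build_equi_width_histogram` and `estimate_less_than` and the sample data are illustrative and not taken from any of the assessed approaches.

```python
def build_equi_width_histogram(values, lo, hi, buckets):
    """Count how many values fall into each of `buckets` equal-width ranges of [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that a value equal to `hi` lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_less_than(counts, lo, width, bound):
    """Estimate how many values satisfy `value < bound`.

    Buckets entirely below the bound contribute their full count; the bucket
    containing the bound contributes a fraction of its count, assuming the
    values are uniformly distributed inside that bucket.
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if bound >= bucket_hi:
            total += count
        elif bound > bucket_lo:
            total += count * (bound - bucket_lo) / width
    return total


if __name__ == "__main__":
    # Hypothetical price values of closed auctions.
    prices = [5, 15, 25, 35, 38, 45, 70, 90, 95, 99]
    counts, width = build_equi_width_histogram(prices, lo=0, hi=100, buckets=5)
    for bound in (30, 40):
        exact = sum(1 for p in prices if p < bound)
        print(bound, estimate_less_than(counts, 0, width, bound), exact)
```

When the bound coincides with a bucket boundary the estimate is exact; otherwise the error depends on how far the values inside the split bucket deviate from a uniform distribution, which is the effect the non-uniformity challenge above refers to.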

Q9) Equality predicates: Return the closed auctions with price equal to 40.

    /site//closed_auction[price = 40]

Q10) Range predicates: Return the closed auctions with price less than 40.

    /site//closed_auction[price < 40]

where range predicates use any of the operators (<, <=, =, !=, >, >=).

Q11) Conjunctive/disjunctive predicates: Return the closed auctions with price greater than 40 and less than 100.

    /site//closed_auction[price > 40 and price < 100]

where such predicates can use any of the conjunctive/disjunctive operators (and, or).

Q12) Predicates with merged nodes from different paths: Return the African and Asian items with id value greater than 100.

    for $b in (/site/africa/item, /site/asia/item)
    where data($b/@id) > 100
    return $b

An accurate estimation for such a query should consider the different distributions of the data values resulting from each path expression as well as the share that each path contributes to the operated sequence of nodes.

Q13) Predicates with merged nodes from different paths and hybrid nature: Return the price nodes and quantity nodes with value greater than 100.

    for $b in (/site//price,/site//quantity)
    where data($b) > 1 and data($b) > 100
    return $b

This query is more challenging than the previous one because the nodes of the operated sequence represent completely different data items (price, quantity), which may have totally different distributions of their data values.

Q14) String predicates: Return all persons with id value greater than person200.

    /site/people/person[@id > "person200"]

4.4 Group 4: Value-Based Joins (Theta Joins)

This group of queries assesses the ability and the accuracy of the selectivity estimation approaches in dealing with value-based join operations between the data values of XML nodes.

Q15) Value-based join instances where the values of each operand are constructed by a path expression: Return all pairs of increase value and price value where the increase value is greater than the price value.

    for $x in /site//increase, $y in /site//price
    where data($x) > data($y)
    return <pair>{$x,$y}</pair>

Q16) Value-based join instances where the values of one operand are constructed by a path expression and the values of the other operand are constructed by a path expression manipulated with an arithmetic expression: Return all pairs of increase value and price value where the increase value is greater than the price value multiplied by 2.

    for $x in /site//increase, $y in /site//price
    where data($x) > data($y) * 2
    return <pair>{$x,$y}</pair>

Q17) Equi-joins of data values: Return all pairs of increase value and price value where the increase value is equal to the price value.

    for $x in /site//increase, $y in /site//price
    where data($x) = data($y)
    return <pair>{$x,$y}</pair>

4.5 Group 5: Arithmetic and Comparison Operations over Data Value Statistics

This group of queries assesses the ability of the selectivity estimation approaches not only to capture summary information about the data values of the XML elements but also to apply arithmetic and comparison operations over this summary information in a consistent and accurate way which does not hurt the quality of the selectivity estimation results.

Q18) Arithmetic over data value statistics 1: Return all pairs of increase value and price value where the sum of the increase value and the price value is greater than 100.

    for $x in /site//increase, $y in /site//price
    where data($x) + data($y) > 100
    return <pair>{$x,$y}</pair>

Q19) Arithmetic over data value statistics 2: Return all pairs of increase value and price value where the sum of the increase value and the price value is equal to 100.

    for $x in /site//increase,$y in /site//price
    where data($x) + data($y) = 100
    return <pair>{$x,$y}</pair>

Q20) Arithmetic and comparison operations over data value statistics 3: Return all triples of increase value, price value and income where the sum of the increase value and the income value is greater than the sum of the price value and the income value.

    for $x in /site//increase, $y in /site//price, $z in
    where data($x) + data($z) > data($y) + data($z)
    return <pair>{$x,$y,$z}</pair>

4.6 Group 6: Nested Expressions

XQuery, like many other XML query languages such as SQL/XML [7], is a free nesting language, where nested queries can be used for many purposes such as reshaping elements or computing aggregate values. Since the result of a nested query may be the input for navigational or filtering operations in the outer query, predicting the size of nested query results requires building on-the-fly statistics about these intermediate results.

Q21) Let - Aggregates: Return the names of persons and the number of items that they bought.

    for $p in /site/people/person
    let $a := for $t in /site//closed_auction
              where $t/buyer/@person = $p/@id
              return $t
    return <item>
             <person>{$p/name/text()}</person>
             <count>{count($a)}</count>
           </item>

Q22) Predicates with values constructed by an aggregate function: Return the open auctions whose sum of bidder increases is greater than 1000.

    for $b in /site/open_auctions/open_auction
    where sum(data($b/bidder/increase)) > 1000
    return <increase>{$b}</increase>

4.7 Group 7: Data Dependent Estimations

This group of queries requires capturing additional specific forms of summary information about the data values of the underlying XML documents.

Q23) Full text search: Return the names of all items whose description contains the word gold.

    /site//item[contains(description, "gold")]

Q24) Distinct operator: Return the distinct price values.

    for $b in distinct-values(//price/text())

Q25) Existential document order: Return the open auctions where a certain person issued a bid before another person.

    for $b in /site/open_auctions/open_auction
    where some $pr1 in = "person20"], $pr2 in = "person51"]
          satisfies $pr1 << $pr2
    return <history>{$b}</history>

5 XML Selectivity Estimation Approaches: State-of-the-Art

In this section we give an overview of the state-of-the-art of the selectivity estimation approaches in the XML domain, after which we use the set of XSelMark queries to assess the capabilities and features supported by each approach.

The work of Aboulnaga et al. [2] is considered to be the first to deal with the selectivity estimation of simple path expressions. They presented two different techniques for capturing the structure of XML documents and for providing accurate selectivity estimations for simple path expressions. The first technique is a summarizing tree structure called a path tree. It is a tree containing each distinct rooted path in the database. The second technique is a statistical structure called a Markov table. This table, implemented as an ordinary hash table, contains every distinct path of length up to m together with its selectivity. The presented techniques only work for simple path expressions without predicates, inline conditions, recursive axes and order-based axes.

In [15], the authors present XPathLearner, a selectivity estimation system for XPath expressions which employs the same summarization and estimation techniques presented in [2] with two main modifications. The first modification is that it gathers and refines the required statistical information in an on-line manner from query feedback. The second modification is that it supports the handling of predicates by storing statistical information for each distinct tag-value pair in the source XML document.

The work of Zhang et al. in [24] mainly focuses on the handling of XPath expressions which involve only structural conditions.
The main idea behind the paper is to provide an efficient treatment of recursive XML documents and an accurate estimation of recursive queries. The authors define a summary structure which condenses the source XML documents into a compact graph structure called XSEED. Relying on this statistical graph structure, the authors propose an algorithm for the selectivity estimation of structural XPath expressions.

In [11] Freire et al. have presented an XML Schema-based statistics collection technique called StatiX. StatiX leverages the information available in the XML Schema to capture both structural and value statistics about the source XML documents. These structural and value statistics are collected in the form of histograms. The StatiX system is employed in LegoDB [3], a cost-based XML-to-relational storage mapping engine which tries to generate efficient relational configurations for the XML documents.

In [22] Wang et al. have proposed a special histogram structure, named the Bloom Histogram, for the selectivity estimation of XPath queries in a dynamic context. The Bloom Histogram keeps a count of the statistics for paths in XML data. A Bloom Histogram H is constructed by sorting the frequency values of the distinct paths in the XML data and then grouping the paths with similar frequency values into buckets. Although the Bloom Histogram is designed to deal with data updates and its estimation error is theoretically bounded by its size, it is very limited as it deals only with simple forms of path expressions.

In [13], Li et al. have described a framework for estimating the selectivity of XPath expressions with a main focus on the order-based axes (following, preceding, following-sibling, preceding-sibling). They used two histogram structures, called p-histogram and o-histogram, to aggregate the path and order information of XML data. A p-histogram is built for each distinct element tag to summarize the pathid-frequency information. In this histogram, each bucket contains a set of path ids and their average frequency value. The o-histogram summarizes the path-order information of each distinct element tag name to capture the sibling-order information based on the path ids.

In [9] Fisher et al. have proposed the SLT XML tree synopsis. The main idea of this synopsis is to remove the repeated patterns in the XML tree and to replace the multiple occurrences of equal subtrees by pointers to a single occurrence of the subtree. They described an algorithm for representing the resulting DAG structures using a special form of grammar called an SLT grammar (straight line tree grammar). A tree automaton is designed to run over the generated lossy SLT grammars to estimate the selectivity of queries containing all XPath axes, including the order-sensitive ones.
The proposed synopsis can deal only with structural XPath queries, with no support for any form of predicate queries or XQuery expressions.

In [19] Polyzotis et al. have proposed the XCluster synopses as a clustering-based framework that can capture the key correlations between and across structure and values of different types. XCluster is considered to be a generalized form of the XSketch tree synopses, a previous work of the authors presented in [18]. It employs the well-known histogram techniques for numeric and string values, and introduces the class of end-biased term histograms for summarizing the distribution of unique terms within textual XML content. This approach can support twig queries with predicates on numeric content, string content, and textual content. However, the authors did not mention how XCluster can be extended to deal with more complicated query situations such as value-based join operations and nested expressions.

The work of [20] has described the design and implementation of a relational algebra-based framework for estimating the selectivity of XQuery expressions. In this approach, XML queries are translated into relational algebraic plans [12]. Summary information about the structure and the data values of the underlying XML documents is kept separately. Then, by exploiting the relational algebraic infrastructure, the special properties of the generated algebraic plans, the summary information and a set of inference rules, the relational estimation approach is able to provide accurate selectivity estimations in the context of XML and XQuery. The framework has the flexibility of integrating any XPath or predicate selectivity estimation technique, which enables it to support the selectivity estimation of a large subset of the powerful XML query language XQuery and to provide estimates not only for the whole XQuery expression but also for each sub-expression as well as for each iteration in the context of FLWOR expressions.

Features Assessment

One of the main goals of the XSelMark benchmark is to provide a framework for assessing the completeness of the selectivity estimation approaches of XML queries. We used the set of XSelMark benchmark queries for an initial assessment of the features supported by the state-of-the-art. Table 1 lists the set of queries supported by each approach, where the symbol X indicates the ability of the approach to support the associated query and the symbol - indicates the inability to support it. The assessment has shown some interesting preliminary results: 1) Most of the selectivity estimation approaches [11, 13, 15, 22, 24] are limited to supporting only small subsets of the XML query languages. They are only able to deal with structural XPath queries. 2) The two synopses of [9, 13] are the only ones which are able to support the selectivity estimation of order-sensitive XPath axes. 3) The approaches of [19, 20] cover a wider range of XML query features. The synopsis of [19] is the only one which is able to deal with the estimation of full text search queries, while [20] is the only one able to deal with many of the features of the XQuery language such as join operations and different types of predicates.
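
To make the simplest summaries surveyed in this section more concrete, the path tree and Markov table described for [2] can be sketched as follows. This is only an illustration under stated assumptions: the toy document, the names `collect_statistics` and `markov_estimate`, and the choice m = 2 are hypothetical, and the rule used to combine table entries for longer paths is the usual first-order Markov assumption, which is not spelled out in the description above. The exact formulation should be taken from [2].

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy document with XMark-like element names.
TOY_DOC = """
<site>
  <people>
    <person><name/></person>
    <person><name/><profile><age/></profile></person>
  </people>
  <regions>
    <asia><item><name/></item></asia>
  </regions>
</site>
"""


def collect_statistics(xml_text, m=2):
    """Return (path_tree, markov_table) as Counters keyed by tuples of tags.

    path_tree    : every distinct rooted path -> number of nodes it reaches
    markov_table : every tag sequence of length <= m -> its frequency
    """
    root = ET.fromstring(xml_text)
    path_tree = Counter()
    markov_table = Counter()

    def visit(elem, parent_path):
        rooted_path = parent_path + (elem.tag,)
        path_tree[rooted_path] += 1
        # Record the suffixes of length 1..m that end at this node.
        for k in range(1, m + 1):
            if len(rooted_path) >= k:
                markov_table[rooted_path[-k:]] += 1
        for child in elem:
            visit(child, rooted_path)

    visit(root, ())
    return path_tree, markov_table


def markov_estimate(markov_table, path):
    """Estimate the frequency of a child-axis path from entries of length 1 and 2.

    Assumed combination rule (m = 2):
        f(t1/.../tn) ~ f(t1/t2) * product over i of f(ti/ti+1) / f(ti)
    """
    path = tuple(path)
    if len(path) <= 2:
        return float(markov_table[path])
    estimate = float(markov_table[path[:2]])
    for i in range(1, len(path) - 1):
        tag_frequency = markov_table[(path[i],)]
        if tag_frequency == 0:
            return 0.0
        estimate *= markov_table[(path[i], path[i + 1])] / tag_frequency
    return estimate


if __name__ == "__main__":
    path_tree, markov_table = collect_statistics(TOY_DOC)
    query = ("site", "people", "person", "name")
    print("path tree  :", path_tree[query])
    print("Markov est.:", markov_estimate(markov_table, query))
```

The path tree answers a rooted simple path by a single lookup, while the Markov table trades accuracy for a bounded size by keeping only short paths. Neither structure records values or sibling order, which is consistent with the limitation to simple path expressions noted for these techniques.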
Table 1. An assessment of the capabilities of the state-of-the-art of the selectivity estimation approaches using the XSelMark benchmark. Columns: XPathLearner [15], XSEED [24], StatiX [11], Path-Order Histogram [13], Bloom Histogram [22], SLT Grammar [9], XCluster [19], Relational Alg. Est. [20]. Rows: Q1 to Q25. Q1, Q2 and Q3 are marked X for all eight approaches; the entries of the remaining rows are garbled in this transcription.

6 Conclusion

Several research efforts have been invested in designing macro-benchmarks to assess the overall performance of XML data management systems. There is currently a big demand for micro-benchmarks which assess specific aspects in the XPath, XQuery and XML management system domains. Several research efforts have proposed different selectivity estimation approaches in the XML domain. Due to the lack of a suitable benchmark, it has been difficult to assess, evaluate and compare these approaches and to get a clear view of the state-of-the-art. This paper is a first step towards a comprehensive assessment of the available selectivity estimation approaches of XML queries. We proposed XSelMark as a micro-benchmark to assess the state-of-the-art of the selectivity estimation approaches of XML queries. An initial assessment of the features and capabilities of the current approaches has shown that most of them are limited to supporting the estimation of structural XPath queries. Hence, several avenues for further research and development are still wide open in this domain to provide accurate, capable and complete frameworks aligned with the rich querying capabilities of the standard XML query languages.

We believe that XSelMark is useful for both researchers and developers. It identifies the major aspects of selectivity estimation of XML queries, helps researchers to discover the strengths and weaknesses of the current approaches and provides researchers and developers with a clearer view for developing more enhanced mechanisms of selectivity estimation of XML queries. In addition, we believe that the selectivity estimation problem is an important research field which has many useful applications other than being a crucial piece of an effective query optimization process, such as: 1) allowing query engines to provide users with early feedback about the expected outcome of their queries and the associated computational effort; 2) providing query engines with hints on the possible avenues to optimize the resource allocation of the execution process; 3) playing an effective role in efficient approximate query answering techniques. As future work, we are planning to use XSelMark to perform a more detailed assessment of the selectivity estimation approaches of XML queries in terms of their accuracy, performance and memory requirements.

References

1. XSelMark: A Micro-Benchmark of Selectivity Estimation of XML Queries.
2. A. Aboulnaga, A. Alameldeen, and J. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In VLDB, 2001.

3. P. Bohannon, J. Freire, P. Roy, and J. Siméon. From XML Schema to Relations: A Cost-Based Approach to XML Storage. In ICDE.
4. T. Böhme and E. Rahm. XMach-1: A Benchmark for XML Data Management. In BTW.
5. S. Bressan, M. Lee, Y. Li, Z. Lacroix, and U. Nambiar. The XOO7 Benchmark. In VLDB 2002 Workshops, London, UK.
6. M. Carey, D. DeWitt, and J. Naughton. The OO7 Benchmark. SIGMOD Record (ACM Special Interest Group on Management of Data), 22.
7. A. Eisenberg and J. Melton. Advancements in SQL/XML. SIGMOD Record, 33(3):79-86.
8. M. Fernández, A. Malhotra, J. Marsh, M. Nagy, and N. Walsh. XQuery 1.0 and XPath 2.0 Data Model (XDM). World Wide Web Consortium Proposed Recommendation, November.
9. D. Fisher and S. Maneth. Structural Selectivity Estimation for XML Documents. In ICDE.
10. M. Franceschet. XPathMark: An XPath Benchmark for the XMark Generated Data. Database and XML Technologies.
11. J. Freire, J. Haritsa, M. Ramanath, P. Roy, and J. Siméon. StatiX: making XML count. In SIGMOD.
12. T. Grust, S. Sakr, and J. Teubner. XQuery on SQL Hosts. In VLDB.
13. H. Li, M. Lee, W. Hsu, and G. Cong. An Estimation System for XPath Expressions. In ICDE.
14. H. Liefke and D. Suciu. XMill: An efficient compressor for XML data. In W. Chen, J. F. Naughton, and P. A. Bernstein, editors, SIGMOD.
15. L. Lim, M. Wang, S. Padmanabhan, J. Vitter, and R. Parr. XPathLearner: An On-line Self-Tuning Markov Histogram for XML Path Selectivity Estimation. In VLDB.
16. P. Michiels, I. Manolescu, and C. Miachon. Toward microbenchmarking XQuery. Information Systems, 33(2).
17. M. Nicola, I. Kogan, and B. Schiefer. An XML transaction processing benchmark. In SIGMOD.
18. N. Polyzotis and M. Garofalakis. Structure and Value Synopses for XML Data Graphs. In VLDB.
19. N. Polyzotis and M. Garofalakis. XCluster Synopses for Structured XML Content. In ICDE.
20. S. Sakr. Cardinality-Aware and Purely Relational Implementation of an XQuery Processor. PhD thesis, University of Konstanz.
21. A. Schmidt, F. Waas, M. Kersten, M. Carey, I. Manolescu, and R. Busse. XMark: A Benchmark for XML Data Management. In VLDB.
22. W. Wang, H. Jiang, H. Lu, and J. Xu Yu. Bloom Histogram: Path Selectivity Estimation for XML Data with Updates. In VLDB.
23. B. Yao, T. Özsu, and J. Keenleyside. XBench - A Family of Benchmarks for XML DBMSs. In VLDB Workshop.
24. N. Zhang, T. Özsu, A. Aboulnaga, and I. Ilyas. XSEED: Accurate and Fast Cardinality Estimation for XPath Queries. In ICDE, 2006.


More information

Towards microbenchmarking. June 30, 2006

Towards microbenchmarking. June 30, 2006 1 Towards microbenchmarking XQuery June 30, 2006 Ioana Manolescu Cedric Miachon Philippe Michiels INRIA Futurs, France Univ. Paris XI, France Univ. Antwerp, Belgium 2 Plan Micro-benchmark principles Choosing

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

XML: Extensible Markup Language

XML: Extensible Markup Language XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified

More information

RiMOM Results for OAEI 2009

RiMOM Results for OAEI 2009 RiMOM Results for OAEI 2009 Xiao Zhang, Qian Zhong, Feng Shi, Juanzi Li and Jie Tang Department of Computer Science and Technology, Tsinghua University, Beijing, China zhangxiao,zhongqian,shifeng,ljz,tangjie@keg.cs.tsinghua.edu.cn

More information

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data FedX: A Federation Layer for Distributed Query Processing on Linked Open Data Andreas Schwarte 1, Peter Haase 1,KatjaHose 2, Ralf Schenkel 2, and Michael Schmidt 1 1 fluid Operations AG, Walldorf, Germany

More information

Relational Query Optimization

Relational Query Optimization Relational Query Optimization Module 4, Lectures 3 and 4 Database Management Systems, R. Ramakrishnan 1 Overview of Query Optimization Plan: Tree of R.A. ops, with choice of alg for each op. Each operator

More information

An Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees

An Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees An Implementation of Tree Pattern Matching Algorithms for Enhancement of Query Processing Operations in Large XML Trees N. Murugesan 1 and R.Santhosh 2 1 PG Scholar, 2 Assistant Professor, Department of

More information

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments Administrivia Midterm on Thursday 10/18 CS 133: Databases Fall 2018 Lec 12 10/16 Prof. Beth Trushkowsky Assignments Lab 3 starts after fall break No problem set out this week Goals for Today Cost-based

More information

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.

More information