Data Structures for Maintaining Path Statistics in Distributed XML Stores


© Yury Soldak
Department of Computer Science, Saint-Petersburg State University
University Prospekt 28, Saint-Petersburg, Russian Federation
ysoldak@acm.org

Abstract

This paper describes a distributed XML store model based on the notion of a distributed XML document. A classification of XPath expressions is defined, and the notion of a distributed XML document is introduced. A DataGuide-based statistical structure for XML stores is proposed, and two possible approaches to keeping it up to date are discussed. The stability of the feedback-based approach is shown. A generalization of the structure to the distributed case is described.

1 Introduction & Related Work

XML was developed for data exchange on the Web and has become widely accepted. It is very likely that, in the near future, most data on the Web will be reachable in the form of XML documents. Many sites already store their data in XML. The Web itself can be characterized as a fairly unpredictable network of heterogeneous data sources [9]. Developing the various aspects of XML query evaluation on the Web is therefore a topical problem. This is particularly true for a set of remote servers that together form a distributed XML store. Many open issues remain in the area of efficient distributed XML query evaluation (Section 2.2).

Two papers on related problems form the background of the current work. The first paper [1] proposes two techniques for estimating the selectivity of simple path expressions over large-scale XML data: path trees and Markov tables. Both techniques summarize complex, large-scale data in a small amount of memory and then use this summary for selectivity estimation. The idea of using a path tree to store statistical information is taken from that paper and is used heavily in the current work.
The second paper [7] introduces XPathLearner, a technique for estimating the selectivity of simple path expressions based on feedback analysis. XPathLearner stores its statistics in a Markov table. As discussed below, this is not the best solution for a distributed XML store. The primary goal of the current paper is to define a structure that is (a) suited to the distributed case and (b) a convenient basis for developing an XPathLearner-like solution.

Both papers mentioned above study the problems of harvesting, updating and storing selectivity statistics in a global scope. In other words, they provide no way to estimate the selectivity of a path expression for a particular site of the store. These techniques therefore lack one of the features most needed for efficient distributed query evaluation (Section 2.2).

The rest of the paper is organized as follows. Section 2 describes the distributed data model used in the paper and discusses problems related to distributed query evaluation. Section 3 discusses the structure of the query optimizer and the place and importance of the statistics module. Section 4 presents the XPath expression classification we use. The XML Tree Sibling Summarization structure and related issues are considered in Section 5. Finally, Section 6 contains the conclusions.

Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems, Moscow, Russia, 2006

2 Distributed XML Store

2.1 Model Definition

A distributed XML document (DXML document) is a document that contains at least one XInclude [10] or XLink [11] element in its body.

Definition. We call a DXML document locally distributed if all included (or linked) XML fragments and the including document itself reside on the same server.

DXML documents as defined above are the building blocks of our distributed XML store. Figure 1 shows an example: an employee list for a multi-office company.
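The two definitions above are simple enough to check mechanically. The Python sketch below is only an illustration under stated assumptions. It uses the standard W3C namespace URIs for XInclude and XLink, because Figure 1 elides the ones it uses. It accepts both a plain href and the xi:href spelling used in Figure 1. It treats a reference without a host part as pointing to the same server. The helper names and the home_host parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional
from urllib.parse import urlparse

# Assumption: standard W3C namespace URIs (Figure 1 elides the ones it uses).
XI = "http://www.w3.org/2001/XInclude"
XLINK = "http://www.w3.org/1999/xlink"


def link_elements(root: ET.Element) -> Iterator[ET.Element]:
    """Yield every XInclude element and every element carrying an XLink href."""
    for el in root.iter():
        if el.tag == f"{{{XI}}}include" or el.get(f"{{{XLINK}}}href") is not None:
            yield el


def href_of(el: ET.Element) -> Optional[str]:
    """Target reference: plain href (XInclude spec), xi:href (Figure 1) or xlink:href."""
    return el.get("href") or el.get(f"{{{XI}}}href") or el.get(f"{{{XLINK}}}href")


def is_dxml(root: ET.Element) -> bool:
    """A DXML document contains at least one XInclude or XLink element."""
    return next(link_elements(root), None) is not None


def is_locally_distributed(root: ET.Element, home_host: str = "") -> bool:
    """True for a DXML document whose included/linked fragments all live on one server.

    A reference without a host part (e.g. "/db/rnd.xml") is treated as local;
    home_host is a hypothetical parameter naming the including document's server.
    """
    refs = [href_of(el) or "" for el in link_elements(root)]
    return bool(refs) and all(urlparse(h).netloc in ("", home_host) for h in refs)
```

On a fragment shaped like Figure 1, is_dxml returns true. is_locally_distributed returns false as soon as one office includes its list from another server.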
Every office has its own employee list, which is managed independently of the other offices and located at a separate site. Each such list changes constantly and unpredictably due to hires, dismissals and small changes in employees' personal data. Any HR manager can add (or remove) elements in their own part of the employee list, even when a common structure for person descriptions has been agreed upon. As a result, we have a truly distributed semistructured XML store.

    <company xmlns:xi="…" xmlns:xl="…">
      <name>the very big company</name>
      <staff>
        <office id="main">
          <person position="ceo" office="main">
            <name>john Smith</name>
          </person>
          <xi:include xi:href="/db/rnd.xml" xi:xpointer="element(/persons/person)" />
          <xi:include xi:href="/db/qa.xml" xi:xpointer="element(/persons/person)" />
          <xl:link xl:type="simple" xl:href="/db/managers.xml#xpointer(//person)" />
        </office>
        <office id="o1">
          <xi:include xi:href="…" xi:xpointer="element(/staff/person)" />
        </office>
        <office id="o2">
          <xi:include xi:href="…" xi:xpointer="element(/staff/persons/person)" />
        </office>
      </staff>
    </company>

Figure 1: Distributed XML document store

This store is the simplest example, based on a single distributed document. A distributed store can, of course, contain any number of documents, whether distributed or local. Furthermore, the roots of the distributed documents need not reside on a single server.

Given a DXML document, we can identify several separate parts, one for each site. We assume that these sites are maintained independently, so they may well belong to different companies. Sites are black boxes to each other. The only requirement is an interface for querying the XML data on each site. There are no restrictions on the type of this interface: sites may accept queries in any known XML query language. We use an XQuery-over-HTTP approach for prototyping.

2.2 Query Evaluation

It would be very useful to query DXML documents with the conventional XQuery language. The result we need is the one we would get by downloading all parts of a DXML document from the remote servers, merging them into a temporary local XML document and running an existing XQuery evaluator on it. This naive implementation is expected to be very slow, so queries on DXML documents must be evaluated more efficiently. In other words, the query optimizer for local documents should be extended to generate optimal query plans for DXML documents, and the query evaluator should be able to execute these new plans.

Figure 2 presents a simple example of a query on a distributed XML store. The query retrieves the persons from office o1 who may have relatives (i.e., persons with the same family name) working at office o2. Here company.xml is the distributed XML document. The information about the two offices is included into it by two XInclude elements (see Figure 1).

    for $p1 in doc("company.xml")//office[@id = "o1"]//person,
        $p2 in distinct-values(doc("company.xml")//office[@id = "o2"]//person/familyname)
    where $p1/familyname = $p2
    return $p1

Figure 2: Query on distributed XML store

The two sequences $p1 and $p2 must be joined during the evaluation of the query in Figure 2. These sequences can be obtained in several ways. For example, a naive query evaluator obtains all person elements for each office by sending simple queries to the corresponding servers. It then joins the two (possibly large) sequences locally. Obviously, this approach is not optimal. Approaches similar to semi-joins in distributed RDBMSs are more attractive. To use them, however, we have to know the selectivity of the path expressions (for the person and familyname elements in our case). The number of distinct values in the resulting node sequences (the so-called distinct selectivity) is of interest too. Finally, it is important to know the selectivity with respect to each server, not just an abstract selectivity in the global scope.

This example shows the crucial role of XPath selectivity estimation in evaluating queries on DXML documents. The structure of an XML query optimizer and the place of an XPath selectivity estimator within it are discussed in the next section.

3 XML Queries Optimization

3.1 Optimizer structure & general issues

Figure 3: Query optimizer

The classic query optimizer structure is shown in Figure 3. It consists of two main blocks: the logical and the physical optimization modules.
The logical module rewrites a query using the rules of the chosen XML algebra. The physical module generates various physical execution plans and selects the best of them using the execution cost estimator. The cost estimator, in turn, requires various statistics to estimate a cost. Both relational and XML query optimizers are expected to implement this simple architecture; the differences between them lie in the implementation. XML optimizers are harder to implement. First, semistructured optimizers work with more complex data structures (trees or graphs) than their relational counterparts. Furthermore, the area of XML databases is not as well developed as the relational one. As a result, XML optimizer developers are forced to make many ad hoc decisions. These decisions are not grounded theoretically and are not proven to be the best, as they are for RDBMSs. For example, XML database developers do not even have a single widely accepted XML algebra to use. Implementing a physical optimizer for XML databases is a real challenge today. We try to take a step forward in that direction by developing a statistics module that the cost estimator can exploit.

3.2 Cost Estimator

Another challenge is related to the distributed nature of the source data: the cost estimator must be aware of the specifics of the store. Different cost models exist for distributed environments [6]. A very simple model would estimate the cost of evaluating a path expression as $k \cdot n$, where $n$ is the estimated selectivity of the expression and $k$ is a host-specific parameter. The parameter would be small for fast database components. It would be large for slow components, or for components reachable only over a slow communication link. The parameter $k$ can also depend on the size of the elements reachable by the path expression, since both serialization and transmission of large elements are very costly operations. Section 5 describes structures that supply distributed cost models with the necessary statistical information.

4 XPath Expressions Classification

The XQuery language uses XPath expressions to define the sequences of XML elements on which operations are performed. XPath expressions are used in XQuery queries in many different ways, so their notation varies significantly. Several expression types are defined here, and these definitions are used in the following sections.
To classify an XPath expression, the following four characteristics must be checked:

- the method used to construct the first sequence (the very first step);
- the presence of predicates (other than identically true ones) in step definitions;
- the direction of step axes;
- the presence of branchpoints.

A sequence of elements is the input and the output of every XPath step. The very first sequence in an XPath expression may be defined either by a function call (doc(), collection()) or by a variable reference. In the first case we call the expression functional; in the second, variational.

Every XPath step contains a predicate expression, which is omitted from the step notation when it is identically true. An XPath expression is predicative if it contains at least one predicate, and simple otherwise.

The XPath specification defines 13 axes. Four major directions (ancestor, descendant, following and preceding) are defined in [4]. As a result, some XPath axes are co-directed in terms of the major directions. For example, the axes parent and ancestor are co-directed, but parent and following-sibling are not. An XPath expression is directed if and only if all its steps are co-directed, and multidirected otherwise. In special cases, the number of major directions can be stated explicitly for multidirected expressions.

Finally, an XPath expression is branched if and only if at least one of its steps has a branchpoint, i.e., a name test of the form (a | b | ... | c).

Examples:

    doc("foo.xml")/a/b/c                        functional simple directed nonbranched
    doc("foo.xml")/a//b/c[@e = "1"]             functional predicative directed nonbranched
    doc("foo.xml")//a[@b = "3"]/following::c    functional predicative multidirected nonbranched
    $v/a/(b[@c = $w] | d)                       variational predicative directed branched

5 XML Tree Sibling Summarization

5.1 Definition

The concept of DataGuides is widely known to semistructured data researchers. It was originally introduced in [3].
Since then, DataGuides have been used as a basis for indexes (for example, [2]) and for structures that represent statistical information [1]. All statistical structures defined in this paper are based on the DataGuide notion, which gives us several benefits. One major benefit is the small amount of memory required to store statistics. Another is that DataGuide-like structures can easily be extended to support the distributed case, as shown in Section 5.6.

The original DataGuide was developed for a setting in which only parent-child relations are used. As a result, the structure cannot be extended to support all kinds of XPath expressions. The structures considered below support only simple directed XPath expressions whose major direction is descendant (the child, descendant and descendant-or-self axes). Furthermore, only functional XPath expressions are studied for now, not variational ones. Each branched expression is split into several nonbranched expressions, which are studied separately.

The XML Tree Sibling Summarization (XTSS) structure is developed to maintain XPath selectivity statistics for any ordinary XML document. It is a DataGuide tree in which every node keeps the name of the sibling XML elements merged to construct the node, together with their number. Every arc represents a parent-child relationship between source elements. Figure 4 shows an example XTSS on the right and the corresponding source XML document on the left.

Figure 4: XML Tree Sibling Summarization

5.2 Construction in Offline

The XQuery query in Figure 5 recursively constructs the XTSS of an XML document. In our particular case, the document contains the text of Shakespeare's Macbeth.

    declare function local:xtss($seq as node()*, $deep as xs:integer) as element()*
    {
      (: group the sequence by element name; each distinct name becomes one XTSS node :)
      for $name in distinct-values(for $s in $seq return name($s))
      let $nodes := $seq[name() = $name]
      return element {$name} {
        attribute c {count($nodes)},       (: number of merged sibling elements :)
        attribute d {$deep},               (: depth of the node :)
        local:xtss($nodes/*, $deep + 1)    (: recurse into the children :)
      }
    };

    local:xtss(doc("db/plays/macbeth.xml")/*, 0)

Figure 5: XTSS generation XQuery

This query was evaluated on the Ipedo [5] and eXist [8] native XML DBMSs to measure approximately how long it takes to construct a whole XTSS for a middle-sized XML document. This routine should, of course, run much faster when coded as part of the query engine. Here we try to establish an upper bound on XTSS construction time. Node constructors were removed from the query to minimize its execution time. The results of this quick experiment are shown in Table 1.

    XML DBMS    Time (secs)
    Ipedo       …
    eXist       …

Table 1: XTSS offline construction time

Obviously, constructing a whole XTSS is a costly operation. Constructing an XTSS for each document in a database with thousands of documents would take too long. Moreover, the XTSS has to be maintained, which would force us to repeat this process from time to time. This approach can be very resource-consuming and is therefore not well suited to a statistical structure. The solution is to construct a partial XTSS following the online (or feedback) approach.

5.3 Feedback Approach

The feedback approach lets us construct an XTSS branch by branch, exploiting the results of user queries. An XTSS constructor fetches the XPath expressions from a user query and their respective selectivities from the query result. It then builds an XTSS branch for each XPath expression and adds it to the (partial) XTSS. If the branch already exists in the XTSS, the selectivity value of the respective node is updated. In theory, a whole XTSS can be built this way. In practice, however, we will always have only part of the whole XTSS, depending on the queries evaluated. Moreover, selectivity values are expected to be present only in the leaves and rarely in the intermediate nodes.

Figure 6: Partial XTSS. (a) After /a/b/c and /a/d; (b) /a//e added

Figure 6(a) shows the partial XTSS obtained after evaluating the path expressions /a/b/c and /a/d. Any information about processed XPath expressions is valuable when the feedback approach is used. Unfortunately, complete information is not always available. The most frequent cause of incompleteness is the evaluation of steps with the descendant axis. To fill this gap, the partial XTSS notion was enriched with ancestor-descendant (also called generalized) arcs, marked by * in the figures. Figure 6(b) presents an example of a partial XTSS with one generalized arc added.

Generalized arcs can introduce data duplication into our structure, which is bad for both structure size and statistics accuracy. Assume two path expressions were evaluated: first /a//e and then /a/b/e. Depending on the structure of the source data, /a//e may or may not select more XML elements than /a/b/e. Leaving both branches in the XTSS leads to data duplication. On the other hand, we can keep only one of these branches and possibly lose accuracy. We decided to keep the most concrete branch (/a/b/e in our case) whenever such a situation arises. Following this rule, we avoid data duplication but may lose accuracy when the generalized expression defines a larger sequence than the concrete one. This is the price we pay for a graceful and predictable statistical structure. The decision is based on our experience in real-world application development, where this problem rarely appears. In many cases, generalized expressions are used in place of more efficient concrete ones only to shorten the textual representation of a query.
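The feedback rules described above can be sketched in a few lines of Python. This is a minimal sketch that assumes simple descendant-directed paths such as /a/b/e and /a//e. For brevity it stores the partial XTSS flattened, as a map from each root-to-node branch to the selectivity of its last node, rather than as an explicit tree. The names parse, covers and PartialXTSS are hypothetical. A concrete branch replaces the generalized branches that cover it, and a generalized branch is ignored when a more concrete one is already stored.

```python
from typing import Dict, Tuple

Step = Tuple[str, str]        # (axis, name): "/" is a parent-child arc, "//" a generalized arc
Branch = Tuple[Step, ...]     # root-to-node path in the partial XTSS


def parse(path: str) -> Branch:
    """Split a simple path such as '/a//e' into (axis, name) steps."""
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return tuple(steps)


def is_concrete(branch: Branch) -> bool:
    """A branch is concrete if it uses parent-child arcs only."""
    return all(axis == "/" for axis, _ in branch)


def covers(general: Branch, concrete: Branch) -> bool:
    """True if the concrete branch is one of the paths matched by `general`."""
    if not general:
        return not concrete
    (axis, name), rest = general[0], general[1:]
    if axis == "/":
        return bool(concrete) and concrete[0][1] == name and covers(rest, concrete[1:])
    # "//" may skip any number of intermediate steps before matching `name`.
    return any(concrete[k][1] == name and covers(rest, concrete[k + 1:])
               for k in range(len(concrete)))


class PartialXTSS:
    """Feedback-built partial XTSS, flattened: branch -> selectivity of its last node."""

    def __init__(self) -> None:
        self.branches: Dict[Branch, int] = {}

    def feedback(self, path: str, selectivity: int) -> None:
        branch = parse(path)
        if is_concrete(branch):
            # Keep the most concrete branch: drop generalized branches that cover it.
            for other in [b for b in self.branches
                          if not is_concrete(b) and covers(b, branch)]:
                del self.branches[other]
        elif any(covers(branch, b) for b in self.branches if is_concrete(b)):
            return  # a more concrete branch is already stored; ignore this one
        self.branches[branch] = selectivity  # add the branch or update its selectivity
```

Feeding /a/b/c, /a/d and /a//e, and then /a/b/e, leaves /a/b/e in place of /a//e. This mirrors the example in the text.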
If the selectivity of an evaluated expression equals 0, the branch is not added to the XTSS, or it is removed if it already exists there. In some cases we do not remove the whole branch but cut it at the first branching node reached from the branch leaf.

5.4 Ambiguity During Updates

Handling generalized path expressions raises a maintenance problem: ambiguity when the new selectivity of a generalized path expression is distributed among all XTSS nodes that satisfy it. Given a generalized path expression, let us assume that all satisfying paths in the source data have corresponding nodes in the XTSS; otherwise a correct distribution of selectivity is impossible. Following the feedback approach, we always have to assume something.

The new selectivity is obtained after the expression is evaluated. The question is how to distribute it among all satisfying nodes. Clearly, the common selectivity may either decrease or increase. Let us assume the latter is true, and let the difference be $d_n$. Let $S_n$ and $S_{n-1}$ be the current and previous common selectivities respectively, so that

$$S_n - S_{n-1} = d_n, \qquad S_n = \sum_{i=1}^{m} s_i^n,$$

where $m$ is the number of nodes and $s_i^n$ is the selectivity of the $i$-th node. At least three distribution approaches exist: equal, proportional and history-based.

The equal method distributes the difference by the simplest formula:

$$s_i^n = s_i^{n-1} + \frac{d_n}{m}$$

The proportional method distributes the difference as follows:

$$s_i^n = s_i^{n-1} \cdot \frac{S_n}{S_{n-1}}$$

The third approach requires additional information in each XTSS node: the increment $\hat{d}_i$ made during the most recent update of the node, after the evaluation of a concrete path expression. The distribution formula for this case is:

$$s_i^n = s_i^{n-1} + d_n \cdot \frac{\hat{d}_i}{\hat{d}}, \qquad \hat{d} = \sum_{i=1}^{m} \hat{d}_i$$

These approaches are only the simplest ones, and more can be developed. For example, we could maintain the selectivity alteration frequency of each node and use it to distribute selectivity, either as a separate (fourth) method or to improve the third one. The appropriate distribution approach depends on the structure of the store and the behavior of the stored data. Unfortunately, no approach can guarantee an accurate distribution of the difference.
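The three distribution methods translate directly into code. This Python sketch takes the previous selectivities $s_i^{n-1}$ of the $m$ satisfying nodes and the new common selectivity $S_n$, and returns the updated node values. The formulas above are undefined when $S_{n-1}$ or $\hat{d}$ is zero. In that case the sketch falls back to equal distribution, which is our own assumption and not part of the paper.

```python
from typing import List, Sequence


def distribute_equal(prev: Sequence[float], new_total: float) -> List[float]:
    """Equal method: s_i^n = s_i^{n-1} + d_n / m."""
    d_n = new_total - sum(prev)
    return [s + d_n / len(prev) for s in prev]


def distribute_proportional(prev: Sequence[float], new_total: float) -> List[float]:
    """Proportional method: s_i^n = s_i^{n-1} * S_n / S_{n-1}."""
    old_total = sum(prev)
    if old_total == 0:  # assumption: formula undefined, fall back to equal
        return distribute_equal(prev, new_total)
    return [s * new_total / old_total for s in prev]


def distribute_history(prev: Sequence[float], new_total: float,
                       last_increment: Sequence[float]) -> List[float]:
    """History-based method: s_i^n = s_i^{n-1} + d_n * dhat_i / dhat.

    last_increment holds dhat_i, the increment of node i at its most recent
    update by a concrete path expression; dhat is their sum.
    """
    d_n = new_total - sum(prev)
    d_hat = sum(last_increment)
    if d_hat == 0:  # assumption: formula undefined, fall back to equal
        return distribute_equal(prev, new_total)
    return [s + d_n * inc / d_hat for s, inc in zip(prev, last_increment)]
```

Take previous values 10, 20, 30 and a new common selectivity of 66. The equal method yields 12, 22, 32 and the proportional method yields 11, 22, 33. The history-based method with increments 1, 0, 3 yields 11.5, 20, 34.5. All three results sum to 66.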
However, we claim that, regardless of the method used, the XTSS will contain accurate (or very nearly accurate) selectivity values if the queries affecting the nodes of interest are evaluated several times. In other words, the XTSS is a stable structure. The next section proves this statement.

5.5 XTSS Stability

Consider an XTSS part of $m$ nodes, where each node defines the concrete path expression corresponding to the branch ending in that node. These $m$ nodes, and only they, satisfy the generalized path expression $q$. The selectivity values stored in these nodes are accurate:

$$d_i = \dot{s}_i - s_i = 0, \qquad i = 1, \dots, m,$$

where $\dot{s}_i$ is the actual selectivity and $s_i$ is the stored selectivity.

Let us assume the source data has changed so that the selectivity values of $k$ XTSS nodes with indexes $i \in I$, $|I| = k$, are no longer accurate and should be updated:

$$D = \sum_{i=1}^{m} d_i, \qquad d_i = 0 \;\; (i \notin I), \qquad d_i \neq 0 \;\; (i \in I).$$

After evaluating the expression $q$, we know the new common selectivity of the $m$ nodes. Since we also know the previous common selectivity, we know $D$, and we have to distribute it among the $m$ nodes. We do not know which XTSS nodes actually should be updated, nor even their number $k$. So we distribute $D$ among all $m$ candidates following one of the approaches described in the previous section. Afterwards, the common selectivity of the selected XTSS nodes equals the actual selectivity of $q$, while the selectivity values of particular nodes may be wrong (i.e., not accurate). Stability means that the structure tends to contain accurate values.

Definition. An XTSS is accurate if and only if each of its nodes has an accurate selectivity value.

Since an XTSS can be divided into several parts, an XTSS is also accurate if and only if each of its parts is accurate. We use the following formula to measure the accuracy of an XTSS part:

$$A = \sum_{i=1}^{m} |d_i|,$$

where $m$ is the number of nodes in the measured part and $d_i$ is the difference between the accurate and the stored (in the XTSS node) selectivity value. The part is accurate if and only if $A = 0$.
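The step "the common selectivity of the selected XTSS nodes equals the actual selectivity of q" can be checked for each method of Section 5.4. Sum its formula over the $m$ nodes, using $S_n - S_{n-1} = d_n$; the total change $D$ of this section plays the role of $d_n$ there:

```latex
\begin{aligned}
\text{equal:}\quad & \sum_{i=1}^{m}\Bigl(s_i^{n-1} + \frac{d_n}{m}\Bigr) = S_{n-1} + d_n = S_n,\\
\text{proportional:}\quad & \sum_{i=1}^{m} s_i^{n-1}\,\frac{S_n}{S_{n-1}} = S_{n-1}\,\frac{S_n}{S_{n-1}} = S_n \qquad (S_{n-1} \neq 0),\\
\text{history-based:}\quad & \sum_{i=1}^{m}\Bigl(s_i^{n-1} + d_n\,\frac{\hat{d}_i}{\hat{d}}\Bigr) = S_{n-1} + d_n\,\frac{\sum_{i=1}^{m}\hat{d}_i}{\hat{d}} = S_n \qquad (\hat{d} \neq 0).
\end{aligned}
```

Each method therefore preserves the common selectivity exactly. What it cannot guarantee is the split among individual nodes.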
Proposition 5.1. Let $q$ be a generalized path expression that defines an XTSS part of $m$ nodes as described above. Suppose that

- all elements reachable by $q$ have corresponding nodes in the XTSS;
- select queries are more frequent than update ones;
- user queries contain both generalized and concrete path expressions.

Then $A \to 0$.

Proof. $A$ decreases each time a concrete path expression is evaluated, because the value of the particular node becomes accurate (so the corresponding $d_i = 0$). $A$ remains the same when a generalized expression is evaluated, because the distribution approaches do not change $A$.
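The accuracy measure and the first step of the proof can be illustrated in Python. This sketch reads $A$ as the sum of absolute differences, which is the reading under which $A = 0$ holds exactly when every node is accurate. It models only the first step of the proof: evaluating a node's concrete path returns that node's exact selectivity, which removes its term from $A$. The function names and the selectivity values in the usage example are hypothetical.

```python
from typing import List, Sequence


def accuracy(actual: Sequence[float], stored: Sequence[float]) -> float:
    """A = sum_i |d_i| with d_i = actual_i - stored_i; the part is accurate iff A == 0."""
    return sum(abs(a - s) for a, s in zip(actual, stored))


def refine_concrete(stored: List[float], actual: Sequence[float], i: int) -> None:
    """Evaluating node i's concrete path expression yields its exact selectivity."""
    stored[i] = actual[i]
```

Take actual selectivities 14, 10, 7 and stored values 10, 10, 10. Then $A$ is 7. Refining the first node lowers it to 3, and refining the third brings it to 0.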

In some special cases, $A$ can stay unchanged for a long time even if concrete and generalized expressions are evaluated constantly. This depends on which concrete path expressions are evaluated. These expressions are used frequently, and the selectivity values stored for them in the XTSS are accurate, so the XTSS is accurate for these expressions. If an XTSS node is never accessed to obtain its selectivity, it is never refined after an expression is evaluated and so can never acquire an accurate selectivity. As a result, the XTSS is accurate, but only for the frequent expressions. This is natural for the feedback approach and is enough to call the XTSS stable. The rate at which $A$ decreases depends on the properties of the store, and the only advice we can give is to experiment with the distribution approaches. If the source data changes so frequently that the XTSS has no time to stabilize, it is a good idea to turn off feedback evaluation of generalized path expressions.

5.6 Generalization of XTSS for Distributed XML

Statistical information is required to evaluate queries on distributed XML documents as efficiently as on local documents. Having defined the XTSS for local documents, we extend the definition to the distributed case by introducing the Distributed XTSS (DXTSS). A DXTSS plays the same role for distributed documents as an XTSS does for local ones.

Parent-child and ancestor-descendant arcs share a property: they define relations between nodes of the same local document. In the distributed case, arcs representing cross-document references also appear. When the fragments of a distributed document reside on different servers, these arcs are not only cross-document but also cross-server. We call such arcs associative and mark them with a dedicated symbol in the figures. Thus, a DXTSS is a set of XTSSs connected to each other by associative arcs. One of the XTSSs is considered the main one and contains the root of the structure. See Figure 7 for an example of a DXTSS.
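To make the role of associative arcs concrete, here is a hypothetical Python sketch of a DXTSS lookup. Each node holds its local children and a list of associative arcs. An arc names the remote server, carries the $k$ parameter of the simple cost model from Section 3.2, and points into that server's XTSS. The lookup returns the selectivity of a simple child-axis path split by server, so it gives the per-server selectivity that Section 2.2 calls for. It follows at most one associative arc per branch and sums $k \cdot n$ over the servers as a cost estimate. The class and function names, the default $k = 1$ for the local server, and the numbers in the usage example are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """XTSS node: element name, selectivity of the branch ending here, child nodes."""
    name: str
    count: int
    children: Dict[str, "Node"] = field(default_factory=dict)
    arcs: List["Arc"] = field(default_factory=list)  # associative arcs to other XTSSs

    def add(self, child: "Node") -> "Node":
        self.children[child.name] = child
        return child


@dataclass
class Arc:
    """Associative arc: points into another site's XTSS and carries its cost factor k."""
    server: str
    k: float
    target: Node


def estimate(node: Node, steps: List[str], server: str = "local", k: float = 1.0,
             crossed: bool = False) -> Dict[str, Tuple[float, int]]:
    """Per-server (k, selectivity) of a simple child-axis path below `node`."""
    if not steps:
        return {server: (k, node.count)}
    head, rest = steps[0], steps[1:]
    parts = []
    if head in node.children:
        parts.append(estimate(node.children[head], rest, server, k, crossed))
    if not crossed:  # at most one associative arc may appear in a DXTSS branch
        for arc in node.arcs:
            if arc.target.name == head:
                parts.append(estimate(arc.target, rest, arc.server, arc.k, True))
    result: Dict[str, Tuple[float, int]] = {}
    for part in parts:
        for srv, (kk, n) in part.items():
            result[srv] = (kk, result.get(srv, (kk, 0))[1] + n)
    return result


def cost(per_server: Dict[str, Tuple[float, int]]) -> float:
    """Simple cost model of Section 3.2: sum of k * n over the servers involved."""
    return sum(k * n for k, n in per_server.values())
```

Consider a store shaped like Figure 1 with one local person and associative arcs to offices o1 (40 persons, k = 5) and o2 (25 persons, k = 20). For the path staff/office/person, this sketch reports per-server selectivities of 1, 40 and 25 and an estimated cost of 701.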
Figure 7: Distributed XTSS

Only one associative arc is allowed in a DXTSS branch. This restriction follows from the fact that we consider remote servers to be independent and atomic, so we cannot demand any private information (for example, the store schema) from them. Distributed documents may form a chain (or even a cycle) by including parts of each other, but we will never know this for certain when evaluating a query on a distributed document. It is acceptable to measure and store the connection and/or transmission speed for each associative arc in our statistical structure and use it for cost estimation; this is the $k$ parameter of the simple cost function considered in Section 3.2. Large values do not necessarily mean that a chain exists; they can also result from outdated hardware or an overloaded remote server. The real cause is not important for successful cost estimation. The only crucial information is how long the remote operation lasts, and associative arcs are well suited to storing this statistic.

6 Conclusions

This paper describes a distributed XML store model based on the notion of a distributed XML document. It shows how the conventional XQuery language can be used to query stores of this type. Query evaluation issues are discussed, and the value of path selectivity statistics is shown. An XPath expression classification based on four characteristics is introduced. Researchers and developers can use it to refer easily to classes of XPath expressions, as we do in this paper. DataGuide-like XML Tree Sibling Summarization structures are defined. They are suited to holding statistical information about XPath expression selectivities and are used by the cost estimator module of our XML query optimizer. A generalization of this structure to the distributed case is described, using associative arcs to connect local XTSSs. A feedback approach to maintaining the partial XTSS structure is described, and its stability is shown.

7 Acknowledgements

I would like to thank my scientific adviser Boris Novikov for his support and valuable comments. Many issues and application patterns of XTSS were discussed with Anton Gubanov and Maxim Lukichev; thank you, colleagues.

References

[1] Ashraf Aboulnaga, Alaa R. Alameldeen, and Jeffrey F. Naughton. Estimating the selectivity of XML path expressions for internet scale applications. In The VLDB Journal.
[2] A. Fomichev. XML Storing and Processing Techniques. In SYRCoDIS, NIIMM.
[3] Roy Goldman and Jennifer Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In VLDB '97, Proceedings of 23rd International Conference on Very Large Data Bases. Morgan Kaufmann.
[4] T. Grust. Accelerating XPath location steps. In Proceedings of ACM Conference on Management of Data (SIGMOD).
[5] Ipedo XML database website.
[6] Donald Kossmann. The State of the Art in Distributed Query Processing. In ACM Computing Surveys, volume 32, 2000.
[7] L. Lim, M. Wang, S. Padmanabhan, J. S. Vitter, and R. Parr. XPathLearner: An On-line Self-Tuning Markov Histogram for XML Path Selectivity Estimation. In VLDB.
[8] Wolfgang Meier. eXist: An Open Source Native XML Database. In Web, Web-Services, and Database Systems.
[9] Marko Smiljanic, Henk M. Blanken, Maurice van Keulen, and Willem Jonker. Distributed XML Database Systems. Technical Report TR-CTIT-02-46, CTIT, University of Twente, The Netherlands, October.
[10] XML Inclusions (XInclude) Version 1.0. W3C Recommendation, 20 December.
[11] XML Linking Language (XLink) Version 1.0. W3C Recommendation, 27 June.


More information

Monotone Constraints in Frequent Tree Mining

Monotone Constraints in Frequent Tree Mining Monotone Constraints in Frequent Tree Mining Jeroen De Knijf Ad Feelders Abstract Recent studies show that using constraints that can be pushed into the mining process, substantially improves the performance

More information

Estimating Result Size and Execution Times for Graph Queries

Estimating Result Size and Execution Times for Graph Queries Estimating Result Size and Execution Times for Graph Queries Silke Trißl 1 and Ulf Leser 1 Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, 10099 Berlin, Germany {trissl,leser}@informatik.hu-berlin.de

More information

ADT 2010 ADT XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing

ADT 2010 ADT XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing 1 XQuery Updates in MonetDB/XQuery & Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ MonetDB/XQuery: Updates Schedule 9.11.1: RDBMS back-end support

More information

Element Algebra. 1 Introduction. M. G. Manukyan

Element Algebra. 1 Introduction. M. G. Manukyan Element Algebra M. G. Manukyan Yerevan State University Yerevan, 0025 mgm@ysu.am Abstract. An element algebra supporting the element calculus is proposed. The input and output of our algebra are xdm-elements.

More information

Estimating the Selectivity of XML Path Expression with predicates by Histograms

Estimating the Selectivity of XML Path Expression with predicates by Histograms Estimating the Selectivity of XML Path Expression with predicates by Histograms Yu Wang 1, Haixun Wang 2, Xiaofeng Meng 1, and Shan Wang 1 1 Information School, Renmin University of China, Beijing 100872,

More information

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact: Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base

More information

Semi-structured Data. 8 - XPath

Semi-structured Data. 8 - XPath Semi-structured Data 8 - XPath Andreas Pieris and Wolfgang Fischl, Summer Term 2016 Outline XPath Terminology XPath at First Glance Location Paths (Axis, Node Test, Predicate) Abbreviated Syntax What is

More information

Part XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440

Part XVII. Staircase Join Tree-Aware Relational (X)Query Processing. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Part XVII Staircase Join Tree-Aware Relational (X)Query Processing Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 440 Outline of this part 1 XPath Accelerator Tree aware relational

More information

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis Informatics 1: Data & Analysis Lecture 11: Navigating XML using XPath Ian Stark School of Informatics The University of Edinburgh Tuesday 23 February 2016 Semester 2 Week 6 http://blog.inf.ed.ac.uk/da16

More information

Optimising XML-Based Web Information Systems

Optimising XML-Based Web Information Systems Optimising XML-Based Web Information Systems Colm Noonan and Mark Roantree Interoperable Systems Group, Dublin City University, Ireland - {mark,cnoonan}@computing.dcu.ie Abstract. Many Web Information

More information

Index-Driven XQuery Processing in the exist XML Database

Index-Driven XQuery Processing in the exist XML Database Index-Driven XQuery Processing in the exist XML Database Wolfgang Meier wolfgang@exist-db.org The exist Project XML Prague, June 17, 2006 Outline 1 Introducing exist 2 Node Identification Schemes and Indexing

More information

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views

H2 Spring B. We can abstract out the interactions and policy points from DoDAF operational views 1. (4 points) Of the following statements, identify all that hold about architecture. A. DoDAF specifies a number of views to capture different aspects of a system being modeled Solution: A is true: B.

More information

An Efficient XML Node Identification and Indexing Scheme

An Efficient XML Node Identification and Indexing Scheme An Efficient XML Node Identification and Indexing Scheme Jan-Marco Bremer and Michael Gertz Department of Computer Science University of California, Davis One Shields Ave., Davis, CA 95616, U.S.A. {bremer

More information

XQuery Optimization Based on Rewriting

XQuery Optimization Based on Rewriting XQuery Optimization Based on Rewriting Maxim Grinev Moscow State University Vorob evy Gory, Moscow 119992, Russia maxim@grinev.net Abstract This paper briefly describes major results of the author s dissertation

More information

Integrating Path Index with Value Index for XML data

Integrating Path Index with Value Index for XML data Integrating Path Index with Value Index for XML data Jing Wang 1, Xiaofeng Meng 2, Shan Wang 2 1 Institute of Computing Technology, Chinese Academy of Sciences, 100080 Beijing, China cuckoowj@btamail.net.cn

More information

Evaluating XPath Queries

Evaluating XPath Queries Chapter 8 Evaluating XPath Queries Peter Wood (BBK) XML Data Management 201 / 353 Introduction When XML documents are small and can fit in memory, evaluating XPath expressions can be done efficiently But

More information

XQuery Optimization in Relational Database Systems

XQuery Optimization in Relational Database Systems XQuery Optimization in Relational Database Systems Riham Abdel Kader Supervised by Maurice van Keulen Univeristy of Twente P.O. Box 217 7500 AE Enschede, The Netherlands r.abdelkader@utwente.nl ABSTRACT

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

An XML-IR-DB Sandwich: Is it Better With an Algebra in Between?

An XML-IR-DB Sandwich: Is it Better With an Algebra in Between? An XML-IR-DB Sandwich: Is it Better With an Algebra in Between? Vojkan Mihajlović Djoerd Hiemstra Henk Ernst Blok Peter M. G. Apers CTIT, University of Twente P.O. Box 217, 7500AE Enschede, The Netherlands

More information

Main Memory and the CPU Cache

Main Memory and the CPU Cache Main Memory and the CPU Cache CPU cache Unrolled linked lists B Trees Our model of main memory and the cost of CPU operations has been intentionally simplistic The major focus has been on determining

More information

XML and Databases. Lecture 9 Properties of XPath. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 9 Properties of XPath. Sebastian Maneth NICTA and UNSW XML and Databases Lecture 9 Properties of XPath Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. XPath Equivalence 2. No Looking Back: How to Remove Backward Axes 3. Containment

More information

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW

XML and Databases. Lecture 10 XPath Evaluation using RDBMS. Sebastian Maneth NICTA and UNSW XML and Databases Lecture 10 XPath Evaluation using RDBMS Sebastian Maneth NICTA and UNSW CSE@UNSW -- Semester 1, 2009 Outline 1. Recall pre / post encoding 2. XPath with //, ancestor, @, and text() 3.

More information

Lecture 13 Thursday, March 18, 2010

Lecture 13 Thursday, March 18, 2010 6.851: Advanced Data Structures Spring 2010 Lecture 13 Thursday, March 18, 2010 Prof. Erik Demaine Scribe: David Charlton 1 Overview This lecture covers two methods of decomposing trees into smaller subtrees:

More information

Using an Oracle Repository to Accelerate XPath Queries

Using an Oracle Repository to Accelerate XPath Queries Using an Oracle Repository to Accelerate XPath Queries Colm Noonan, Cian Durrigan, and Mark Roantree Interoperable Systems Group, Dublin City University, Dublin 9, Ireland {cnoonan, cdurrigan, mark}@computing.dcu.ie

More information

CHAPTER 3 LITERATURE REVIEW

CHAPTER 3 LITERATURE REVIEW 20 CHAPTER 3 LITERATURE REVIEW This chapter presents query processing with XML documents, indexing techniques and current algorithms for generating labels. Here, each labeling algorithm and its limitations

More information

Efficient Implementation of XQuery Constructor Expressions

Efficient Implementation of XQuery Constructor Expressions Efficient Implementation of XQuery Constructor Expressions c Maxim Grinev Leonid Novak Ilya Taranov Institute of System Programming Abstract Element constructor is one of most expensive operations of the

More information

Arbori Starter Manual Eugene Perkov

Arbori Starter Manual Eugene Perkov Arbori Starter Manual Eugene Perkov What is Arbori? Arbori is a query language that takes a parse tree as an input and builds a result set 1 per specifications defined in a query. What is Parse Tree? A

More information

Mining XML data: A clustering approach

Mining XML data: A clustering approach Mining XML data: A clustering approach Saraee, MH and Aljibouri, J Title Authors Type URL Published Date 2005 Mining XML data: A clustering approach Saraee, MH and Aljibouri, J Conference or Workshop Item

More information

Processing Rank-Aware Queries in P2P Systems

Processing Rank-Aware Queries in P2P Systems Processing Rank-Aware Queries in P2P Systems Katja Hose, Marcel Karnstedt, Anke Koch, Kai-Uwe Sattler, and Daniel Zinn Department of Computer Science and Automation, TU Ilmenau P.O. Box 100565, D-98684

More information

Semantic Characterizations of XPath

Semantic Characterizations of XPath Semantic Characterizations of XPath Maarten Marx Informatics Institute, University of Amsterdam, The Netherlands CWI, April, 2004 1 Overview Navigational XPath is a language to specify sets and paths in

More information

METAXPath. Utah State University. From the SelectedWorks of Curtis Dyreson. Curtis Dyreson, Utah State University Michael H. Böhen Christian S.

METAXPath. Utah State University. From the SelectedWorks of Curtis Dyreson. Curtis Dyreson, Utah State University Michael H. Böhen Christian S. Utah State University From the SelectedWorks of Curtis Dyreson December, 2001 METAXPath Curtis Dyreson, Utah State University Michael H. Böhen Christian S. Jensen Available at: https://works.bepress.com/curtis_dyreson/11/

More information

Informatics 1: Data & Analysis

Informatics 1: Data & Analysis T O Y H Informatics 1: Data & Analysis Lecture 11: Navigating XML using XPath Ian Stark School of Informatics The University of Edinburgh Tuesday 26 February 2013 Semester 2 Week 6 E H U N I V E R S I

More information

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS

LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS Department of Computer Science University of Babylon LECTURE NOTES OF ALGORITHMS: DESIGN TECHNIQUES AND ANALYSIS By Faculty of Science for Women( SCIW), University of Babylon, Iraq Samaher@uobabylon.edu.iq

More information

Nested Intervals Tree Encoding with Continued Fractions

Nested Intervals Tree Encoding with Continued Fractions Nested Intervals Tree Encoding with Continued Fractions VADIM TROPASHKO Oracle Corp There is nothing like abstraction To take away your intuition Shai Simonson http://aduniorg/courses/discrete/ We introduce

More information

XML Systems & Benchmarks

XML Systems & Benchmarks XML Systems & Benchmarks Christoph Staudt Peter Chiv Saarland University, Germany July 1st, 2003 Main Goals of our talk Part I Show up how databases and XML come together Make clear the problems that arise

More information

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018

XPath and XQuery. Introduction to Databases CompSci 316 Fall 2018 XPath and XQuery Introduction to Databases CompSci 316 Fall 2018 2 Announcements (Tue. Oct. 23) Homework #3 due in two weeks Project milestone #1 feedback : we are a bit behind, but will definitely release

More information

Compression of the Stream Array Data Structure

Compression of the Stream Array Data Structure Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In

More information

An Efficient XML Index Structure with Bottom-Up Query Processing

An Efficient XML Index Structure with Bottom-Up Query Processing An Efficient XML Index Structure with Bottom-Up Query Processing Dong Min Seo, Jae Soo Yoo, and Ki Hyung Cho Department of Computer and Communication Engineering, Chungbuk National University, 48 Gaesin-dong,

More information

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi. Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 18 Tries Today we are going to be talking about another data

More information

XPath. by Klaus Lüthje Lauri Pitkänen

XPath. by Klaus Lüthje Lauri Pitkänen XPath by Klaus Lüthje Lauri Pitkänen Agenda Introduction History Syntax Additional example and demo Applications Xpath 2.0 Future Introduction Expression language for Addressing portions of an XML document

More information

Keywords: Binary Sort, Sorting, Efficient Algorithm, Sorting Algorithm, Sort Data.

Keywords: Binary Sort, Sorting, Efficient Algorithm, Sorting Algorithm, Sort Data. Volume 4, Issue 6, June 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient and

More information

11. XML storage details Introduction Last Lecture Introduction Introduction. XML Databases XML storage details

11. XML storage details Introduction Last Lecture Introduction Introduction. XML Databases XML storage details XML Databases Silke Eckstein Andreas Kupfer Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 2 11.1 Last Lecture Different methods for storage of XML documents

More information

Course: The XPath Language

Course: The XPath Language 1 / 30 Course: The XPath Language Pierre Genevès CNRS University of Grenoble Alpes, 2017 2018 2 / 30 Why XPath? Search, selection and extraction of information from XML documents are essential for any

More information

UNIVERSITY OF TWENTE. Querying Uncertain Data in XML

UNIVERSITY OF TWENTE. Querying Uncertain Data in XML UNIVERSITY OF TWENTE. Querying Uncertain Data in XML X Daniël Knippers MSc Thesis August 214 Y 1 2 1 2 Graduation committee Dr. ir. Maurice van Keulen Dr. Mena Badieh Habib Morgan Abstract This thesis

More information

How to speed up a database which has gotten slow

How to speed up a database which has gotten slow Triad Area, NC USA E-mail: info@geniusone.com Web: http://geniusone.com How to speed up a database which has gotten slow hardware OS database parameters Blob fields Indices table design / table contents

More information

QuickXDB: A Prototype of a Native XML QuickXDB: Prototype of Native XML DBMS DBMS

QuickXDB: A Prototype of a Native XML QuickXDB: Prototype of Native XML DBMS DBMS QuickXDB: A Prototype of a Native XML QuickXDB: Prototype of Native XML DBMS DBMS Petr Lukáš, Radim Bača, and Michal Krátký Petr Lukáš, Radim Bača, and Michal Krátký Department of Computer Science, VŠB

More information

Knowledge discovery from XML Database

Knowledge discovery from XML Database Knowledge discovery from XML Database Pravin P. Chothe 1 Prof. S. V. Patil 2 Prof.S. H. Dinde 3 PG Scholar, ADCET, Professor, ADCET Ashta, Professor, SGI, Atigre, Maharashtra, India Maharashtra, India

More information

An Extended Byte Carry Labeling Scheme for Dynamic XML Data

An Extended Byte Carry Labeling Scheme for Dynamic XML Data Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 5488 5492 An Extended Byte Carry Labeling Scheme for Dynamic XML Data YU Sheng a,b WU Minghui a,b, * LIU Lin a,b a School of Computer

More information

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today XML Query Processing CPS 216 Advanced Database Systems Announcements (March 31) 2 Course project milestone 2 due today Hardcopy in class or otherwise email please I will be out of town next week No class

More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

A Methodology for Integrating XML Data into Data Warehouses

A Methodology for Integrating XML Data into Data Warehouses A Methodology for Integrating XML Data into Data Warehouses Boris Vrdoljak, Marko Banek, Zoran Skočir University of Zagreb Faculty of Electrical Engineering and Computing Address: Unska 3, HR-10000 Zagreb,

More information

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

B-Trees. Version of October 2, B-Trees Version of October 2, / 22 B-Trees Version of October 2, 2014 B-Trees Version of October 2, 2014 1 / 22 Motivation An AVL tree can be an excellent data structure for implementing dictionary search, insertion and deletion Each operation

More information

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time B + -TREES MOTIVATION An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations finish within O(log N) time The theoretical conclusion

More information

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages:

XML Technologies. Doc. RNDr. Irena Holubova, Ph.D. Web pages: XML Technologies Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz Web pages: http://www.ksi.mff.cuni.cz/~holubova/nprg036/ Outline Introduction to XML format, overview of XML technologies DTD

More information

Lecture 25 Notes Spanning Trees

Lecture 25 Notes Spanning Trees Lecture 25 Notes Spanning Trees 15-122: Principles of Imperative Computation (Spring 2016) Frank Pfenning 1 Introduction The following is a simple example of a connected, undirected graph with 5 vertices

More information

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW

XML and Databases. Outline. Outline - Lectures. Outline - Assignments. from Lecture 3 : XPath. Sebastian Maneth NICTA and UNSW Outline XML and Databases Lecture 10 XPath Evaluation using RDBMS 1. Recall / encoding 2. XPath with //,, @, and text() 3. XPath with / and -sibling: use / size / level encoding Sebastian Maneth NICTA

More information

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = ( Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language

More information

Dominance Constraints and Dominance Graphs

Dominance Constraints and Dominance Graphs Dominance Constraints and Dominance Graphs David Steurer Saarland University Abstract. Dominance constraints logically describe trees in terms of their adjacency and dominance, i.e. reachability, relation.

More information

Querying Tree-Structured Data Using Dimension Graphs

Querying Tree-Structured Data Using Dimension Graphs Querying Tree-Structured Data Using Dimension Graphs Dimitri Theodoratos 1 and Theodore Dalamagas 2 1 Dept. of Computer Science New Jersey Institute of Technology Newark, NJ 07102 dth@cs.njit.edu 2 School

More information

Computer Science 210 Data Structures Siena College Fall Topic Notes: Trees

Computer Science 210 Data Structures Siena College Fall Topic Notes: Trees Computer Science 0 Data Structures Siena College Fall 08 Topic Notes: Trees We ve spent a lot of time looking at a variety of structures where there is a natural linear ordering of the elements in arrays,

More information

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height = M-ary Search Tree B-Trees Section 4.7 in Weiss Maximum branching factor of M Complete tree has height = # disk accesses for find: Runtime of find: 2 Solution: B-Trees specialized M-ary search trees Each

More information

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML

Rank-aware XML Data Model and Algebra: Towards Unifying Exact Match and Similar Match in XML Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 253 Rank-aware XML Data Model and Algebra: Towards Unifying Exact

More information

XML Databases 11. XML storage details

XML Databases 11. XML storage details XML Databases 11. XML storage details Silke Eckstein Andreas Kupfer Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 11. XML storage details 11.1 Introduction

More information

Course: The XPath Language

Course: The XPath Language 1 / 27 Course: The XPath Language Pierre Genevès CNRS University of Grenoble, 2012 2013 2 / 27 Why XPath? Search, selection and extraction of information from XML documents are essential for any kind of

More information

Data Analytics and Boolean Algebras

Data Analytics and Boolean Algebras Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com

More information

Application-Tailored XML Storage

Application-Tailored XML Storage Application-Tailored XML Storage Maxim Grinev, Ivan Shcheklein Institute for System Programming of the Russian Academy of Sciences maxim@grinev.net, shcheklein@ispras.ru Abstract Several native approaches

More information

Lecture 9 March 4, 2010

Lecture 9 March 4, 2010 6.851: Advanced Data Structures Spring 010 Dr. André Schulz Lecture 9 March 4, 010 1 Overview Last lecture we defined the Least Common Ancestor (LCA) and Range Min Query (RMQ) problems. Recall that an

More information

XML Index Recommendation with Tight Optimizer Coupling

XML Index Recommendation with Tight Optimizer Coupling XML Index Recommendation with Tight Optimizer Coupling Technical Report CS-2007-22 July 11, 2007 Iman Elghandour University of Waterloo Andrey Balmin IBM Almaden Research Center Ashraf Aboulnaga University

More information

COMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing

COMP9319 Web Data Compression & Search. Cloud and data optimization XPath containment Distributed path expression processing COMP9319 Web Data Compression & Search Cloud and data optimization XPath containment Distributed path expression processing DATA OPTIMIZATION ON CLOUD Cloud Virtualization Cloud layers Cloud computing

More information

XSelMark: A Micro-Benchmark for Selectivity Estimation Approaches of XML Queries

XSelMark: A Micro-Benchmark for Selectivity Estimation Approaches of XML Queries XSelMark: A Micro-Benchmark for Selectivity Estimation Approaches of XML Queries Sherif Sakr National ICT Australia (NICTA) Sydney, Australia sherif.sakr@nicta.com.au Abstract. Estimating the sizes of

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

4 Fractional Dimension of Posets from Trees

4 Fractional Dimension of Posets from Trees 57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17 01.433/33 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/2/1.1 Introduction In this lecture we ll talk about a useful abstraction, priority queues, which are

More information

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System

International Journal of Advance Engineering and Research Development. Performance Enhancement of Search System Scientific Journal of Impact Factor(SJIF): 3.134 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Performance Enhancement of Search System Ms. Uma P Nalawade

More information

SFilter: A Simple and Scalable Filter for XML Streams

SFilter: A Simple and Scalable Filter for XML Streams SFilter: A Simple and Scalable Filter for XML Streams Abdul Nizar M., G. Suresh Babu, P. Sreenivasa Kumar Indian Institute of Technology Madras Chennai - 600 036 INDIA nizar@cse.iitm.ac.in, sureshbabuau@gmail.com,

More information

XPath Lecture 34. Robb T. Koether. Hampden-Sydney College. Wed, Apr 11, 2012

XPath Lecture 34. Robb T. Koether. Hampden-Sydney College. Wed, Apr 11, 2012 XPath Lecture 34 Robb T. Koether Hampden-Sydney College Wed, Apr 11, 2012 Robb T. Koether (Hampden-Sydney College) XPathLecture 34 Wed, Apr 11, 2012 1 / 20 1 XPath Functions 2 Predicates 3 Axes Robb T.

More information

Implementation of Relational Operations in Omega Parallel Database System *

Implementation of Relational Operations in Omega Parallel Database System * Implementation of Relational Operations in Omega Parallel Database System * Abstract The paper describes the implementation of relational operations in the prototype of the Omega parallel database system

More information

arxiv: v2 [cs.ds] 9 Apr 2009

arxiv: v2 [cs.ds] 9 Apr 2009 Pairing Heaps with Costless Meld arxiv:09034130v2 [csds] 9 Apr 2009 Amr Elmasry Max-Planck Institut für Informatik Saarbrücken, Germany elmasry@mpi-infmpgde Abstract Improving the structure and analysis

More information

Improving generalized inverted index lock wait times

Improving generalized inverted index lock wait times Journal of Physics: Conference Series PAPER OPEN ACCESS Improving generalized inverted index lock wait times To cite this article: A Borodin et al 2018 J. Phys.: Conf. Ser. 944 012022 View the article

More information

Structural Consistency: Enabling XML Keyword Search to Eliminate Spurious Results Consistently

Structural Consistency: Enabling XML Keyword Search to Eliminate Spurious Results Consistently Last Modified: 22 Sept. 29 Structural Consistency: Enabling XML Keyword Search to Eliminate Spurious Results Consistently Ki-Hoon Lee, Kyu-Young Whang, Wook-Shin Han, and Min-Soo Kim Department of Computer

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Directed Graphical Models (Bayes Nets) (9/4/13)

Directed Graphical Models (Bayes Nets) (9/4/13) STA561: Probabilistic machine learning Directed Graphical Models (Bayes Nets) (9/4/13) Lecturer: Barbara Engelhardt Scribes: Richard (Fangjian) Guo, Yan Chen, Siyang Wang, Huayang Cui 1 Introduction For

More information

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability

Decision trees. Decision trees are useful to a large degree because of their simplicity and interpretability Decision trees A decision tree is a method for classification/regression that aims to ask a few relatively simple questions about an input and then predicts the associated output Decision trees are useful

More information