Bottom Up and Top Down Twig Pattern Matching on Indexed Trees

Nils Grimsmo

Bottom Up and Top Down Twig Pattern Matching on Indexed Trees

Thesis for the degree of philosophiae doctor

Trondheim,

Norwegian University of Science and Technology
Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science

NTNU
Norwegian University of Science and Technology

Thesis for the degree of philosophiae doctor

Faculty of Information Technology, Mathematics and Electrical Engineering
Department of Computer and Information Science

© Nils Grimsmo

ISBN (printed version)
ISBN (electronic version)
ISSN

Doctoral theses at NTNU, 2011:96

Printed by NTNU-trykk

Preface

This thesis is submitted to the Norwegian University of Science and Technology (NTNU) for partial fulfillment of the requirements for the degree of philosophiae doctor. The doctoral work has been performed at the Department of Computer and Information Science, NTNU, Trondheim, with Bjørn Olstad as main supervisor, and Øystein Torbjørnsen and Magnus Lie Hetland as co-supervisors. The candidate was supported by the Research Council of Norway under the grant NFR , and by the iad project, also funded by the Research Council of Norway.


Summary

This PhD thesis is a collection of papers presented with a general introduction to the topic, which is twig pattern matching (TPM) on indexed tree data. TPM is a pattern matching problem where occurrences of a query tree are found in a usually much larger data tree. This has applications in XML search, where the data is tree-shaped and the queries specify tree patterns. The included papers present contributions on how to construct and use structure indexes, which can speed up pattern matching, and on how to efficiently join together results for the different parts of the query with so-called twig joins.

Paper 1 [18] shows how to perform more efficient matching of root-to-leaf query paths in so-called path indexes, by using new opportunistic algorithms on existing data structures. Paper 2 [19] proves a tight bound on the worst-case space usage of a data structure used to implement path indexes. Paper 3 [24] presents an XML indexing system that combines existing techniques in a novel way, and performs orders of magnitude better than existing commercial and open-source systems. Paper 4 [20] reviews the many advances in the field of TPM on indexed data, organizes them in a taxonomy, and proposes opportunities for further research. Paper 5 [21] bridges the gap between worst-case optimality and practical performance in current twig join algorithms. Paper 6 [22] improves the construction cost of so-called forward and backward path indexes for tree data from loglinear to linear.


Acknowledgments

The day-to-day supervision of the PhD work during the first years was mostly done by the external supervisor Dr. Øystein Torbjørnsen from Fast Search and Transfer, who has been a good source of ideas and clever technical solutions. Dr. Magnus Lie Hetland from my department has been supervising the last year, and has given substantial help both scientifically and in the writing process of some papers. The visits of my formal supervisor Dr. Bjørn Olstad have been inspirational. The discussions with Dr. Felix Weigel during his internship at FAST resulted in many ideas. I would like to thank fellow PhD student Truls Amundsen Bjørklund for good times, fruitful discussions and honest feedback during our work together. Thank you Nina, for your patience, beauty and delicious cooking.


Contents

Preface
Summary
Acknowledgments
Contents
1 Introduction
  Indexing/search in semi-structured data
  Use-case: XML
  XPath and XQuery
  Abstract problem: Twig Pattern Matching
  Research scope: TPM on indexed data
  Research questions
2 Background
  Twig joins
  Twig join work-flow
  Result enumeration
  Single output query node
  Simple intermediate result architecture
  Tree position encoding
  Partial match filtering
  Intermediate result construction
  Merging input streams
  Data locality and updatability
  Twig join conclusion
  Partitioning data
  Motivation for fragmentation
  Path partitioning
  Backward and forward path partitioning
  Balancing fragmentation
  Reading data
  2.3.1 Skipping
  Skipping child matches
  Skipping parent matches
  Holistic skipping
  Virtual streams
  Virtual matches for non-branching internal query nodes
  Tree position encoding allowing ancestor reconstruction
  Virtual matches for branching query nodes
  Related problems and solutions
3 Research Summary
  Formalities
  Publications and research process
  Paper 1
  Paper 2
  Paper 3
  Paper 4
  Paper 5
  Paper 6
  Research methodology
  Evaluation of contributions
  Research questions revisited
  Opportunities revisited
  Future work
  Strong structure summaries for independent documents
  A simpler fast optimal twig join
  Simpler and faster evaluation with non-output nodes
  Ultimate data access shoot-out
  Conclusions
Bibliography
4 Included papers
  Paper 1: Faster Path Indexes for Search in XML Data
  Paper 2: On the Size of Generalised Suffix Trees Extended with String ID Lists
  Paper 3: XLeaf: Twig Evaluation with Skipping Loop Joins and Virtual Nodes
  Paper 4: Towards Unifying Advances in Twig Join Algorithms
  Paper 5: Fast Optimal Twig Joins
  Paper 6: Linear Computation of the Maximum Simultaneous Forward and Backward Bisimulation for Node-Labeled Trees
A Other Papers
  Paper 7: On performance and cache effects in substring indexes
  Paper 8: Inverted Indexes vs. Bitmap Indexes in Decision Support Systems
  Paper 9: Search Your Friends And Not Your Enemies

Chapter 1

Introduction

Research is formalized curiosity. It is poking and prying with a purpose.
Zora Neale Hurston

The thesis is submitted as a paper collection bound together by a general introduction. This chapter presents the context of the research, which is indexing and querying semi-structured data, and the abstract problem investigated, which is twig pattern matching (TPM). Chapter 2 gives a high-level introduction to techniques used in state-of-the-art TPM on indexed data. Chapter 3 lists the included published papers with short qualitative assessments, evaluates the total contribution of this thesis, and proposes future work.

1.1 Indexing/search in semi-structured data

So-called semi-structured data gives both flexibility and expressive power, and is commonly used for storing and exchanging data in heterogeneous information systems. In the semi-structured data model, documents have a structure that specifies how the different parts of the content relate to each other. This means that information is contained both in the structure and in the content. Documents are usually structurally self-contained, meaning that the structure can be understood from the document alone, without additional metadata.

The focus of this thesis is algorithms and data structures for indexing and querying semi-structured data, where queries specify both structure and content. Semi-structured data can functionally cover both traditional structure-oriented and content-oriented data management, and the thesis therefore touches on the fields of both databases and information retrieval.

1.2 Use-case: XML

XML is a simple yet flexible markup language [46], and has become the de facto standard for storing semi-structured data. An example XML document is shown in Figure 1.1. Standard XML has a tree model with mainly three types of nodes in a document tree: element, attribute and text. All internal nodes in the document tree are of type element, and are given by start and end tags, such as the node with name book in the example. Text and attribute nodes are always leaf nodes. Text nodes have simple string values, while attributes have both a name and a value, such as the ISBN node in the example.

    <library>
      <book ISBN="13">
        <title>kritik der Unvollständigkeit</title>
        <author>kant</author>
        <author>gödel</author>
      </book>
      ...
    </library>

Figure 1.1: Example XML document

XPath and XQuery

XPath [45] and XQuery [47] have become standard languages for querying XML. Of the two, XPath is a simpler declarative language, while XQuery is a more complex language that uses XPath expressions as building blocks. The XPath expression in Figure 1.2a asks for the title of all books coauthored by Kant and Gödel. In XPath, single and double forward slashes specify child and descendant relationships between nodes, respectively. Square brackets contain predicates, and the rightmost node not part of a predicate is the output node, also called the return node. XPath queries are trees, and the tree representation of the example is shown in Figure 1.2b. In addition to descendant and child, XPath has 11 so-called axes: parent, ancestor, following-sibling, preceding-sibling, following, preceding, attribute, namespace, self, descendant-or-self and ancestor-or-self [45]. There can also be more complex value predicates than simple tests on string equality, using functions such as count(), contains() and sum().
XQuery is a powerful language where small programs are built with path expressions as building blocks, in so-called FLWOR expressions (for, let, where, order by, return). Figure 1.3 shows an XQuery program similar to the XPath expression in Figure 1.2a, which in addition orders books by title and retrieves both title and ISBN.
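As a small, hedged illustration of the predicate semantics described above, the sketch below runs an expression close to Figure 1.2a with Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset. The form [author='kant'] (a child element whose complete text equals the string) stands in for [author/text()="kant"], which is outside ElementTree's supported subset. The second book and the name coauthored_titles are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# The book from Figure 1.1 plus a hypothetical second book, so the predicates reject something.
LIBRARY_XML = """
<library>
  <book ISBN="13">
    <title>kritik der Unvollständigkeit</title>
    <author>kant</author>
    <author>gödel</author>
  </book>
  <book ISBN="42">
    <title>hypothetical single-author book</title>
    <author>kant</author>
  </book>
</library>
"""


def coauthored_titles(xml_text: str) -> List[str]:
    """Titles of books that have both an author 'kant' and an author 'gödel'."""
    root = ET.fromstring(xml_text.strip())
    # Each [author='...'] predicate keeps book elements with an author child whose
    # complete text equals the string; consecutive predicates must all hold.
    titles = root.findall(".//book[author='kant'][author='gödel']/title")
    return [t.text for t in titles]


if __name__ == "__main__":
    print(coauthored_titles(LIBRARY_XML))  # ['kritik der Unvollständigkeit']
```

Only the first book satisfies both predicates, so its title is the single result, mirroring the single output node of the XPath expression.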

    //book[author/text()="kant"][author/text()="gödel"]/title

(a) Expression.

[Tree: <book> with the children <author>, <author> and <title>; the two <author> nodes have the text children "Kant" and "Gödel".]

(b) Tree representation.

Figure 1.2: XPath example finding books coauthored by Kant and Gödel.

    for $b in doc("lib.xml")/library/book
    let $t := $b/title
    where $b/author = "Kant" and $b/author = "Gödel"
    order by $t
    return ($t, $b/@isbn)

Figure 1.3: Example XQuery.

1.3 Abstract problem: Twig Pattern Matching

In XPath a large number of functions can be used in value predicates, and thirteen different axes dictate the relationships between nodes. The many details in the language make it hard to reason about the complexity of evaluation algorithms and hard to implement prototypes. TPM is a more abstract tree matching problem that covers a subset of XPath. It is of academic interest because a TPM solution covers the majority of the workload in most XML search systems [15].

In TPM both query and data are node-labeled trees, as shown in the example in Figure 1.4. Node predicates are on label equality, and all nodes have the same type. There are two types of query edges that dictate the relationship between data nodes in a match, ancestor-descendant (A-D) and parent-child (P-C), denoted in figures by double and single edges, respectively. The result of a TPM query is the set of mappings of query nodes to data nodes that both respect the node labels and satisfy the A-D and P-C relationships specified by the query edges. In settings with XML document collections, the data is a forest of trees, but this can easily be transformed into a single tree by adding a virtual super-root node.
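To make the TPM semantics concrete, the following minimal sketch enumerates matches by brute force: it tries every label-respecting assignment of data nodes to query nodes and keeps those that satisfy the edge constraints. It assumes a hypothetical encoding of trees as dictionaries from node ids to (label, parent) pairs, and its running time is exponential in the query size, so it only illustrates what the result set is, not how to compute it efficiently.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: node id -> (label, parent id or None for the root).
Tree = Dict[str, Tuple[str, Optional[str]]]


def is_proper_ancestor(tree: Tree, anc: str, desc: str) -> bool:
    """Walk parent links upwards from desc and look for anc."""
    node = tree[desc][1]
    while node is not None:
        if node == anc:
            return True
        node = tree[node][1]
    return False


def tpm_matches(query: Tree, edge_type: Dict[str, str], data: Tree) -> List[Dict[str, str]]:
    """All label-respecting mappings that satisfy every query edge.

    edge_type[q] is "PC" or "AD" for the edge from q's parent to q.
    """
    qnodes = list(query)
    candidates = [[d for d in data if data[d][0] == query[q][0]] for q in qnodes]
    matches = []
    for combo in product(*candidates):  # exponential: illustration only
        m = dict(zip(qnodes, combo))
        ok = True
        for q, (_, qparent) in query.items():
            if qparent is None:
                continue
            if edge_type[q] == "PC":
                ok = data[m[q]][1] == m[qparent]
            else:
                ok = is_proper_ancestor(data, m[qparent], m[q])
            if not ok:
                break
        if ok:
            matches.append(m)
    return matches


# Hypothetical example: query node qa with an A-D child qb and a P-C child qc.
QUERY: Tree = {"qa": ("a", None), "qb": ("b", "qa"), "qc": ("c", "qa")}
EDGES = {"qb": "AD", "qc": "PC"}
DATA: Tree = {
    "a1": ("a", None), "b1": ("b", "a1"), "c1": ("c", "b1"), "c2": ("c", "a1"),
    "a2": ("a", "a1"), "b2": ("b", "a2"), "c3": ("c", "a2"),
}

if __name__ == "__main__":
    for match in tpm_matches(QUERY, EDGES, DATA):
        print(match)
```

In this hypothetical data tree, c1 never matches qc because its parent is labeled b, while a2 can match the query root because it has its own b descendant and c child.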

Figure 1.4: TPM example with a query tree on the left and a data tree on the right. One of the matches for the query in the data is shown with arrows from query nodes to data nodes. In the following, query nodes are drawn with circles and data nodes with rounded rectangles. Node labels are written in typewriter font, and the superscripts in query nodes and subscripts in data nodes are used to identify the nodes (together with the labels).

Research scope: TPM on indexed data

The scope of this thesis is twig pattern matching on indexed data, and we assume that the processes of preparing the index and evaluating queries are separate. For this strategy to be viable, the performance gained in query evaluation, compared to evaluation without an index, must justify the cost of constructing the index.

We use the following abstract view of an index: it is a mechanism that provides a function from some feature of a node to the nodes in the data tree that have this feature. The simplest non-trivial such feature is the node label, as used in the index shown in Figure 1.5a. In a typical implementation, entries in a so-called dictionary on label point to so-called occurrence lists containing the nodes with matching label. When indexing on label, a query can be evaluated by reading the label-matching data nodes for each query node, and joining these into full query matches. The number of full query matches may be small compared to the total number of query node matches read, but if the labels on the query nodes are selective, far fewer data nodes will be processed than when evaluating the query on the data tree without an index. Indexing on node label can be extended to indexing on path labels, the string of labels from the root to a node, as illustrated in Figure 1.5b.
This can again be extended to classify nodes not only on labels of the ancestor nodes on the path above, but also on the labels of the children in the subtree below. These indexing strategies, called structure indexing, will be discussed in the next chapter, together with so-called twig join algorithms.
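The label dictionary and the path dictionary described above can be sketched as follows, assuming a hypothetical encoding of a data tree as a dictionary from node ids to (label, parent) pairs, with the ids inserted in tree preorder. The label index maps each label to an occurrence list; the path index maps each root-to-node label path to one. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: node id -> (label, parent id), ids inserted in preorder.
Tree = Dict[str, Tuple[str, Optional[str]]]


def label_index(tree: Tree) -> Dict[str, List[str]]:
    """Dictionary on label; each entry points to an occurrence list in preorder."""
    index: Dict[str, List[str]] = defaultdict(list)
    for node, (label, _parent) in tree.items():
        index[label].append(node)
    return dict(index)


def path_index(tree: Tree) -> Dict[Tuple[str, ...], List[str]]:
    """Dictionary on root-to-node label path; each entry points to an occurrence list."""
    path_of: Dict[str, Tuple[str, ...]] = {}
    index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for node, (label, parent) in tree.items():
        # Preorder guarantees that the parent's path is already known.
        prefix = path_of[parent] if parent is not None else ()
        path_of[node] = prefix + (label,)
        index[path_of[node]].append(node)
    return dict(index)


DATA: Tree = {
    "a1": ("a", None), "b1": ("b", "a1"), "c1": ("c", "b1"), "c2": ("c", "a1"),
    "a2": ("a", "a1"), "b2": ("b", "a2"), "c3": ("c", "a2"),
}

if __name__ == "__main__":
    print(label_index(DATA)["b"])             # ['b1', 'b2']
    print(path_index(DATA)[("a", "a", "b")])  # ['b2']
```

A label lookup for b returns every b node, while the path entry a/a/b returns only the b node below two a ancestors, which is the kind of selectivity gain that motivates structure indexing.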

[Figure 1.5 residue: (a) a label dictionary where each label points to an occurrence list (a: $a_1, \ldots, a_4$; b: $b_1, \ldots, b_6$; c: $c_1, \ldots, c_6$; d: $d_1, d_2$; e: $e_1$; f: $f_1$); (b) a path dictionary where each root-to-node label path points to an occurrence list, e.g., c/a/a/b: $b_2, b_3, b_5$.]

(a) Indexing on label. (b) Indexing on path.

Figure 1.5: Indexing the data tree from Figure 1.4.

1.4 Research questions

The following are the main research questions I have investigated during the work with this thesis:

RQ1: How can matches for tree queries be joined more efficiently?
RQ2: How can pattern matching in the dictionary be done more efficiently?
RQ3: How can structure indexes be constructed faster and using less space?

These questions will be revisited in Section 3.4.1, where I evaluate to what extent they have been answered by my research. Note that more efficient query evaluation can mean either that all or most queries are evaluated using less time, or that queries from some important group are evaluated using less time. Preferably, faster evaluation for one group of queries should not cause slower evaluation for other groups.


Chapter 2

Background

Research is what I'm doing when I don't know what I'm doing.
Wernher von Braun

This chapter presents some underlying concepts of state-of-the-art approaches to TPM on indexed data, which will hopefully ease the understanding of the contributions in the research papers included in this thesis. A high-level conceptual overview is given instead of an in-depth description of state-of-the-art solutions, because the details are better covered by the included papers, where the specific techniques are discussed.

The following discussion divides the problem of TPM on indexed data into three somewhat orthogonal issues: how to construct full query matches from individual query node matches in so-called twig joins, how to partition the underlying data nodes such that as few as possible are read to evaluate a query, and how to efficiently read streams of data nodes during a join.

Notation. The following notation is used in the discussion: A graph $G$ has node set $V_G$ and edge set $E_G \subseteq V_G \times V_G$. All graphs are directed. A graph is a tree if all nodes have one incoming edge except the root, which has zero incoming edges. Nodes with zero outgoing edges are called leaves. A graph is called a forest if it consists of many unconnected trees, i.e., if all nodes have zero or one incoming edges. If a relation $R$ relates $x$ to $y$, this may be denoted $xRy$, $\langle x, y \rangle \in R$ or $x \mapsto y \in R$. We primarily use angle brackets for graph edges, as in $\langle u, v \rangle \in E_G$, and the maps-to arrow for mappings of query nodes to data nodes, as in $q \mapsto d \in M$. The transitive closure of a relation $R$ is denoted by $R^+$. In the problems discussed there will mostly be a query tree $Q$ and a data tree $D$, where each node $v \in V_Q \cup V_D$ has a $\mathrm{Label}(v) \in \mathcal{A}$. Assume $|\mathcal{A}| \in O(|D|)$ for simplicity. Each query edge $\langle u, v \rangle \in E_Q$ has an $\mathrm{EdgeType}(u, v) \in \{\text{A-D}, \text{P-C}\}$, specifying an ancestor-descendant or a parent-child relationship.
Recall from Section 1.3 that in TPM we have a single node type, and only differentiate nodes by label, while in XML there are different node types. We can generalize TPM to cover this by using different label encodings for different node types, for example by starting element node labels with < and text node labels with ", and using a separate prefix for attribute node labels.

Definition 1 (Query match). Given a query tree $Q$ and a data tree $D$, a match for $Q$ in $D$ is a total¹ function $M : V_Q \to V_D$ such that $\mathrm{Label}(v) = \mathrm{Label}(M(v))$ for all $v \in V_Q$, and whenever there is an edge $\langle u, v \rangle \in E_Q$: if $\mathrm{EdgeType}(u, v) = \text{P-C}$, then there is an edge $\langle M(u), M(v) \rangle \in E_D$, and if $\mathrm{EdgeType}(u, v) = \text{A-D}$, then there is an edge $\langle M(u), M(v) \rangle \in E_D^+$.²

Revisit the example in Figure 1.4 for an illustration of a match.

Definition 2 (Twig pattern matching problem). Given a query tree $Q$ and a data tree $D$, the twig pattern matching problem is to find the set of functions that are matches for $Q$ in $D$. Denote this set of matches by $\mathcal{M}_{Q,D}$, or just $\mathcal{M}$ when there is no ambiguity.

2.1 Twig joins

In a twig join, a query is evaluated by considering a set of candidate data nodes $I_v \subseteq V_D$ for each query node $v \in V_Q$, which are joined into full query matches where the query node $v$ is mapped to a data node in $I_v$. With label indexing, $I_v = \{v' \in V_D \mid \mathrm{Label}(v') = \mathrm{Label}(v)\}$, the set of all data nodes with label matching that of $v$. For the example in Figure 1.4, the candidate set for query node $a^1$ would be $I_{a^1} = \{a_1, a_2, a_3, a_4\}$. Denote the total input by $I = \{v \mapsto v' \mid v \in V_Q, v' \in I_v\}$, and the set of query matches that can be constructed from this input by $O = \{M \mid M \subseteq I, M \in \mathcal{M}\}$. The following discussion assumes label indexing, but the techniques for constructing twig matches presented here are also applicable under other assumptions on how the underlying data is indexed, as discussed in Section 2.2.

In practice, a twig join accesses each set $I_v$ through an enumeration $S_v$, which typically follows a given ordering, such as tree preorder. As a base case, $S_v$ is implemented as a simple stream, where you can read out the current element, the so-called head, or forward to the next element.
Later in this chapter we also consider cases where you can fast-forward to search for a given element in $S_v$. In some settings $S_v$ could also have random access to elements at given positions.

Twig join work-flow

Early approaches used multiple binary joins to construct full query matches [56, 1], but this can give intermediate results of exponential size when the query contains A-D edges [5]. This deficiency led to the introduction of multi-way joins [5, 7, 27, 39, 33]. Current multi-way twig join algorithms generally use the following strategy, illustrated in Figure 2.1: There are two temporally separate phases, where the first phase constructs an intermediate result data structure, and the second phase traverses this data structure to enumerate and output the set of full query matches $O$. The first phase has two components, where the first merges the streams $S_v$, materializing $I_v$ for each $v \in V_Q$, into a single stream $S$, materializing the total input set $I$.

¹ A function $f : X \to Y$ is called total iff $f(x)$ is defined for all $x \in X$.
² There is an edge $\langle u, v \rangle \in E^+$ iff there is a simple path from $u$ to $v$ using edges from $E$.

[Figure 2.1 diagram: per-query-node streams ($a_1, a_2, \ldots$; $b_1, b_2, \ldots$; $c_1, c_2, \ldots$) feed Phase 1, Component 1: input stream merger, which feeds Phase 1, Component 2: intermediate result construction; Phase 2: result enumeration reads the intermediate results and outputs matches such as $a_1, b_2, c_5$.]

Figure 2.1: Work-flow of twig join algorithms.

Figure 2.2 illustrates why the two phases are temporally separate: in the worst case, all the data must be read before it is known whether or not the nodes in the input are useful. On the other hand, the two components in Phase 1 can overlap in time, because Component 2 reads data and query node pairs from Component 1 in an order that can be produced without lookahead in the individual streams. Note that for some combinations of query and data, the construction of intermediate results is not necessary for linear evaluation (as we exploit in Paper 3 included in Chapter 4).

Figure 2.2: Example showing why Phase 1 and Phase 2 are temporally separate. When the input streams are sorted in tree preorder, it cannot be known whether $b_1, \ldots, b_n$ are part of a query match before $c_{n+1}$ is seen, or whether $c_1, \ldots, c_n$ are part of a query match before $b_{n+1}$ is seen.

Note that there is no stream ordering such that all twig queries can be evaluated without storing intermediate results [10]. To understand the design choices in the approach depicted in Figure 2.1, it is easiest to start with the last step, result enumeration, and work backwards. The section on result enumeration sketches a generic algorithm for enumerating results, and the section on the simple intermediate result architecture sketches the layout of a generic data structure that enables evaluating that algorithm in linear time.
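Phase 1, Component 1 can be illustrated by a plain k-way merge. The sketch below assumes each stream $S_v$ is given as a sequence of (position, data node) pairs sorted on a preorder position number, and uses Python's heapq.merge, which consumes its inputs lazily and so needs nothing beyond the current head of each stream. The input format and function names are hypothetical, and the filtering-aware merging used by practical twig joins is not modeled.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple


def _tag(qnode: str, stream: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str, str]]:
    """Attach the query node to each (position, data node) item of S_v."""
    for pos, dnode in stream:
        yield pos, qnode, dnode


def merge_streams(streams: Dict[str, Iterable[Tuple[int, str]]]) -> Iterator[Tuple[int, str, str]]:
    """Merge per-query-node streams, each sorted on a preorder position number,
    into one stream S of (position, query node, data node) triples."""
    # A helper generator per stream avoids late binding of the query node name.
    return heapq.merge(*(_tag(q, s) for q, s in streams.items()))


if __name__ == "__main__":
    streams = {
        "a": [(1, "a1"), (5, "a2")],
        "b": [(2, "b1"), (6, "b2")],
        "c": [(3, "c1")],
    }
    for item in merge_streams(streams):
        print(item)
```

Because the merged stream is produced lazily, Component 2 can start consuming pairs before any individual stream has been read to the end, which is the overlap between the two components described above.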
With the enumeration step as a starting point, I go through various techniques and strategies for implementing the generic approach. The section on tree position encoding briefly presents a common tree position encoding that makes it possible to decide A-D and P-C relationships between data nodes in the various streams in constant time. The section on partial match filtering describes two common data node filtering strategies, and the section on intermediate result construction shows how one of these can be used to realize the conceptual data structure of the simple intermediate result architecture in linear time. The section on merging input streams describes the input stream merge component, where filtering strategies can be used for practical speedups.

Result enumeration

Algorithm 1 gives a high-level description of how to output all unique query matches that can be constructed from the input. The approach is a generalization of what is used in state-of-the-art twig joins [7, 27, 39, 33]. The algorithm recursively constructs full query matches from partial matches that are known to be part of full query matches, denoted here as partial full matches. Formally, a partial full match is an $M'$ such that $M' \subseteq M$ for some full query match $M \in \mathcal{M}$. The set of all partial full matches is $\hat{\mathcal{M}} = \{M' \mid \exists M \in \mathcal{M} : M' \subseteq M\}$.

Algorithm 1 Result enumeration
    Denote the set of partial full matches by $\hat{\mathcal{M}}$.
    Start with $M = \emptyset$, an empty partial full twig match.
    Assume any fixed ordering of the nodes in $Q$, and let $v \in V_Q$ be the first node in this ordering.
    For all $v'$ such that $\{v \mapsto v'\} \in \hat{\mathcal{M}}$:
        Call Recurse($v \mapsto v'$).

    function Recurse($u \mapsto u'$):
        Insert $u \mapsto u'$ into $M$.
        If $|M| = |Q|$:
            Output $M$.
        Otherwise:
            Let $v$ be the node following $u$ in $Q$.
            For all $v \mapsto v'$ such that $M \cup \{v \mapsto v'\} \in \hat{\mathcal{M}}$:
                Recurse($v \mapsto v'$).
        Remove $u \mapsto u'$ from $M$.

Example 1. We evaluate the query and data in Figure 1.4 using Algorithm 1, and order query nodes in tree preorder. A candidate match for query node $a^1$ that is part of a full match is data node $a_1$, and hence one of the top-level calls to Recurse will be with the parameter $u \mapsto u'$ set to $a^1 \mapsto a_1$. After this pair has been inserted into $M$, we consider the query node $b^1$, which follows $a^1$ in tree preorder. Since $M = \{a^1 \mapsto a_1\}$, and $M \cup \{b^1 \mapsto b_2\}$ is a partial full match, $b^1 \mapsto b_2$ is one of the pairs we recurse with. In that recursive call we have $M = \{a^1 \mapsto a_1, b^1 \mapsto b_2\}$, and consider matches for the final query node $c^1$.
As $\{a^1 \mapsto a_1, b^1 \mapsto b_2\} \cup \{c^1 \mapsto c_5\}$ is a partial full match, we again recurse with $c^1 \mapsto c_5$, and output the new $M$, since it is a complete full match.

Assume that the set of partial full matches $\hat{\mathcal{M}}$ does not have to be materialized, and that, given a partial full match $M \in \hat{\mathcal{M}}$ where all nodes $u$ preceding $v$ have a mapping $u \mapsto u' \in M$, all $v \mapsto v'$ such that $M \cup \{v \mapsto v'\} \in \hat{\mathcal{M}}$ can be traversed in time linear in their number. Under these assumptions the algorithm can be evaluated in $O(|O| \cdot |Q|)$ time, linear in the total number of data nodes in the output. The intuition is that each recursive call constructs in constant time a partial full match not seen before, and that each unique partial full match yields at least one unique full query match.

Single output query node

In TPM the answers in the result set are all legal ways of matching the query nodes to the data nodes, but in many information retrieval settings other semantics may be more useful. In the XPath language [45] queries have a single output node, and the result set contains all matches for this query node that are part of some full query match. In the XQuery language [47], which is used for more complex information retrieval and processing, there can be any number of output and non-output nodes in the query. Only minor changes are needed in Algorithm 1 for this generalized case with both output and non-output query nodes. A simple solution is to put the output query nodes first in the fixed ordering, and stop the recursion before non-output nodes are considered. Note that practical data structures that enable linear enumeration for any combination of output and non-output nodes [7] are not as simple as the data structures described in the following sections.

Simple intermediate result architecture

Figure 2.2 illustrated why it is not possible to output query matches directly by just inspecting the heads of the streams for each query node. In the example, all the nodes labeled c must be read before it can be known whether or not any of the nodes labeled b are useful, and vice versa. The purpose of storing intermediate results is to organize the data nodes in such a way that an implementation of the approach in Algorithm 1 can be evaluated efficiently.
If the query nodes are ordered in tree preorder, it is natural to maintain, for each $u \mapsto u'$ that is part of a full query match and for each child $v$ of $u$, the list of pairs $v \mapsto v'$ used together with $u \mapsto u'$ in some full query match. Figure 2.3 illustrates this strategy. In addition to the lists of pointers to useful child query node matches for each pair, there must be a list of pointers to the data nodes that match the query root in full query matches.

[Figure 2.3 diagram: the query nodes $a^1$, $b^1$ and $c^1$ with their data node lists $a_1, \ldots, a_4$, $b_1, \ldots, b_6$ and $c_1, \ldots, c_6$, a list of full match roots, and pointers from useful $a^1$ matches to their usable $b^1$ and $c^1$ matches.]

Figure 2.3: Generic intermediate results for the data tree in Figure 1.4.
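The enumeration of Algorithm 1 over a structure of this kind can be sketched as follows, with query nodes taken in preorder. The sketch assumes a hypothetical Python layout: roots holds the data nodes matching the query root in full matches, and children[(u, u_data)][v] holds the data nodes for child v that are used together with u ↦ u_data in some full match. The small structure in the example is filled in by hand for a hypothetical three-node query, not taken from Figure 2.3.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical layout: children[(u, u_data)][v] lists the data nodes for child v
# that occur together with u -> u_data in some full match.
Children = Dict[Tuple[str, str], Dict[str, List[str]]]


def enumerate_matches(order: List[str], parent: Dict[str, Optional[str]],
                      roots: List[str], children: Children) -> Iterator[Dict[str, str]]:
    """Algorithm 1 with query nodes in preorder; extensions are read from the
    child pointer lists of the parent's current match."""
    match: Dict[str, str] = {}

    def recurse(i: int, data_node: str) -> Iterator[Dict[str, str]]:
        u = order[i]
        match[u] = data_node                       # insert u -> u' into M
        if len(match) == len(order):
            yield dict(match)                      # M is a full match: output a copy
        else:
            v = order[i + 1]                       # next query node in preorder
            p = parent[v]                          # its parent is already mapped
            for v_data in children[(p, match[p])][v]:
                yield from recurse(i + 1, v_data)
        del match[u]                               # remove u -> u' from M

    for r in roots:                                # traverse the full match roots
        yield from recurse(0, r)


# Hand-filled structure for a query qa with the children qb and qc.
ORDER = ["qa", "qb", "qc"]
PARENT: Dict[str, Optional[str]] = {"qa": None, "qb": "qa", "qc": "qa"}
ROOTS = ["a1", "a2"]
CHILDREN: Children = {
    ("qa", "a1"): {"qb": ["b1", "b2"], "qc": ["c2"]},
    ("qa", "a2"): {"qb": ["b2"], "qc": ["c3"]},
}

if __name__ == "__main__":
    for m in enumerate_matches(ORDER, PARENT, ROOTS, CHILDREN):
        print(m)
```

Every pointer followed leads to an output, which is why the enumeration cost stays proportional to the size of the output when the lists contain only pairs used in full matches.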

This data structure takes $O(|I| + |O| \cdot |Q|)$ space, linear in the size of the input and output, because the lists of data nodes take $O(|I|)$ space, and each root pointer or child match pointer is used at least once in Algorithm 1, which has time complexity $O(|O| \cdot |Q|)$. The following intuition shows how this data structure can be used to efficiently implement Algorithm 1 when query nodes are ordered in tree preorder:

(i) The pairs $v \mapsto v'$ in the initial calls in the outer for-loop are trivially found by traversing the list of pointers to full match roots.

(ii) In a recursive call, after $u \mapsto u'$ has been added to $M$, the current $M$ is a partial full match by assumption. Let $v$ be the node following $u$ in preorder, and let $p$ be the parent of $v$ (possibly $p = u$). All query nodes preceding $v$ have a mapping in $M$; assume $M(p) = p'$. Let $Q_p$ and $Q_v$ be the subgraphs resulting from removing the edge $\langle p, v \rangle$ from $Q$. These subqueries can be matched independently when the mappings of both $p$ and $v$ are fixed in a way such that $\mathrm{EdgeType}(p, v)$ is satisfied. If $v \mapsto v'$ is used in some full query match together with $p \mapsto p'$, we know that $\langle p', v' \rangle$ satisfies $\mathrm{EdgeType}(p, v)$. Then, if $M$ is a partial full match, $M \cup \{v \mapsto v'\}$ must also be a partial full match.

Example 2. This example illustrates how to implement the data access for Example 1 using the data structure in Figure 2.3. The first match for the query root $a^1$ that is part of a full match is the data node $a_1$, and hence the first non-empty partial full match in Algorithm 1 is $M = \{a^1 \mapsto a_1\}$. When considering the next query node in preorder, $b^1$, we see from the pointers in the data structure that $b_2$ is the first data node usable together with $a_1$. Hence the next partial full match is $M = \{a^1 \mapsto a_1, b^1 \mapsto b_2\}$. Then, when considering the next query node $c^1$, we see that the data node $c_5$ is the only data node usable with $a_1$, the current match for $a^1$, the parent of query node $c^1$. We insert $c^1 \mapsto c_5$ to get the full match $M = \{a^1 \mapsto a_1, b^1 \mapsto b_2, c^1 \mapsto c_5\}$.

Tree position encoding

To construct the intermediate results efficiently, it must be decidable from position information attached to the data nodes whether or not they satisfy A-D and P-C relationships. A common solution is the interval-based BEL encoding [56], where each node is given the integer numbers begin, end and level, as shown in Figure 2.4.

    1,10,1
      2,5,2
        3,3,3
        4,4,3
      6,9,2
        7,7,3
        8,8,3

Figure 2.4: The BEL encoding for a tree, with begin, end and level numbers.

This encoding is similar to preorder and postorder traversal numbers, and can be computed in a depth-first traversal of the tree. The encoding is probably often preferred because the begin and end numbers correspond to the document positions of the opening and closing tags in XML.
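A minimal sketch of computing the encoding in one depth-first traversal, together with constant-time relationship tests, is given below. It assumes that begin and end numbers share one counter and that a leaf gets begin = end, a convention chosen here because it reproduces the numbers in Figure 2.4; other variants exist. The ancestor test is $a.\mathit{begin} < b.\mathit{begin} < a.\mathit{end}$, and the parent test additionally requires the levels to differ by one. The tree representation and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class BEL(NamedTuple):
    begin: int
    end: int
    level: int


def bel_encode(children: Dict[str, List[str]], root: str) -> Dict[str, BEL]:
    """Assign (begin, end, level) in one depth-first traversal.

    Begin and end numbers share one counter, and a leaf gets begin == end
    (an assumed convention that matches the numbers in Figure 2.4).
    """
    codes: Dict[str, BEL] = {}
    counter = 0

    def visit(node: str, level: int) -> None:
        nonlocal counter
        counter += 1
        begin = counter
        kids = children.get(node, [])
        for kid in kids:
            visit(kid, level + 1)
        if kids:
            counter += 1                  # closing number for an internal node
        codes[node] = BEL(begin, counter, level)

    visit(root, 1)
    return codes


def is_ancestor(a: BEL, b: BEL) -> bool:
    return a.begin < b.begin < a.end


def is_parent(a: BEL, b: BEL) -> bool:
    return is_ancestor(a, b) and a.level + 1 == b.level


# The shape of the tree in Figure 2.4, with hypothetical node names.
TREE = {"r": ["x", "y"], "x": ["x1", "x2"], "y": ["y1", "y2"]}

if __name__ == "__main__":
    codes = bel_encode(TREE, "r")
    for name in ["r", "x", "x1", "x2", "y", "y1", "y2"]:
        print(name, tuple(codes[name]))
```

The recursion depth equals the tree height, so a production version for very deep documents would use an explicit stack instead.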

With the BEL encoding, a node $a$ is an ancestor of a node $b$ iff $a.\mathit{begin} < b.\mathit{begin}$ and $b.\mathit{begin} < a.\mathit{end}$, and it is a parent if also $a.\mathit{level} + 1 = b.\mathit{level}$. Sorting on begin or end numbers gives the same sorting order as preorder or postorder traversal numbers, respectively.

There exists a large number of tree position encodings with different properties [50]. Some allow deciding more types of node relationships, and some allow reconstruction of related nodes. They differ in the computational cost of evaluating relationships, space usage, and how well they handle updates in the data tree.

Partial match filtering

When constructing intermediate results, it is often possible to filter out some query and data node pairs that will never be part of a full query match. In current twig join algorithms, filtering is used for practical speedup [5, 27, 33], as a necessity for worst-case efficient result enumeration [7], or both. A filtering strategy does not have to be perfect, but it must never remove pairs that are part of full query matches. In other words, it can have false positives, but not false negatives. Most filtering strategies are based on the observation that if there is some subquery (a subgraph of the query) such that the pair $v \mapsto v'$ is not part of any match for the subquery, then $v \mapsto v'$ is not part of any match for the entire query, and can safely be thrown away [21].

The two most common filtering strategies are illustrated in Figure 2.5. The first is based on checking if query prefix paths are matched [5, 27, 33], and the second on checking if query subtrees are matched [7, 39, 33]. The prefix path of a query node is the subquery containing the nodes on the path from the root down to the node.

[Figure 2.5 panels: (a) Query. (b) Matching prefix paths. (c) Matching subtrees.]
Figure 2.5: Matching query parts.

We call a pair $v \mapsto v'$ that is part of a prefix path match for $v$ a prefix path matcher. Filtering query and data node pairs on whether or not they are prefix path matchers is easy to implement with an inductive strategy: Assuming that $v \in V_Q$ has parent $u$, the pair $v \mapsto v'$ is a prefix path matcher for $v$ if and only if there exists a pair $u \mapsto u'$ that is a prefix path matcher for $u$ such that $\langle u', v' \rangle$ satisfies the A-D or P-C relationship specified by $\mathrm{EdgeType}(u, v)$ [5]. Prefix path filtering is easiest to implement when data nodes are seen in tree preorder, where ancestors are seen before descendants.

Example 3. Figure 2.5b illustrates prefix path match checking. The pair $a^1 \mapsto a_1$ is trivially a prefix path matcher, and $b^1 \mapsto b_3$ must then be a prefix path matcher because $\mathrm{EdgeType}(a^1, b^1) = \text{A-D}$ and $\langle a_1, b_3 \rangle \in E_D^+$. This again implies that $f^1 \mapsto f_1$ must be a prefix path matcher because $\mathrm{EdgeType}(b^1, f^1) = \text{P-C}$ and $\langle b_3, f_1 \rangle \in E_D$.

Filtering pairs on whether or not they are subtree matchers can be implemented with a similar strategy: The pair $v \mapsto v'$ is a subtree matcher if and only if for each child $w$ of $v$, there exists a subtree matcher $w \mapsto w'$ such that $\langle v', w' \rangle$ satisfies the A-D or P-C relationship specified by $\mathrm{EdgeType}(v, w)$ [7]. Subtree match filtering is easiest to implement when data nodes are seen in tree postorder, where descendants are seen before ancestors.

Example 4. Figure 2.5c illustrates subtree match checking. The pairs $f^1 \mapsto f_1$, $b^2 \mapsto b_4$ and $c^1 \mapsto c_5$ are trivially subtree matchers because the query nodes are leaves. The pair $b^1 \mapsto b_3$ is a subtree matcher because $f^1 \mapsto f_1$ is a subtree matcher and $\langle b_3, f_1 \rangle \in E_D$ satisfies $\mathrm{EdgeType}(b^1, f^1) = \text{P-C}$, and because $b^2 \mapsto b_4$ is a subtree matcher and $\langle b_3, b_4 \rangle \in E_D^+$ satisfies $\mathrm{EdgeType}(b^1, b^2) = \text{A-D}$. The pair $a^1 \mapsto a_1$ is a subtree matcher because $b^1 \mapsto b_3$ is a subtree matcher and $\langle a_1, b_3 \rangle \in E_D^+$ satisfies $\mathrm{EdgeType}(a^1, b^1) = \text{A-D}$, and because $c^1 \mapsto c_5$ is a subtree matcher and $\langle a_1, c_5 \rangle \in E_D$ satisfies $\mathrm{EdgeType}(a^1, c^1) = \text{P-C}$.

Intermediate result construction

The filtering on matched subtrees described in the previous section is strongly related to a strategy that can be used to efficiently build a data structure that realizes the conceptual structure depicted in Figure 2.3.
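Before turning to that construction, the two inductive filtering rules from the previous section can be sketched over candidate sets, one query node at a time: prefix path filtering walks the query top-down in preorder, and subtree filtering walks it bottom-up in reversed preorder, so that each child is processed before its parent. This set-based version is an illustration under assumptions (hypothetical dictionary encodings for the query and data trees, quadratic relationship checks), not the streaming implementations cited in the text.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encodings.
DataTree = Dict[str, Tuple[str, Optional[str]]]       # node -> (label, parent)
Query = Dict[str, Tuple[str, Optional[str], str]]      # node -> (label, parent, edge to parent)


def satisfies(data: DataTree, edge: str, anc: str, desc: str) -> bool:
    """P-C: anc is the parent of desc. A-D: anc is a proper ancestor of desc."""
    node = data[desc][1]
    if edge == "PC":
        return node == anc
    while node is not None:
        if node == anc:
            return True
        node = data[node][1]
    return False


def candidates(data: DataTree, label: str) -> Set[str]:
    return {d for d, (lab, _) in data.items() if lab == label}


def prefix_path_matchers(query: Query, preorder: List[str], data: DataTree) -> Dict[str, Set[str]]:
    keep: Dict[str, Set[str]] = {}
    for v in preorder:                                 # parents before children
        label, u, edge = query[v]
        if u is None:
            keep[v] = candidates(data, label)
        else:
            keep[v] = {d for d in candidates(data, label)
                       if any(satisfies(data, edge, p, d) for p in keep[u])}
    return keep


def subtree_matchers(query: Query, preorder: List[str], data: DataTree) -> Dict[str, Set[str]]:
    keep: Dict[str, Set[str]] = {}
    for v in reversed(preorder):                       # children before parents
        kids = [w for w in preorder if query[w][1] == v]
        keep[v] = {d for d in candidates(data, query[v][0])
                   if all(any(satisfies(data, query[w][2], d, c) for c in keep[w])
                          for w in kids)}
    return keep


QUERY: Query = {"qa": ("a", None, ""), "qb": ("b", "qa", "AD"), "qc": ("c", "qa", "PC")}
PREORDER = ["qa", "qb", "qc"]
DATA: DataTree = {
    "a1": ("a", None), "b1": ("b", "a1"), "c1": ("c", "b1"), "a3": ("a", "b1"),
    "c2": ("c", "a1"), "a2": ("a", "a1"), "b2": ("b", "a2"), "c3": ("c", "a2"),
}

if __name__ == "__main__":
    print(prefix_path_matchers(QUERY, PREORDER, DATA))
    print(subtree_matchers(QUERY, PREORDER, DATA))
```

In this hypothetical example, prefix path filtering keeps a3 (a root candidate is trivially a prefix path matcher) but drops c1, while subtree filtering keeps c1 (a leaf) but drops a3, which has no b descendant. Neither filter drops a pair that occurs in a full match.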
What is described in the following is a slight simplification of what is used in the Twig²Stack algorithm [7], which was the first twig join algorithm with cost linear in the size of the input data and the output result set. Preorder processing of data nodes with filtering on matched prefix paths is not a suitable starting point for a worst-case efficient algorithm, because even when paths in the data do match paths in the query, it is hard to determine on the fly during preorder processing whether or not other paths in the query can use the same branching nodes. With postorder processing, on the other hand, matches for the query can be constructed bottom up by combining subtree matches into bigger subtree matches. The storage order of data nodes in the index does not have to be changed for postorder processing, as a preorder stream of match pairs can be translated into a postorder stream with a stack: when a pair $v \mapsto v'$ is read in preorder, all pairs $u \mapsto u'$ on the stack such that $u'$ is not an ancestor of $v'$ are popped off and processed one by one, before $v \mapsto v'$ is pushed onto the stack.

When following the strategy from Sections and 2.1.3, the key to efficient enumeration of results is the ability to efficiently find usable subtree matches. Given a candidate $v \mapsto v'$, we need to find, for each child $w$ of $v$, the list of matchers $w \mapsto w'$ such that $v'$ and $w'$ satisfy $\mathrm{EdgeType}(v, w)$. Subtree matches for the query root are trivially full query matches.
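The stack-based translation from a preorder stream to a postorder stream can be sketched as below. The sketch assumes, hypothetically, that each data node is identified by a region `(start, end)`, where `start` is its preorder rank and `end` is the preorder rank of its last descendant. It also makes one assumption explicit that the prose leaves open: a pair stays on the stack when its data node is an ancestor of, or the same node as, the incoming one, so that several pairs sharing a data node are emitted in reverse input order.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Pair(NamedTuple):
    query: str   # query node name
    start: int   # preorder rank of the data node
    end: int     # preorder rank of the data node's last descendant


def is_ancestor_or_self(a: Pair, b: Pair) -> bool:
    return a.start <= b.start <= a.end


def preorder_to_postorder(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Translate a stream sorted on data preorder into data postorder."""
    stack: List[Pair] = []
    for p in pairs:
        # Pairs whose data node does not contain p's data node are finished.
        while stack and not is_ancestor_or_self(stack[-1], p):
            yield stack.pop()
        stack.append(p)
    while stack:                 # flush the remaining ancestors, deepest first
        yield stack.pop()


# Hypothetical data a(b(c), d) with regions a=(0,3), b=(1,2), c=(2,2), d=(3,3).
stream = [Pair("a1", 0, 3), Pair("b1", 1, 2), Pair("c1", 2, 2), Pair("d1", 3, 3)]
print([p.query for p in preorder_to_postorder(stream)])  # ['c1', 'b1', 'd1', 'a1']
```

Because the stack always holds a chain of nested regions, only its top has to be compared with the incoming pair, so each pair is pushed and popped exactly once.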

The overall strategy for the proposed data structure is to maintain for each query node $v$ a list of disjoint trees $T_v$ consisting of node matches from the stream $S_v$, as shown in Figure 2.6. Some additional dummy nodes are used to bind the trees together. For each data node in the trees for a query node, there is a list of pointers to usable child query node matches. P–C matches are pointed to directly, while A–D matches are found in the entire subtrees pointed to.

Figure 2.6: Postorder construction of intermediate results for the data and query in Figure 1.4.

Algorithm 2 shows how this data structure can be constructed, specifying the processing of a single pair $v \mapsto v'$ in postorder. For each query node $v$, there is a list $T_v$ of disjoint trees consisting of subtree matchers $v \mapsto v'$ where $v' \in S_v$. When processing a pair $v \mapsto v'$, the trees whose root data nodes are descendants of $v'$ are joined into single trees, both in the lists $T_w$ for the children $w$ of $v$, and in the list $T_v$ for $v$ itself. For P–C edges, pointers from $v \mapsto v'$ to $w \mapsto w'$ denote single direct child matches, while for A–D edges, pointers denote that entire subtrees contain matches. A pair $v \mapsto v'$ is only added if there is at least one pointer for each child $w$ of $v$, and this effectively implements subtree match filtering as described in Section

Example 5. Figure 2.7 shows the step processing $a_1 \mapsto a_1$ when constructing intermediate results for the data and query from Figure 1.4 with Algorithm 2. The trees at the end of $T_{b_1}$, whose roots are $b_2$ and a dummy node, are joined into a single tree. So are the trees at the end of $T_{c_1}$, whose roots are $c_3$, $c_4$ and $c_5$. Pointers are added from $a_1 \mapsto a_1$ to the tree of descendants in $T_{b_1}$, and to the child match $c_1 \mapsto c_5$ in $T_{c_1}$. Since $a_1 \mapsto a_1$ has pointers both to matches for $b_1$ and $c_1$, it is a subtree match, and is added to $T_{a_1}$.
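The procedure in Algorithm 2 can be sketched in Python as follows. This is a simplified illustration under assumptions that are not from the thesis: each data node carries a hypothetical region encoding (preorder `start`, preorder rank `end` of its last descendant) and a `depth`; a dummy node gets the region spanning its children; the query is a dictionary from each query node name to its `(child, edge type)` pairs; and result enumeration is omitted. The example data is a small made-up tree, not Figure 1.4.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TNode:
    """A node in one of the trees of T_v: a match pair v -> v', or a dummy."""
    query: Optional[str]   # query node name, or None for a dummy node
    start: int             # data region start (preorder rank of v')
    end: int               # data region end (preorder rank of last descendant)
    depth: int = -1        # data node depth; unused for dummies
    children: List["TNode"] = field(default_factory=list)
    pointers: Dict[str, List["TNode"]] = field(default_factory=dict)


def is_proper_descendant(r: TNode, v: TNode) -> bool:
    return v.start < r.start and r.end <= v.end


class IntermediateResults:
    def __init__(self, query_children: Dict[str, List[Tuple[str, str]]]):
        self.qchildren = query_children               # v -> [(w, "AD" | "PC")]
        self.T = {v: [] for v in query_children}      # T_v: tree roots, in postorder

    def _descendant_suffix(self, trees: List[TNode], node: TNode) -> int:
        """Index where the trailing run of roots below node's data node begins."""
        k = len(trees)
        while k > 0 and is_proper_descendant(trees[k - 1], node):
            k -= 1
        return k

    def process(self, v: str, start: int, end: int, depth: int) -> bool:
        node = TNode(v, start, end, depth)
        for w, edge in self.qchildren[v]:
            Tw = self.T[w]
            k = self._descendant_suffix(Tw, node)
            suffix = Tw[k:]                           # T'_w
            if edge == "PC":
                direct = [r for r in suffix if r.query is not None and r.depth == depth + 1]
                if direct:
                    node.pointers[w] = direct
            if len(suffix) > 1:                       # join into a single tree
                dummy = TNode(None, min(r.start for r in suffix),
                              max(r.end for r in suffix), children=suffix)
                Tw[k:] = [dummy]
                suffix = [dummy]
            if edge == "AD" and suffix:
                node.pointers[w] = [suffix[0]]        # whole subtree holds matches
        if len(node.pointers) < len(self.qchildren[v]):
            return False                              # not a subtree matcher: discard
        Tv = self.T[v]
        k = self._descendant_suffix(Tv, node)
        node.children = Tv[k:]
        Tv[k:] = [node]
        return True


# Hypothetical query: a1 -> b1 (A-D), c1 (P-C); b1 -> f1 (P-C), b2 (A-D).
query = {"a1": [("b1", "AD"), ("c1", "PC")], "b1": [("f1", "PC"), ("b2", "AD")],
         "f1": [], "b2": [], "c1": []}
# Hypothetical data a1(b3(f1, x2(b4)), c5, c6) as (query, start, end, depth),
# in data postorder and secondarily in query preorder.
pairs = [("f1", 2, 2, 2), ("b1", 4, 4, 3), ("b2", 4, 4, 3), ("b1", 1, 4, 1),
         ("b2", 1, 4, 1), ("c1", 5, 5, 1), ("c1", 6, 6, 1), ("a1", 0, 6, 0)]
ir = IntermediateResults(query)
accepted = [ir.process(*p) for p in pairs]
print(accepted)
```

Here $b_1 \mapsto b_4$ is discarded for lacking matches for $f_1$ and $b_2$, and processing $a_1 \mapsto a_1$ joins the two $c_1$ matches under a dummy node, mirroring the $T_{c_1}$ merge in Example 5.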
Algorithm 2 Postorder intermediate result construction
  Function Process(v ↦ v′):
    For each child w of v:
      Let T′_w be the trees at the end of T_w whose root data nodes are descendants of v′.
      If EdgeType(v, w) = P–C:
        Add pointers from v ↦ v′ to all w ↦ w′ in T′_w where depth(w′) = depth(v′) + 1.
      If |T′_w| > 1:
        Replace T′_w by a dummy node with the trees from T′_w as children.
      If EdgeType(v, w) = A–D and |T′_w| > 0:
        Add a descendant pointer from v ↦ v′ to the single node in T′_w.
    If v ↦ v′ does not have at least one pointer per child w of v:
      Discard v ↦ v′ and return failure.
    Remove from the end of T_v all roots whose data nodes are descendants of v′, add them as children of v ↦ v′, and append v ↦ v′ to T_v.

Figure 2.7: A step in postorder construction of intermediate results for the data and query in Figure 1.4: (a) before adding $a_1 \mapsto a_1$; (b) after adding $a_1 \mapsto a_1$. Dotted boxes give the current list of trees $T_v$ for each $v \in V_Q$.

When evaluating the input $I$ with Algorithm 2, the total number of calls to the procedure Process() would be $\sum_{v \in V_Q} |S_v| = |I|$, and the total number of rounds in the for-loop would be $\sum_{v \in V_Q} |I_v| \, b_v \in O(|I| \, b_Q)$, where $b_v$ is the number of children of $v$ and $b_Q$ is the maximal number of children for any node in $Q$. Apart from constant time operations for each input $v \mapsto v'$ and each child $w$ of $v$, there is some non-trivial cost associated with merging trees and adding pointers to P–C and A–D child matches. A merge attempt either inspects only one tree root and does not change $T_v$, or inspects $k > 1$ roots, removes $k - 1$ roots from $T_v$ and adds a new one. This means that the cost of merge operations is bounded by the number of attempts and the sizes of the trees, i.e., $\sum_{v \in V_Q} O(|I_v| + |I_v| \, b_v)$. Now consider the cost of adding pointers from matches for a query node $v$ to matches for a child query node $w$. If $\mathrm{EdgeType}(v, w)$ = A–D, then only a single edge is added from each $v \mapsto v'$. If $\mathrm{EdgeType}(v, w)$ = P–C, then only a single edge is added to each $w \mapsto w'$, as a node can have only one parent. In conclusion, the total cost of using Algorithm 2 is $\sum_{v \in V_Q} O(|I_v| + |I_v| \, b_v) \subseteq O(|I| + |I| \, b_Q)$.

What is presented here is a slight simplification of the Twig²Stack algorithm [7]. The main difference between the above description and Twig²Stack is that in the latter, the data structure for each query node is a list of trees of stacks of nodes, instead of simply a list of trees of nodes. Many alternative twig join algorithms have been presented [27, 39, 33] in the years following the publication of Twig²Stack. What these algorithms have in common is improved practical performance, but higher worst-case complexity in the result enumeration phase. An example is the TwigList algorithm, which stores intermediate nodes in simple vectors instead of trees, and implements a weaker form of subtree filtering, where all query edges are considered to have type A–D.

Merging input streams

The final component missing to implement the strategy in Figure 2.1 is the input stream merger. The input to the merge is one preorder-sorted stream representing $I_v$ for each $v \in V_Q$, and the desired output is a sorted stream representing $I$.
The sort order required for using the approach from Section is that the pairs $v \mapsto v' \in I$ are sorted primarily on the preorder of the data nodes, and secondarily on the postorder of the query nodes. After the stream is translated into data node postorder with a stack, the new stream is then sorted secondarily on query node preorder. Algorithm 2 requires this for cases where a single data node matches multiple query nodes: if the sorting were not secondarily on query node preorder, the match of a data node for a query child could hide useful descendant matches of that same data node from its match for the query parent.

The simplest merge approach is to traverse the query in postorder, and find some minimum $v \mapsto v'$ by taking a preorder-minimal data node $v'$ at the head of a stream $I_v$, for a postorder-minimal $v$. This takes $\Theta(|Q|)$ time per extraction, and gives a total cost of $\Theta(|I| \, |Q|)$ for the merge. An asymptotically better approach is to organize the individual streams in a priority queue implemented with a binary heap, sorted primarily on the heads of the streams and secondarily on the query nodes. Extractions then take $O(\log |Q|)$ time, and the total cost is $O(|I| \log |Q|)$ [11]. Since the preorder and postorder tree traversal numbers we are sorting on are bounded by the size of the input, the sorting complexity need not be loglinear; it is linear under the unit cost assumption. The entire set $I$ can be put in a single array, and sorted using radix sort in $\Theta(|I|)$ time [11]. As the intermediate result construction is already $O(|I| \, b_Q)$, the radix sort approach gives no asymptotic advantage over the heap-based approach when $\log |Q| \le b_Q$. Since the latter uses much less memory in practice, $\Theta(|Q|)$ instead of $\Theta(|I|)$, it is preferable in most real-world scenarios.
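The heap-based merge can be sketched with Python's standard `heapq` module. The sketch assumes, hypothetically, that each stream $I_v$ is a list of data node preorder ranks and that query node postorder ranks are supplied in a separate dictionary. The heap holds at most one entry per query node, so each extraction costs $O(\log |Q|)$ and the memory use is $O(|Q|)$ beyond the streams themselves.

```python
import heapq
from typing import Dict, Iterator, List, Tuple


def merge_streams(streams: Dict[str, List[int]],
                  query_postorder: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Merge the streams I_v into one stream of (query node, data preorder rank)
    pairs, sorted primarily on data preorder and secondarily on query postorder."""
    iters = {v: iter(s) for v, s in streams.items()}
    heap = []                                    # at most one entry per query node
    for v, it in iters.items():
        head = next(it, None)
        if head is not None:
            heap.append((head, query_postorder[v], v))
    heapq.heapify(heap)
    while heap:
        head, rank, v = heapq.heappop(heap)      # O(log |Q|) per extraction
        yield v, head
        nxt = next(iters[v], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, rank, v))


# Hypothetical query a1 -> b1 -> b2, whose postorder is b2, b1, a1.
ranks = {"b2": 0, "b1": 1, "a1": 2}
streams = {"a1": [0], "b1": [1, 4], "b2": [1, 4]}
print(list(merge_streams(streams, ranks)))
```

Since the query postorder ranks are distinct, ties on the data node are always broken by rank and the query node names are never compared. The standard `heapq.merge` with a `key` argument could produce the same order; the explicit loop is used here only to make the per-extraction heap operations visible.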

Some of the newer twig join algorithms storing intermediate results in preorder [27, 33] use an $O(|I| \, |Q|)$ input stream merge component that implements a weak form of subtree match filtering, where all query edges are considered to have type A–D [5]. The merger uses only $O(|Q|)$ memory and is very fast in practice because queries are typically small. It returns data nodes in a relaxed preorder, where the ordering is only guaranteed between matches for query nodes related by ancestry. This stream is not easily translated into postorder, and hence the merger is not used by postorder processing algorithms [21].

Data locality and updatability

This chapter does not in general distinguish between data stored in main memory and on disk, but in practical implementations it is important to consider the costs of different access patterns on different media. Although main memory on modern computers does not really have a uniform access cost, due to the use of caches, we can design usable systems that perform random memory reads and writes. On the other hand, if the data is so large that it must reside on disk, a system that uses a lot of random access will not be efficient in practice.

Consider now the different phases and components in our twig join strategy. The input stream merger is assumed to only inspect stream heads and store a minimal amount of state. Hence it should work well on an architecture where the candidate matches for each query node are streamed from disk. The intermediate result construction, as shown in Algorithm 2, inspects in each call a number of tree roots stored contiguously at the end of the current list of trees for each query node. This in itself is simple to implement with good spatial locality, but it should also be considered how the layout of data affects the result enumeration phase.
Luckily, if intermediate nodes are streamed onto disk and inserted into blocks in postorder, most nodes that are close in the data tree will be stored close together on disk. This strategy gives fairly good spatial locality during result enumeration [7]. The problem of intermediate results exceeding the size of main memory can be avoided in many practical cases by observing that when the uppermost candidate match for the root query node is closed, none of the data nodes seen so far in the tree preorder will be used in any match involving data nodes later in the tree preorder [7]. This means that when the uppermost query root match candidate is closed, the current intermediate data can be used to enumerate the current set of query matches, before this data is discarded.

Example 6. Consider the data in Figure 1.4, and an algorithm that pushes nodes onto a stack in preorder and pops them off in postorder. When the data node $b_6$ is processed, it causes $a_1$ to be popped, and there are no more a-nodes on the stack. As a match for the query node $a_1$ must be above the match for any other query node in a full query match, no nodes preceding $b_6$ in the data will be involved in a match together with nodes following and including $b_6$. Hence we can enumerate results, and delete the current intermediate data structures.

In many practical cases with large amounts of data, the underlying information is stored in a large number of independent documents of moderate size, and in these cases the above trick is always applicable. Data updates are also easy to handle in such a setting. Global data node positions can be encoded by combining document identifiers with local node position encodings, such as BEL, and this simplifies updates: updating a document can be viewed as deleting it and then re-adding it with a new document identifier, as is common in search systems for unstructured data [51]. Note that when the data is a single large tree that cannot easily be partitioned into independent documents, we need a node position encoding with affordable cost for tree updates. A number of such encodings exist, with different properties [50].

Twig join conclusion

We have now discussed all the components in a state-of-the-art twig join algorithm, and the costs of the different components are: input stream merge, $O(|I| \log |Q|)$ for the heap-based approach; intermediate result construction, $O(|I| \, b_Q)$; and result enumeration, $O(|O| \, |Q|)$. This gives a total combined data, query and result complexity of $O(|I| \log |Q| + |I| \, b_Q + |O| \, |Q|)$. Commonly the size of the query is viewed as a constant, and twig join algorithms are called linear and optimal if the combined data and result complexity is $O(|I| + |O|)$.

2.2 Partitioning data

In the previous discussion it was assumed that the data nodes were partitioned on label in the index. This section considers the advantages and challenges that arise from more advanced indexing strategies.

Motivation for fragmentation

Let us first recap the introduction to the general strategy for TPM on indexed data from Section 1.3.1: the index is a mechanism that provides a function from some feature of a node to the set of nodes in the data that have this feature. The main motivation for using an index is of course to read and process less data during query processing. If node labels are selective, simple label partitioning is an efficient approach, but this is not always the case. Figure 2.8 shows a case with many label matches for the individual query nodes in the data, but only a few full matches for the query.
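A label-partitioned index of the kind assumed so far can be sketched as a dictionary from each label to the preorder-sorted list of regions of the data nodes carrying that label; each list is then the stream read for a query node with that label. The `DNode` class and the `(start, end)` region encoding (preorder rank and preorder rank of the last descendant) are illustrative assumptions, not the thesis's storage format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DNode:
    label: str
    children: List["DNode"] = field(default_factory=list)


def build_label_index(root: DNode) -> Dict[str, List[Tuple[int, int]]]:
    """Map each label to the preorder-sorted (start, end) regions of its nodes."""
    index = defaultdict(list)
    counter = 0

    def visit(node: DNode) -> None:
        nonlocal counter
        entry = [counter, counter]       # end is fixed once the subtree is done
        counter += 1
        index[node.label].append(entry)  # appended in preorder, so lists stay sorted
        for child in node.children:
            visit(child)
        entry[1] = counter - 1           # preorder rank of the last descendant

    visit(root)
    return {label: [tuple(e) for e in entries] for label, entries in index.items()}


# Hypothetical data a(b(a, b), a(b)).
data = DNode("a", [DNode("b", [DNode("a"), DNode("b")]), DNode("a", [DNode("b")])])
index = build_label_index(data)
print(index["b"])   # the whole stream a query node labeled b would have to read
```

Any query node labeled b must read every entry in `index["b"]`, regardless of how few of those nodes take part in full matches, which is the cost that motivates finer partitioning.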
The above example may be unrealistic, but reconsider the data in Figure 1.1 and the query in Figure 1.2 on page 14. If the given library has billions of books, then the cost of reading the data nodes labeled book will be huge compared to the size of the output result set. This motivates the use of a more fragmented partitioning of the data to improve the selectivity of query nodes. Note that another way of improving performance in these cases is to use skipping, discussed later in Section

Figure 2.8: Partitioning on label. (a) Example query and data, showing the first of four matches. (b) Example query and streams read; marked stream nodes are useful.

Path partitioning

A natural extension of label partitioning is to partition data nodes on the paths by which they are reachable [37, 13, 36, 8]. Section described how useless data nodes could be filtered out during intermediate result construction if they did not match prefix paths in the query. When indexing data nodes on prefix path, the same filtering is performed in advance, and we only process data nodes from classes whose prefix paths match the prefix paths in the query.

To identify useful partitions when evaluating a query, we need some form of dictionary. In Figure 1.5b on page 17 a simple dictionary of path strings was used in the index, but this approach does not have attractive worst-case properties. There may be many unique paths in the data, and the size of this naive dictionary can be $O(|D|^2)$ if the tree is deep. A more robust approach is to use a dictionary tree called a path summary, where shared prefixes of paths are only encoded once. Figure 2.9a shows the path partitioning for the data tree in Figure 2.8a. A path summary can be constructed from this partitioning by creating one node for each block in the partition, and creating edges between summary nodes whenever there are edges between data nodes in the related blocks, as shown on the left in Figure 2.9b.
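Path summary construction can be sketched as below. Two data nodes fall in the same block of the path partition exactly when their root-to-node label paths are equal, so the summary is a trie over label paths: a dictionary keyed on (parent summary node, label) makes sure each shared prefix is stored only once. The function name, the `DNode` class and the output format (summary nodes as `(label, parent id)` pairs, blocks as lists of data node preorder ranks) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DNode:
    label: str
    children: List["DNode"] = field(default_factory=list)


def build_path_summary(root: DNode):
    """Return (summary, blocks): summary[s] = (label, parent summary id or None)
    and blocks[s] = preorder ranks of the data nodes in path class s."""
    summary: List[Tuple[str, Optional[int]]] = []
    blocks: List[List[int]] = []
    child_of: Dict[Tuple[Optional[int], str], int] = {}   # shared prefixes stored once
    rank = 0
    stack = [(root, None)]
    while stack:                                    # iterative preorder traversal
        node, parent_sid = stack.pop()
        sid = child_of.get((parent_sid, node.label))
        if sid is None:                             # first data node with this path
            sid = len(summary)
            child_of[(parent_sid, node.label)] = sid
            summary.append((node.label, parent_sid))
            blocks.append([])
        blocks[sid].append(rank)
        rank += 1
        for child in reversed(node.children):       # leftmost child is popped first
            stack.append((child, sid))
    return summary, blocks


# Hypothetical data a(b(c), b(c, c)): six data nodes, three distinct paths.
data = DNode("a", [DNode("b", [DNode("c")]), DNode("b", [DNode("c"), DNode("c")])])
summary, blocks = build_path_summary(data)
print(summary)   # summary nodes for the paths a, a/b, a/b/c
print(blocks)
```

The summary has one node per distinct path, so its size is at most the number of data nodes, in contrast to the naive dictionary of path strings.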
Prefix path matches for each query node can be found individually by using a matching algorithm on the summary tree, but this may give many individual matches that never take part in full query matches. A robust and efficient way to find useful prefix path matches is to index the summary itself on label, and use a twig join algorithm to evaluate queries directly on the summary to find relevant nodes [2].
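Matching a single root-to-leaf query path on a path summary can be sketched by applying the inductive prefix path filter from earlier in this chapter to summary nodes instead of data nodes. This corresponds to the simpler per-query-node matching mentioned first above, not to the twig join on the summary cited from [2]. The sketch assumes, hypothetically, that summary node ids are assigned in preorder, so that a parent id is always smaller than its children's ids, and that the first query step may match at any depth.

```python
from typing import List, Optional, Set, Tuple


def match_path_on_summary(summary: List[Tuple[str, Optional[int]]],
                          query: List[Tuple[str, str]]) -> List[int]:
    """Return the summary node ids that are prefix path matchers for the last step
    of a root-to-leaf query path. query[i] = (label, edge), where edge ("AD" or
    "PC") relates step i to step i - 1 and is ignored for step 0."""
    hits: List[Set[int]] = []       # hits[s]: query steps summary node s matches
    anc_hits: List[Set[int]] = []   # union of hits over proper ancestors of s
    for label, parent in summary:   # ids in preorder, so parents come first
        p_hits = hits[parent] if parent is not None else set()
        a_hits = (anc_hits[parent] | hits[parent]) if parent is not None else set()
        h = set()
        for i, (q_label, edge) in enumerate(query):
            if q_label == label and (i == 0
                                     or (edge == "PC" and i - 1 in p_hits)
                                     or (edge == "AD" and i - 1 in a_hits)):
                h.add(i)
        hits.append(h)
        anc_hits.append(a_hits)
    last = len(query) - 1
    return [s for s, h in enumerate(hits) if last in h]


# Hypothetical summary for the paths a, a/b, a/b/c, a/d, a/d/c.
summary = [("a", None), ("b", 0), ("c", 1), ("d", 0), ("c", 3)]
print(match_path_on_summary(summary, [("a", "AD"), ("b", "PC"), ("c", "PC")]))
print(match_path_on_summary(summary, [("a", "AD"), ("c", "AD")]))
```

The data nodes to read for the last query step are then the union of the partition blocks of the returned summary nodes.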


More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Trees, Part 1: Unbalanced Trees

Trees, Part 1: Unbalanced Trees Trees, Part 1: Unbalanced Trees The first part of this chapter takes a look at trees in general and unbalanced binary trees. The second part looks at various schemes to balance trees and/or make them more

More information

Accelerating XML Structural Matching Using Suffix Bitmaps

Accelerating XML Structural Matching Using Suffix Bitmaps Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,

More information

Draw a diagram of an empty circular queue and describe it to the reader.

Draw a diagram of an empty circular queue and describe it to the reader. 1020_1030_testquestions.text Wed Sep 10 10:40:46 2014 1 1983/84 COSC1020/30 Tests >>> The following was given to students. >>> Students can have a good idea of test questions by examining and trying the

More information

Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College!

Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College! Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College! ADTʼs Weʼve Studied! Position-oriented ADT! List! Stack! Queue! Value-oriented ADT! Sorted list! All of these are linear! One previous item;

More information

( ) n 3. n 2 ( ) D. Ο

( ) n 3. n 2 ( ) D. Ο CSE 0 Name Test Summer 0 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply two n n matrices is: A. Θ( n) B. Θ( max( m,n, p) ) C.

More information

Twig Pattern Search in XML Database

Twig Pattern Search in XML Database Twig Pattern Search in XML Database By LEPING ZOU A thesis submitted to the Department of Applied Computer Science in conformity with the requirements for the degree of Master of Science University of

More information

Lecture 5: Suffix Trees

Lecture 5: Suffix Trees Longest Common Substring Problem Lecture 5: Suffix Trees Given a text T = GGAGCTTAGAACT and a string P = ATTCGCTTAGCCTA, how do we find the longest common substring between them? Here the longest common

More information

Algorithms and Data Structures (INF1) Lecture 8/15 Hua Lu

Algorithms and Data Structures (INF1) Lecture 8/15 Hua Lu Algorithms and Data Structures (INF1) Lecture 8/15 Hua Lu Department of Computer Science Aalborg University Fall 2007 This Lecture Trees Basics Rooted trees Binary trees Binary tree ADT Tree traversal

More information

Binary Trees, Binary Search Trees

Binary Trees, Binary Search Trees Binary Trees, Binary Search Trees Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete)

More information

Trees. Tree Structure Binary Tree Tree Traversals

Trees. Tree Structure Binary Tree Tree Traversals Trees Tree Structure Binary Tree Tree Traversals The Tree Structure Consists of nodes and edges that organize data in a hierarchical fashion. nodes store the data elements. edges connect the nodes. The

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1 Slide 27-1 Chapter 27 XML: Extensible Markup Language Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree) Data Model. XML Documents, DTD, and XML Schema.

More information

Final Exam in Algorithms and Data Structures 1 (1DL210)

Final Exam in Algorithms and Data Structures 1 (1DL210) Final Exam in Algorithms and Data Structures 1 (1DL210) Department of Information Technology Uppsala University February 0th, 2012 Lecturers: Parosh Aziz Abdulla, Jonathan Cederberg and Jari Stenman Location:

More information

Scaling Similarity Joins over Tree-Structured Data

Scaling Similarity Joins over Tree-Structured Data Scaling Similarity Joins over Tree-Structured Data Yu Tang, Yilun Cai, Nikos Mamoulis The University of Hong Kong EPFL Switzerland {ytang, ylcai, nikos}@cs.hku.hk ABACT Given a large collection of tree-structured

More information

n 2 ( ) ( ) + n is in Θ n logn

n 2 ( ) ( ) + n is in Θ n logn CSE Test Spring Name Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to multiply an m n matrix and a n p matrix is in: A. Θ( n) B. Θ( max(

More information

EE 368. Weeks 5 (Notes)

EE 368. Weeks 5 (Notes) EE 368 Weeks 5 (Notes) 1 Chapter 5: Trees Skip pages 273-281, Section 5.6 - If A is the root of a tree and B is the root of a subtree of that tree, then A is B s parent (or father or mother) and B is A

More information

COMP : Trees. COMP20012 Trees 219

COMP : Trees. COMP20012 Trees 219 COMP20012 3: Trees COMP20012 Trees 219 Trees Seen lots of examples. Parse Trees Decision Trees Search Trees Family Trees Hierarchical Structures Management Directories COMP20012 Trees 220 Trees have natural

More information

UNIT IV -NON-LINEAR DATA STRUCTURES 4.1 Trees TREE: A tree is a finite set of one or more nodes such that there is a specially designated node called the Root, and zero or more non empty sub trees T1,

More information

ADT 2009 Other Approaches to XQuery Processing

ADT 2009 Other Approaches to XQuery Processing Other Approaches to XQuery Processing Stefan Manegold Stefan.Manegold@cwi.nl http://www.cwi.nl/~manegold/ 12.11.2009: Schedule 2 RDBMS back-end support for XML/XQuery (1/2): Document Representation (XPath

More information

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree. The Lecture Contains: Index structure Binary search tree (BST) B-tree B+-tree Order file:///c /Documents%20and%20Settings/iitkrana1/My%20Documents/Google%20Talk%20Received%20Files/ist_data/lecture13/13_1.htm[6/14/2012

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 Chapter 3 - Graphs Undirected Graphs Undirected graph. G = (V, E) V = nodes. E = edges between pairs of nodes. Captures pairwise relationship between objects. Graph size parameters: n = V, m = E. Directed

More information

Stacks, Queues and Hierarchical Collections

Stacks, Queues and Hierarchical Collections Programming III Stacks, Queues and Hierarchical Collections 2501ICT Nathan Contents Linked Data Structures Revisited Stacks Queues Trees Binary Trees Generic Trees Implementations 2 Copyright 2002- by

More information

CSE 100 Advanced Data Structures

CSE 100 Advanced Data Structures CSE 100 Advanced Data Structures Overview of course requirements Outline of CSE 100 topics Review of trees Helpful hints for team programming Information about computer accounts Page 1 of 25 CSE 100 web

More information

Recursive Data Structures and Grammars

Recursive Data Structures and Grammars Recursive Data Structures and Grammars Themes Recursive Description of Data Structures Grammars and Parsing Recursive Definitions of Properties of Data Structures Recursive Algorithms for Manipulating

More information

XML: Extensible Markup Language

XML: Extensible Markup Language XML: Extensible Markup Language CSC 375, Fall 2015 XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both. Matthew Might Slides slightly modified

More information

Thus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1.

Thus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1. 7.2 Binary Min-Heaps A heap is a tree-based structure, but it doesn t use the binary-search differentiation between the left and right sub-trees to create a linear ordering. Instead, a binary heap only

More information

n 2 C. Θ n ( ) Ο f ( n) B. n 2 Ω( n logn)

n 2 C. Θ n ( ) Ο f ( n) B. n 2 Ω( n logn) CSE 0 Name Test Fall 0 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to find the maximum of the n elements of an integer array is in: A.

More information

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs Algorithms in Systems Engineering ISE 172 Lecture 16 Dr. Ted Ralphs ISE 172 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms

More information

Postfix (and prefix) notation

Postfix (and prefix) notation Postfix (and prefix) notation Also called reverse Polish reversed form of notation devised by mathematician named Jan Łukasiewicz (so really lü-kä-sha-vech notation) Infix notation is: operand operator

More information

CSI33 Data Structures

CSI33 Data Structures Outline Department of Mathematics and Computer Science Bronx Community College November 13, 2017 Outline Outline 1 C++ Supplement.1: Trees Outline C++ Supplement.1: Trees 1 C++ Supplement.1: Trees Uses

More information

Chapter 10: Trees. A tree is a connected simple undirected graph with no simple circuits.

Chapter 10: Trees. A tree is a connected simple undirected graph with no simple circuits. Chapter 10: Trees A tree is a connected simple undirected graph with no simple circuits. Properties: o There is a unique simple path between any 2 of its vertices. o No loops. o No multiple edges. Example

More information

) $ f ( n) " %( g( n)

) $ f ( n)  %( g( n) CSE 0 Name Test Spring 008 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to compute the sum of the n elements of an integer array is: # A.

More information

CS61BL. Lecture 3: Asymptotic Analysis Trees & Tree Traversals Stacks and Queues Binary Search Trees (and other trees)

CS61BL. Lecture 3: Asymptotic Analysis Trees & Tree Traversals Stacks and Queues Binary Search Trees (and other trees) CS61BL Lecture 3: Asymptotic Analysis Trees & Tree Traversals Stacks and Queues Binary Search Trees (and other trees) Program Efficiency How much memory a program uses Memory is cheap How much time a

More information

Tree Structures. A hierarchical data structure whose point of entry is the root node

Tree Structures. A hierarchical data structure whose point of entry is the root node Binary Trees 1 Tree Structures A tree is A hierarchical data structure whose point of entry is the root node This structure can be partitioned into disjoint subsets These subsets are themselves trees and

More information

COMP3121/3821/9101/ s1 Assignment 1

COMP3121/3821/9101/ s1 Assignment 1 Sample solutions to assignment 1 1. (a) Describe an O(n log n) algorithm (in the sense of the worst case performance) that, given an array S of n integers and another integer x, determines whether or not

More information

INF2220: algorithms and data structures Series 1

INF2220: algorithms and data structures Series 1 Universitetet i Oslo Institutt for Informatikk A. Maus, R.K. Runde, I. Yu INF2220: algorithms and data structures Series 1 Topic Trees & estimation of running time (Exercises with hints for solution) Issued:

More information

( ) ( ) C. " 1 n. ( ) $ f n. ( ) B. " log( n! ) ( ) and that you already know ( ) ( ) " % g( n) ( ) " #&

( ) ( ) C.  1 n. ( ) $ f n. ( ) B.  log( n! ) ( ) and that you already know ( ) ( )  % g( n) ( )  #& CSE 0 Name Test Summer 008 Last 4 Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time for the following code is in which set? for (i=0; i

More information

Range Minimum Queries Part Two

Range Minimum Queries Part Two Range Minimum Queries Part Two Recap from Last Time The RMQ Problem The Range Minimum Query (RMQ) problem is the following: Given a fixed array A and two indices i j, what is the smallest element out of

More information

Algorithms and Data Structures

Algorithms and Data Structures Lesson 3: trees and visits Luciano Bononi http://www.cs.unibo.it/~bononi/ (slide credits: these slides are a revised version of slides created by Dr. Gabriele D Angelo) International

More information

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE

A6-R3: DATA STRUCTURE THROUGH C LANGUAGE A6-R3: DATA STRUCTURE THROUGH C LANGUAGE NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF

More information

1 The range query problem

1 The range query problem CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition

More information

3 Competitive Dynamic BSTs (January 31 and February 2)

3 Competitive Dynamic BSTs (January 31 and February 2) 3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within

More information

Lecture 3 February 9, 2010

Lecture 3 February 9, 2010 6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees

More information

R13. II B. Tech I Semester Supplementary Examinations, May/June DATA STRUCTURES (Com. to ECE, CSE, EIE, IT, ECC)

R13. II B. Tech I Semester Supplementary Examinations, May/June DATA STRUCTURES (Com. to ECE, CSE, EIE, IT, ECC) SET - 1 II B. Tech I Semester Supplementary Examinations, May/June - 2016 PART A 1. a) Write a procedure for the Tower of Hanoi problem? b) What you mean by enqueue and dequeue operations in a queue? c)

More information

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set.

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. 1 In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set. This presentation presents how such a DAG structure can be accessed immediately

More information

Data Structures Question Bank Multiple Choice

Data Structures Question Bank Multiple Choice Section 1. Fundamentals: Complexity, Algorthm Analysis 1. An algorithm solves A single problem or function Multiple problems or functions Has a single programming language implementation 2. A solution

More information

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305

DATA STRUCTURE : A MCQ QUESTION SET Code : RBMCQ0305 Q.1 If h is any hashing function and is used to hash n keys in to a table of size m, where n

More information

Analysis of Algorithms

Analysis of Algorithms Analysis of Algorithms Trees-I Prof. Muhammad Saeed Tree Representation.. Analysis Of Algorithms 2 .. Tree Representation Analysis Of Algorithms 3 Nomenclature Nodes (13) Size (13) Degree of a node Depth

More information

Course Review for. Cpt S 223 Fall Cpt S 223. School of EECS, WSU

Course Review for. Cpt S 223 Fall Cpt S 223. School of EECS, WSU Course Review for Midterm Exam 1 Cpt S 223 Fall 2011 1 Midterm Exam 1 When: Friday (10/14) 1:10-2pm Where: in class Closed book, closed notes Comprehensive Material for preparation: Lecture slides & in-class

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information