Efficient schema-based XML-to-Relational data mapping


Information Systems

Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi

Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

E-mail addresses: matay@wayne.edu (M. Atay), artem@wayne.edu (A. Chebotko), dliu@wayne.edu (D. Liu), shiyong@wayne.edu (S. Lu), fotouhi@wayne.edu (F. Fotouhi).

Received 2 March 2005; received in revised form 4 December 2005; accepted 15 December 2005

Recommended by: Prof. J. Van den Bussche

Abstract

Storing and querying XML documents using an RDBMS is a challenging problem since one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings: schema mapping, data mapping and query mapping. In this paper, we propose: (i) a lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, and (ii) two linear data mapping algorithms based on DOM and SAX, respectively, to map ordered XML data to relational data. To the best of our knowledge, there is no published linear schema-based data mapping algorithm for mapping ordered XML data to relational data. Experimental results are presented to show that our algorithms are efficient and scalable.

© 2006 Elsevier B.V. All rights reserved.

Keywords: XML; Relational; Schema-based; Ordered; Mapping; Shredding

1. Introduction

XML has emerged as a standard for representing and exchanging data over the World Wide Web. The growing number of XML documents creates the need to store and query them efficiently. Numerous researchers have proposed using relational databases to store and query XML documents [1–9]. The main challenge of this relational approach is that one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings:

Schema mapping: Either a fixed generic database schema is used (schema-oblivious XML storage), or a database schema is generated from an XML schema or DTD (schema-based XML storage) for the storage of XML documents. To support the ordered nature of the XML data model, an order encoding scheme such as those proposed in [8] can be used, and additional columns are introduced to store the ordinals of XML elements.

Data mapping, which shreds an input XML document into relational tuples and inserts them into the relational database whose schema is generated in the schema mapping phase.

Query mapping, which translates an XML query into its relational equivalent (i.e., SQL statements or relational algebra expressions), executes it against the database and returns the query result to the user. If the query result is to be returned as XML documents, then a reconstruction algorithm [10] is needed to reconstruct the XML subtrees rooted at the matching nodes.

While existing work has focused on the problems of schema mapping [1–7,9] and query mapping [8,11–19], there is no published linear schema-based data mapping algorithm for mapping ordered XML documents to relational data. Firstly, the schema-oblivious storage schemes [1–3,16] use a simple, fixed database schema for XML storage, and the data mapping problem in this context has been addressed by Grust et al. in [20]. Secondly, while the schema-based storage schemes [4–7,9] have presented different strategies to generate a good database schema from an XML schema, there has been no published work presenting algorithms for mapping XML documents to relational data that fit into the generated database schema and preserve the XML document order. Tatarinov et al. [8] focus on the investigation of three order encoding schemes for storing and querying XML documents. Although that work briefly discusses schema-based order-preserving schema mapping, it gives no algorithmic details for schema-based data mapping. Thirdly, existing works on query mapping [8,11–15,17–19] assume that the database has already been populated with XML documents, and no algorithms have been published for shredding XML documents into relational data in the context where the database schema is generated from an XML schema. The data translation algorithm presented in [21] does not support recursive XML schemas and does not consider the ordered nature of XML documents.
The data loading algorithms defined in [16,20] support the schema-oblivious storage scheme and use a SAX-based approach. Finally, our previous data mapping algorithm presented in [22] is not order-preserving and uses only a DOM-based approach.

Since the target database schema might be complex and its corresponding XML-to-Relational schema mapping is non-trivial, it is challenging to design an efficient schema-based data mapping algorithm. This is one major motivation of our research. The main contributions of this paper are:

1. We propose a schema mapping algorithm, ODTDMap, which generates a database schema from an XML DTD for storing and querying ordered XML documents. Although the main idea of ODTDMap is similar to the shared inlining algorithm [4,8] and its variant [9], ODTDMap makes several improvements over them, as discussed at the end of Section 4.
2. We propose an efficient DOM-based linear data mapping algorithm, OXInsert, which shreds and composes input XML documents into relational tuples and inserts them into the relational database according to the schema generated by ODTDMap. OXInsert is based on our previous data mapping algorithm XInsert [22], but it takes into account the ordered nature of the input XML documents and set-valued attributes, neither of which was considered by XInsert.
3. We propose an efficient and linear SAX-based data mapping algorithm, SDM, which shreds and composes ordered XML documents into relational tuples and inserts them into the relational database according to the schema generated by ODTDMap.

Our experimental study shows that the proposed algorithms ODTDMap, OXInsert, and SDM are efficient and scalable. The study also shows that our data mapping algorithms OXInsert and SDM remain efficient under schema mapping algorithms other than ODTDMap.

Although query mapping is an essential part of a complete mapping scheme, mapping XML queries into their SQL counterparts is not the focus of this paper.
We refer interested readers to recently proposed query mapping algorithms [8,11,12,14,15,17–19]. We assume the reader is familiar with XML [23] and its related technologies, such as DTD [23], DOM [24] and SAX [25].

Organization: The rest of the paper is organized as follows. Section 2 presents an overview of related work. The formalization of a schema-based relational XML storage system is given in Section 3. Section 4 gives a brief description of our schema mapping algorithm ODTDMap. Section 5 identifies the main issues for data mapping and describes our proposed data mapping algorithms OXInsert and SDM. Section 6 presents an experimental study of the time performance of the ODTDMap, OXInsert and SDM algorithms. Finally, Section 7 concludes the paper and points out some potential future work.

2. Related work

Three major approaches have been proposed for storing and querying XML data.

The first approach is to develop native XML databases that support the XML data model and XML query languages directly. This includes Software AG's Tamino XML Server [26], IXIA's TEXTML Server [27], Sonic Software's eXtensible Information Server [28] (formerly eXcelon's XIS) and MODIS's Sedna Native XML DBMS [29]. The advantage of this native approach is that XML data can be stored and retrieved in their original formats and no additional mappings or translations are needed. Furthermore, most native XML databases can perform sophisticated full-text searches, including full thesaurus support, word stemming (to match all forms of a word: run, ran, running) and proximity searches. The disadvantage is that, due to the document-centric nature of these databases, complex searches or aggregations might be cumbersome.

The second approach is to use existing mature technologies, such as relational DBMSs or object-oriented DBMSs, to store and query XML data [1–9]. The main challenge of this approach is that one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the target data model. This usually requires various mappings, such as schema mapping, data mapping and query mapping, to be performed between the two data models. Therefore, the main issue is to develop efficient algorithms to perform these mappings. This approach includes two categories of methods: schema-oblivious XML storage [1–3,16], which uses a fixed generic database schema for XML storage, and schema-based XML storage [4–7,9], which uses a database schema generated from an XML schema for XML storage.

The third approach is to use the XML support enabled by commercial database systems.
Currently, most major databases, such as SQL Server [30], Oracle [31] and DB2 [32], provide mechanisms to store and query XML data by extending the existing data model with an additional XML data type (e.g., XMLType in Oracle 10g), so that a column of this data type can be defined and used to store XML data. In addition, a set of methods is associated with this new XML data type to process, manipulate and query stored XML data.

As discussed above, these approaches have their pros and cons, and the choice has to be made based on the requirements of the application at hand and the maturity of these approaches at the time the choice is made. Readers are referred to an evaluation study of alternative XML storage strategies [33] for more details.

3. Schema-based relational XML storage system

Our schema-based relational XML storage system contains two major components:

1. Schema mapping, which takes an XML DTD as input, and outputs a database schema and a σ-mapping, which assigns each element/attribute in the DTD to the relation in which the element/attribute is going to be stored.
2. Data mapping, which takes a valid XML document and the output of a schema mapping as input, shreds the XML document into relational tuples, and inserts them into the relational database.

In the following, we formalize the notions of σ-mapping, schema mapping and data mapping, respectively:

Definition 3.1 (σ-mapping). Given a DTD D with element-type set E and attribute-type set A, and a database schema R, a σ-mapping is a function $\sigma : (E \cup A) \to R$, such that given an attribute/element type $e \in (A \cup E)$, $\sigma(e)$ is the relation in which the instances of e will be stored.

Definition 3.2 (Schema mapping). A schema mapping is a function SM that assigns to each DTD D a pair $(R, \sigma)$ to store the XML documents conforming to D, where R is a database schema and σ is a σ-mapping over R.

Definition 3.3 (Data mapping).
A data mapping DM is a function that assigns to each triple $(X, R, \sigma)$ a set of relational tuples T, where X is a valid XML document, R is a database schema, σ is a σ-mapping over R, and T is the result of shredding X into relational tuples according to the layout described by R and σ.

4. Schema mapping algorithm ODTDMap

In this section, we propose our schema mapping algorithm, ODTDMap, which generates a database schema from an XML DTD for storing and querying ordered XML documents. Several approaches exist in the literature. One approach is to map each DTD element to a separate table [4]. The drawback of this approach is that it might result in too many tables and thus expensive joins of multiple tables for a query. Another approach is to map all DTD elements into a single fixed table [2]. This approach might result in a large table and expensive self-joins of the table for a query. A better approach, which our ODTDMap algorithm takes, is to map a child and its parent to the same table when the child appears at most once under its parent. This operation is called inlining and was first introduced in [4]. The inlining approach reduces the number of tables in the generated database schema and thus the number of joins for a query.

Our ODTDMap algorithm is shown in Fig. 1. It is inspired by the shared inlining algorithm introduced in [4]. However, we made several improvements over it, which are described in Section 4.4. The ODTDMap algorithm consists of the following three main steps:

1. Simplifying DTD: Since a DTD expression might be very complex due to its hierarchical nesting capability, this step greatly simplifies the mapping procedure.
2. Creating and inlining DTD graph: We create the corresponding DTD graph based on the simplified DTD, and then inline as many descendant nodes as possible to a parent node in the DTD graph. Thus, all descendants of an XML element e which occur at most once under e will be mapped to the same relation as e.
3. Generating database schema and σ-mapping: After a DTD graph is inlined, we generate a database schema and σ-mapping based on the inlined DTD graph.

The section ends with a discussion on the improvements we made over existing schema mapping algorithms.
    Algorithm ODTDMap
    Input: DTD D
    Output: Database Schema R, σ-mapping σ
    Begin
        Simplify the DTD D
        Create the DTD graph G
        IG = Inline(G)          // create the inlined DTD graph
        GenerateRelSigma(IG)    // generate the relations and σ-mapping
    End

Fig. 1. Schema mapping algorithm ODTDMap.

4.1. Simplifying DTDs

DTDs, in general, can be complex, and generating database schemas for these DTDs can be an awkward task. The first step in our schema mapping algorithm is to simplify a DTD into a canonical form such that it can easily be translated into a database schema which will be able to store the XML documents conforming to the original, unsimplified DTD.

The occurrence operators in a DTD can be classified into two groups based on the underlying relationship between parent and child elements: (i) operators that lead to a one-to-one relationship: { ?, "," }, and (ii) operators that lead to a one-to-many relationship: { +, * }. It is sufficient to generate a complete relational schema for the given DTD if we can distinguish between those two relationship groups. Thus, we can replace the first operator in each group with the second one, which reduces the types of occurrence operators from four to two.

Although the processing of the choice operator | seems to be a problematic issue in the schema mapping process, we can deal with it easily. Let us consider the following DTD expression: <!ELEMENT a (b | c)>. The element a can contain elements b or c but not both at the same time. However, we can introduce columns b and c together in the table corresponding to element a. During the data mapping phase, if a contains child b, then we assign null to the c column and vice versa. Thus, there is not much difference between the given DTD expression and <!ELEMENT a (b, c)> regarding the target database schema. We define a set of transformation rules in Fig. 2 to transform a DTD into a canonical form.

Example 4.1. Using the simplification rules shown in Fig. 2, one can transform <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> to the simplified version <!ELEMENT a (b*, c*, d, e, f, g*, h*)>. The following DTD expressions are the ones which are changed as a result of applying the simplification rules given in Fig. 2 to the DTD shown in Fig. 3:

    <!ELEMENT book (title, author*, chapter*, citation)>
    <!ELEMENT section (paragraph*, section*)>
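The simplification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rules of Fig. 2 are not reproduced in this text, so the sketch infers its behavior from Example 4.1 and the prose above (drop ?, turn + into *, push a repeating operator on a group down to every element inside it, treat | like a comma). The function name `simplify` and the rule that a name occurring more than once is starred are hypothetical and not taken from the paper.

```python
import re

# Tokens of a DTD content model: names, #PCDATA, brackets, separators, operators.
TOKEN = re.compile(r"#PCDATA|[A-Za-z_][\w.\-]*|[(),|?*+]")


def simplify(content_model: str) -> list[str]:
    """Flatten a content model into distinct names, each optionally starred."""
    tokens = TOKEN.findall(content_model)
    pos = 0

    def parse_item() -> list[tuple[str, bool]]:
        # item := (NAME | "(" seq ")") followed by an optional ?, * or +
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = parse_seq()
            pos += 1  # skip the closing ")"
        else:
            items = [(tok, False)]
        if pos < len(tokens) and tokens[pos] in {"?", "*", "+"}:
            if tokens[pos] != "?":
                # * or + on a name or group: everything inside may repeat
                items = [(name, True) for name, _ in items]
            pos += 1  # "?" is simply dropped
        return items

    def parse_seq() -> list[tuple[str, bool]]:
        # seq := item (("," | "|") item)*  -- choice is treated like sequence
        nonlocal pos
        items = parse_item()
        while pos < len(tokens) and tokens[pos] in {",", "|"}:
            pos += 1
            items += parse_item()
        return items

    starred: dict[str, bool] = {}
    for name, star in parse_seq():
        if name == "#PCDATA":
            continue  # text content is not an element type
        # Assumption (not from the paper): a repeated name becomes starred.
        starred[name] = star or name in starred
    return [name + ("*" if star else "") for name, star in starred.items()]


if __name__ == "__main__":
    print(simplify("((b+, c*, d?)?, (e?, f, (g*, h?)+)?)"))
```

Run on the content model of Example 4.1, the sketch yields the canonical form given there. It does no error handling for malformed content models.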

Fig. 2. DTD simplification rules.

Fig. 3. Sample XML DTD xbib.dtd.

Our simplification rules transform complex DTD expressions into a flat canonical form and thereby loosen some DTD constraints. However, the DTD simplification procedure preserves sufficient information to generate a database schema with the necessary tables and columns to store XML data. The actual constraint information can be derived from the original DTD and introduced to the database schema by revisiting the original DTD later. Interested readers are referred to [5,34], where capturing semantic knowledge from a DTD and introducing it to a database schema through semantic constraints are discussed in detail.

Two pieces of information are essential for the reconstruction of an XML document from its relational representation and for answering XML queries against the relational storage of an XML document: (1) the parent-child relationships between XML elements and (2) the document order. While the above simplification procedure maintains the parent-child relationships, it does not maintain the document order. However, we introduce additional ordinal attributes to record the order of the document. Thus, any XML query, including the ones which require the document order information, can be evaluated over the generated database schema.

Our set of rules is essentially an improvement of the transformation rules defined in the shared inlining algorithm [4]. Our set of rules is complete since we consider all possible combinations of operators and XML elements, whereas the shared inlining algorithm only lists some important combinations. For example, there is no rule that corresponds to $(e_1 | \cdots | e_n)?$ in the shared inlining algorithm.

4.2. Creating and inlining DTD graphs

In this step, we create the corresponding DTD graph based on the simplified DTD and do the inlining operation on the DTD graph. The notion of the DTD graph is defined as follows:
Definition 4.2 (DTD graph). The structure of a DTD D can be represented by a directed graph $G = (V, E)$, where V is the set of vertices and E is the set of edges. The vertices represent element and attribute types in D, and the edges represent their parent-child relationships. Each vertex is labeled with the name of the corresponding element or attribute type. An edge is labeled by * if it is incident to a vertex which can appear more than once under its parent in the corresponding XML document; otherwise no label is used.

For example, the DTD graph shown in Fig. 4 corresponds to the simplified form of the DTD given in Fig. 3. While each element appears only once in the DTD graph, attributes appear as many times as they appear in the DTD. Node identifiers for the attributes in the DTD graph are preceded by "@" (see footnote 1). For a set-valued attribute such as IDREFS or NMTOKENS, the edge between the set-valued attribute and its parent (the owner element) is labeled by * in the DTD graph. Thus, we can manage the set-valued attributes easily in further steps. In Fig. 4, cites is a set-valued attribute. Therefore, its incoming edge is labeled with *.

Footnote 1: In an implementation, to ensure the uniqueness of attribute names, we can use the concatenation of an attribute name and its owner element name as the attribute identifier. For example, the attribute id of element book can have the label book.id.

After we create the DTD graph for the simplified DTD, we inline as many descendant elements to an element as possible. The rationale is that these inlined elements will eventually produce a single relation. Therefore, we only inline a child c to a parent p when p can contain at most one occurrence of c, in order to avoid introducing redundancy into the generated relation.

After the simplification procedure, any input DTD is in a canonical form, i.e., each DTD expression is a tuple of distinct element names or their stars (*). As a result, in the corresponding DTD graph, an edge is labeled by a star (*) if the edge leads to an element with a *, and no label is put otherwise. Thus, if an edge has a * as its label, we call it a star edge; otherwise, we call it a normal edge.

Fig. 4. DTD graph of xbib.dtd.

Fig. 5. Inlined DTD graph of xbib.dtd.

We define the notions of an inlinable node, an inlinable subtree and a shared node in a DTD graph as follows:

Definition 4.3 (Inlinable node). Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.

Definition 4.4 (Inlinable subtree). Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree rooted at e. This subtree is called the inlinable subtree for the node e.

Definition 4.5 (Shared node). Given a DTD graph, a node is called a shared node if it has more than one incoming edge.

Using our inlining procedure, the DTD graph shown in Fig. 4 will be transformed into the inlined DTD graph shown in Fig. 5.
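Definitions 4.3 and 4.4 translate fairly directly into code. The following Python sketch is an illustration, not the paper's inlining procedure: the function name `inline`, the edge representation `(parent, child, is_star)` and the example graph are all hypothetical. For every non-inlinable node it collects the inlinable nodes reachable over normal edges, using a depth-first search with an explicit stack.

```python
from collections import defaultdict


def inline(nodes, edges):
    """Map each non-inlinable node to the list of nodes inlined into it.

    nodes: iterable of node names
    edges: iterable of (parent, child, is_star) triples
    """
    incoming = defaultdict(list)         # child -> star flags of its incoming edges
    normal_children = defaultdict(list)  # parent -> children reached by normal edges
    for parent, child, is_star in edges:
        incoming[child].append(is_star)
        if not is_star:
            normal_children[parent].append(child)

    def inlinable(node):
        # Definition 4.3: exactly one incoming edge, and it is a normal edge.
        return incoming[node] == [False]

    inlined = {}
    for root in nodes:
        if inlinable(root):
            continue  # this node ends up inside another node's relation
        members, seen = [], {root}
        stack = list(reversed(normal_children[root]))
        while stack:
            node = stack.pop()
            if node in seen or not inlinable(node):
                continue  # shared or starred nodes start their own subtree
            seen.add(node)
            members.append(node)
            stack.extend(reversed(normal_children[node]))
        inlined[root] = members
    return inlined


if __name__ == "__main__":
    # Hypothetical graph: e is shared (two parents), g hangs off a by a star edge.
    example_edges = [
        ("a", "b", False), ("b", "c", False), ("c", "d", False),
        ("d", "e", False), ("g", "e", False), ("e", "f", False),
        ("a", "g", True),
    ]
    print(inline("abcdefg", example_edges))
```

In the example graph, b, c, d and f are inlinable, while a (no incoming edge), e (shared) and g (star edge) each keep their own entry. A cycle made up only of normal edges would leave every node on it inlinable; the sketch does not try to pick a root for that case.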
Our inlining procedure considers the following three cases, which are illustrated in Fig. 6:

1. Case 1: Node a is connected to node b by a normal edge and b has no other incoming edges. In this case, a can contain at most one occurrence of b, and we combine node b into a while maintaining the parent-child relationships between b and its children.
2. Case 2: Node a is connected to node b by a normal edge and b has other incoming edges. In this case, we do not combine b into a since b has multiple parents.
3. Case 3: Node a is connected to node b by a star edge. In this case, each a can contain multiple occurrences of b, and we do not combine b into a.

Only Case 1 allows us to inline an element to its parent. Case 2 does not allow inlining because b is a shared node, and Case 3 does not allow inlining because the multiple occurrences of a child element in its parent, caused by the * operator, would introduce redundancy.

Fig. 6. Three cases for inlining.

Example 4.6. In Fig. 7A, nodes b and d are inlinable but nodes a and c are not inlinable. The inlinable subtree for a contains nodes a and b, whereas the inlinable subtree for c contains nodes c and d. In Fig. 7C, nodes b–d and f are inlinable, but nodes a, e and g are not inlinable. The inlinable subtree for a contains nodes a–d, and the inlinable subtree for node e contains nodes e and f. While there is no shared node in Fig. 7A, the only shared node in Fig. 7C is node e. The DTD graph shown in Fig. 7A will be inlined into the one shown in Fig. 7B, and the DTD graph shown in Fig. 7C will be inlined into the one shown in Fig. 7D.

Fig. 7. Inlining DTD graphs.

The notion of the inlinable subtree formalizes the intuition of inlining as many descendant elements as possible to an element. We illustrate our inlining algorithm in pseudocode in Fig. 8. Essentially, it uses a depth-first search strategy to identify the inlinable subtree for each node and then inlines that subtree to its root. A field inlinedset of set type is introduced for each node e to represent the set of nodes that have been inlined to this node e (initially $e.inlinedset = \{\}$). For example, in Fig. 7C, after the inlining procedure, $a.inlinedset = \{b, c, d\}$. The algorithm is efficient, as indicated by the following theorem.

Theorem 4.7 (Time complexity). The time complexity of our inlining algorithm is $O(n)$, where n is the number of nodes in the input DTD graph.

Proof. This is obvious since each node of the DTD graph is visited at most once. □

Fig. 8. The inlining procedure.

4.3. Generating database schema and σ-mapping

After a simplified DTD graph is inlined, the last step is to generate a database schema based on this inlined DTD graph and generate the schema mapping information which will be used in the data mapping process later. The procedure to generate the database schema and σ-mapping is given in Fig. 9. For each node e in the inlined DTD graph, a relation e is generated. Basically, in the generated database schema, we associate each element e with a unique ID.
We also introduce a unique f.ID for each element type f in the inlined set of e. The rationale behind introducing an ID or f.ID for each element is to be able to store the order of XML elements in the relational tables. It is mentioned in [8] that no ordinal ID will be required for inlined elements. However, as we will show in Section 4.4, such a mapping scheme is lossy. Our mapping scheme is lossless and stores sufficient information in the relational database to reconstruct the original XML document. A complete proof that our mapping scheme is lossless is given in [10].

Attribute parentid is introduced for each non-inlinable element to preserve the parent-child relationship and, thus, the tree structure of an XML document. We do not need to introduce an attribute parentid for inlinable elements since they are stored in the same tuple as their parent.

To facilitate the processing of recursive XML queries (queries with the // axis), each element e is associated with an attribute endid, which stores the maximum ID of the descendants of e (see footnote 2). We introduce f.endid for each element type f in the inlined set of e for the same purpose.

Footnote 2: Leaf elements have the same ID and endid values. As such, we can omit the endid to save space.

We introduce attribute parenttype if the node in the inlined DTD graph has more than one parent (shared node). Thus, the attribute parenttype facilitates efficient selection of descendants of a particular parent.

A column e is introduced in the database schema for each non-inlinable leaf element type to store its textual content. Similarly, column f is introduced for each leaf element or attribute type f in the inlined set of e. Obviously, if the element type is EMPTY, we do not introduce such a column.

Fig. 9. The database schema and σ-mapping generation procedure.

Fig. 10. Database schema for xbib.dtd.

The database schema shown in Fig. 10 is generated for the inlined DTD graph given in Fig. 5 by the schema generation procedure explained above. After generating the database schema, both the database schema and the σ-mapping that maps element and attribute types to the relational schemas in which they should be stored are output. This output is used by our data mapping algorithms in Section 5 to actually shred XML documents into relational tuples.

4.4. Discussion

Although the main idea of ODTDMap is similar to existing algorithms [4,8,9], ODTDMap makes several improvements over them:

Recursion: The standard shared-inlining algorithm [4] defines a rule to deal with two mutually recursive elements. It is not clear how to handle a DTD with a cycle consisting of more than two elements. ODTDMap can handle arbitrary cycles in DTDs by just checking the incoming edges to the current node irrespective of other nodes in the DTD graph. Therefore, ODTDMap deals with cycles naturally, without the explicit check for the existence of cycles that is required by the standard shared-inlining algorithm.

Losslessness: ODTDMap is lossless in the sense that the generated database schema can store enough structural information to reconstruct the original XML documents and support the storage and query of ordered XML documents. We are able to reconstruct the original XML document in the given document order. In contrast, the shared-inlining algorithm [4] and its variant [9] do not support the ordered nature of XML documents. Although [8] proposes the Global, Local and Dewey Order schemes and discusses their applications to the schema-less case, no details are presented for the schema-based case. The authors suggest that there is no need to have a separate column for storing the order information of inlined elements, since the position of such elements can be determined from the position of their parent element and the document schema. This is not true. For example, consider the DTD and the sample XML document shown in Fig. 11A and B, respectively. The ordered shared-inlining will create the database shown in Fig. 11C, in which the order information of the inlined element C is lost, and there is no way to determine whether the element B comes before or after element C; therefore, the original XML document cannot be reconstructed. On the other hand, our ODTDMap will create the database shown in Fig. 11D, where we associate an ID with the inlined element C as well.
Thus, it will support the reconstruction of the original XML document. Efficient support for XML queries: To facilitate the processing of XML queries, each non-leaf element e is associated with an endid which stores the maximum ID of the descendants of e. In this way, one can efficiently identify all the following and preceding elements of a given element as well as its descendants. (A) (C) Set-valued attributes: Existing schema mapping algorithms [4,8,9] have not considered set-valued attributes such as IDREFS and NMTOKENS. In ODTDMap, we connect a set-valued attribute to its owner element with a star edge in the DTD graph and map it to a separate relation (see how the cites attribute in Fig. 3 is mapped). 5. Data mapping As the target database schema might be complex and its corresponding XML-to-Relational schema mapping is non-trivial, it is challenging to design an efficient schema-based data mapping algorithm. The main challenging issues include the following: Varying document structure: XML documents have varying structures due to the optional occurrence operators?,, and choice operator j used in the underlying DTD, unlike relational tables which always have a fixed structure. For example, in the XML document tree given in Fig. 13, which corresponds to the sample XML document shown in Fig. 12, the nodes with ordinal numbers 10 and 16 are of the same element type. However, their subtrees are quite different. While there is no paragraph node among the child nodes of node 10, there is no section node among the child nodes of node 16. A data mapping algorithm should keep track of the missing child nodes and handle structural differences between the same type of element nodes due to the optional operators using efficient data structures. Scalability: In an online environment, where new XML documents might be inserted into the database on-the-fly, a data mapping algorithm (B) (D) Fig. 11. A lossy versus a lossless mapping.

10 10 M. Atay et al. / Information Systems ] (]]]]) ]]] ]]] Fig. 12. A sample XML document xbib.xml. will be used frequently. Thus, it is critical that a data mapping algorithm is efficient and scales well with the size of XML documents. It is obvious that a linear data mapping algorithm will fulfill this requirement the best. In the following sections, we present two data mapping algorithms, DOM-based OXInsert and SAX-based SDM to address these issues. An appropriate ordering technique is needed to keep the ordered XML documents in the unordered structure of relational tables. Several order encoding methods are proposed in [8]. Their experimental results show that the global order encoding performs the best on query intensive workloads. However, our data mapping algorithms can be easily adapted to other order encoding schemes proposed in [8] DOM-based approach We use a tree data model to represent the XML documents since each valid XML document is rooted at a unique element which is specified by DOCTYPE declaration in the DTD. We first introduce our XML Tree data model, which is based on W3C s Document Object Model (DOM) [24]. The details of our XML document model are given in Definition 5.1. Definition 5.1 (XML Tree). We model an XML document D as an XML element tree (XML Tree) T, in which nodes represent XML elements and edges represent parent child relationships between XML elements. The XML Tree T is an ordered tree and its nodes can have attributes and values associated with them. The root of XML Tree T is denoted by T:root. For each element node e in T,we use the following notations: e:name, the name of XML element e. e:eid, the global ID of XML element e which is given based on the pre-order tree traversal. e:endid, the largest descendant ID of node e and e:id ¼ e:endid if e is a leaf node in T. e:attributes, the set of XML attributes of e. 
We also denote the attributes of e by e.a_1, ..., e.a_n, and the names and values of these attributes by e.a_i.name and e.a_i.value, respectively (i = 1, ..., n).
e.value, the value of e, where e.value = NULL if e is a non-leaf node.
e.parent, the parent node of e, where e.parent = NULL if e is the root node of T.
e.children, the ordered sequence of child nodes of e, where e.children = NULL if e is a leaf node of T. We also denote the children of e by e.c_1, ..., e.c_m.
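To make the eid and endid fields of Definition 5.1 concrete, the following minimal Python sketch numbers a tree in pre-order without recursion and then uses the resulting intervals to test the descendant, following and preceding relationships. The `Node` class, the helper names and the tree shape (rebuilt here from the node IDs of Fig. 13) are assumptions made for illustration; attributes are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of an XML Tree (attributes omitted in this sketch)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    value: Optional[str] = None
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    eid: int = 0
    endid: int = 0


def number_tree(root: Node) -> None:
    """Assign pre-order eid and largest-descendant endid using an explicit stack."""
    next_id = 1
    stack = [(root, False)]  # flag is True once the node's subtree has been numbered
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            # The last ID handed out belongs to the last descendant (or to the node itself).
            node.endid = next_id - 1
            continue
        node.eid = next_id
        next_id += 1
        stack.append((node, True))
        for child in reversed(node.children):  # reversed so the first child is popped first
            child.parent = node
            stack.append((child, False))


def is_descendant(e: Node, f: Node) -> bool:
    """True if f lies strictly inside the ID interval of e."""
    return e.eid < f.eid <= e.endid


def is_following(e: Node, f: Node) -> bool:
    """True if f starts after the whole subtree of e."""
    return f.eid > e.endid


def is_preceding(e: Node, f: Node) -> bool:
    """True if the whole subtree of f ends before e starts."""
    return f.endid < e.eid


def el(name: str, *children: Node) -> Node:
    return Node(name, list(children))


# Tree shape assumed from the node IDs shown in Fig. 13.
xbib = el("xbib", el("book",
    el("title"),
    el("author", el("fname"), el("lname")),
    el("author", el("fname"), el("lname")),
    el("chapter", el("section",
        el("section", el("paragraph")),
        el("section", el("paragraph")))),
    el("chapter", el("paragraph"), el("paragraph"))))
number_tree(xbib)
```

In this sketch a leaf ends up with eid equal to endid, and a single pair of integer comparisons decides each relationship, which is what makes the endid field useful for query processing.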

xbib (1)
  book (2)
    title (3)
    author (4)
      fname (5)
      lname (6)
    author (7)
      fname (8)
      lname (9)
    chapter (10)
      section (11)
        section (12)
          paragraph (13)
        section (14)
          paragraph (15)
    chapter (16)
      paragraph (17)
      paragraph (18)

Fig. 13. XML Tree of xbib.xml.

The XML Tree data model differs from W3C's DOM specification in several respects. In contrast to a traditional XML DOM tree, the XML Tree does not treat XML PCDATA values as nodes but as data fields of XML element nodes. Each node has an ID field, which is assigned based on the pre-order tree traversal as the XML document is being parsed. Besides the ID field, each node is assigned an endid field, which denotes the largest descendant ID of that node. These distinctions are only for the convenience of presentation; thus, the algorithm proposed in this paper can be implemented directly on the standard DOM model.

The XML Tree for the XML document shown in Fig. 12 is illustrated in Fig. 13. In an XML Tree, each node e is labeled by e.name(e.eid, e.endid, e.value, e.a_1.name = e.a_1.value, ..., e.a_n.name = e.a_n.value), and e.value is omitted when e is a non-leaf node, where e.value = NULL. However, in Fig. 13, we include only e.name and e.eid for simplicity. We differentiate an element node e in an XML Tree from its corresponding type in the DTD, which is denoted by type(e). For example, we use the expression σ(type(e)) to find the corresponding table for e.

Our DOM-based data mapping algorithm OXInsert is shown in Fig. 14. We design OXInsert as an iterative algorithm. The documents conforming to a DTD might be nested to arbitrary depth if the input DTD is recursive (cyclic XML schema). A recursive data mapping algorithm might therefore require considerable memory space as a result of numerous recursive calls. For this reason, we avoid a recursive design for the OXInsert algorithm.
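As a rough illustration of such an iterative design, the Python sketch below shreds a small tree with two queues: one holds the non-inlinable elements, each of which becomes a tuple of its own, and the other walks the descendants whose data are inlined into that tuple. It is not the Fig. 14 listing. The `Node` class, the names `shred` and `load_tuple_data`, the dotted column prefixes, the element values and the choice of non-inlinable types are all assumptions, and XML attributes (including set-valued ones) are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    eid: int
    endid: int
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def load_tuple_data(tp: dict, n: Node, prefix: str) -> None:
    """Copy the IDs and value of node n into tuple tp under prefixed column names."""
    tp[prefix + "ID"] = n.eid
    tp[prefix + "endID"] = n.endid
    if n.value is not None:
        tp[prefix + "value"] = n.value


def shred(root: Node, non_inlinable: Set[str]) -> Dict[str, List[dict]]:
    """Map a tree to {table name: list of tuples}; the root is assumed non-inlinable."""
    tables: Dict[str, List[dict]] = {}
    q = deque([root])  # non-inlinable elements still waiting for their own tuple
    while q:
        e = q.popleft()
        tp = {"parentID": e.parent.eid if e.parent else None}
        load_tuple_data(tp, e, prefix="")
        # r holds descendants of e that may be inlined into tp, with their column prefix
        r = deque((c, c.name + ".") for c in e.children)
        while r:
            f, prefix = r.popleft()
            if f.name in non_inlinable:
                q.append(f)  # f gets its own tuple in a later iteration
            else:
                load_tuple_data(tp, f, prefix)
                r.extend((c, prefix + c.name + ".") for c in f.children)
        tables.setdefault(e.name, []).append(tp)
    return tables


# Hypothetical fragment: IDs follow Fig. 13, values are invented for the example.
book = Node("book", 2, 9, children=[
    Node("title", 3, 3, "XML"),
    Node("author", 4, 6, children=[Node("fname", 5, 5, "A"), Node("lname", 6, 6, "B")]),
    Node("author", 7, 9, children=[Node("fname", 8, 8, "C"), Node("lname", 9, 9, "D")]),
])
tables = shred(book, non_inlinable={"book", "author"})
```

Each node is placed in a queue a bounded number of times, so the work grows linearly with the size of the tree. In this dictionary-based sketch a missing optional child simply leaves its columns absent, which plays the role of the NULL-initialized columns of a relational tuple.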
The main idea of OXInsert is that it uses queue q to process all non-inlinable XML elements and, for each such element e, it uses queue r to process all XML elements that are inlinable to e. The algorithm processes each non-inlinable XML element e dequeued from q. In particular, a tuple tp is created in the table corresponding to type(e), denoted by σ(type(e)). The data values of node e are retrieved and loaded into the corresponding fields of tuple tp in procedure loadtupledata() (line 09). Set-valued attributes are handled by procedure processsetattr(), which stores the values of a set-valued attribute in a separate table.

Note that we deal with the issue of varying document structure as follows. On one hand, all missing nodes will have NULL values in their corresponding columns, since all columns are initialized to NULL; the corresponding column of a node is filled with a value only when the node is present. On the other hand, for two elements of the same type, even though the structures of their subtrees might vary, we process each of their descendants using the σ-mapping in a consistent and correct manner.

Since the information of inlinable elements is stored in the same tuple as their parents, for each non-inlinable element e, we need to retrieve the data values of the elements that are inlinable to e. This is achieved by using another queue r to process the descendants of e that are inlinable to e. During this process, if we encounter any non-inlinable element, it is enqueued into q for further processing (line 17). For each element f that is inlinable to e, we fill the appropriate fields of the tuple tp corresponding to e with the data values retrieved from node f in procedure loadtupledata(). The set-valued attributes of f are handled by procedure processsetattr(), which stores the values of a set-valued attribute in a separate table. The descendants of f are enqueued into r for further processing (lines 20-22).

Fig. 14. DOM-based data mapping algorithm OXInsert.

Procedure loadtupledata() retrieves the data of any node n and loads it into the tuple tp. Parameter prefix helps to overcome the difference in relational attribute names between non-inlinable and inlinable nodes in XML Tree T. The shredding of set-valued attributes of node n is processed by procedure processsetattr().

Procedure processsetattr() processes the set-valued attribute e.a of a particular element e. Each such attribute is mapped to a separate table, denoted by σ(type(e.a)), unlike a single-valued attribute, which is mapped to the same table as its owner element. For each value of the set-valued attribute e.a, a tuple consisting of a sequential index ID (disjoint from the IDs in the XML tree), a parent ID and the value is inserted into the table σ(type(e.a)) (lines 3-7).

To analyze the time complexity of algorithm OXInsert, we first present some properties of the algorithm in the following lemmas.

Lemma 5.2. Each non-inlinable element e in XML Tree T is enqueued into queue q exactly once, and q only contains non-inlinable elements.

Proof. The enqueue operation on q is performed only at line 5 and at line 17. Line 5 enqueues the root element, which is non-inlinable. Line 17 is in the body of the If statement whose condition indicates that the element f to be enqueued into queue q is non-inlinable. Therefore, q only contains non-inlinable elements. The acyclicity of T implies that each non-inlinable element of T can be enqueued into q at most once. In addition, the While statement (lines 15-24) ensures that each non-inlinable element other than the root is enqueued into q at least once in line 17. Finally, the root element is enqueued into q exactly once. Therefore, each non-inlinable element e is enqueued into q exactly once. □

Lemma 5.3. Each XML element e in XML Tree T, except the root element, is enqueued into queue r exactly once.

Proof. Lemma 5.2 implies that each non-inlinable element e is dequeued from q exactly once (line 7), and for each such e, the While statement (lines 15-24) enqueues each descendant element f of e exactly once into queue r, where f satisfies one of the following: (1) f is e's child (line 14), or (2) f is a descendant of e whose parent is inlinable to e (line 21). Therefore, each element of T, except the root element, satisfies one of these two cases for some e and, thus, is enqueued into r at least once. The acyclicity of T implies that each element of T can be enqueued into r at most once. Therefore, each XML element in T, except the root element, is enqueued into r exactly once. □

The following theorem demonstrates that OXInsert is an efficient linear algorithm.

Theorem 5.4 (Time complexity). The time complexity of algorithm OXInsert is O(n), where database schema R and σ-mapping σ are fixed and n is the total number of XML elements and attribute values in XML Tree T.

Proof (Sketch). From Lemma 5.2, each non-inlinable element e in XML Tree T is enqueued into queue q exactly once, and q only contains non-inlinable elements. Therefore, lines 7-11 are executed exactly once for each non-inlinable element.
In addition, the execution of lines 7-11 takes constant time when we ignore lines 05 and 06 of procedure loadtupledata(), whose execution time is attributed to XML attributes. From Lemma 5.3, each XML element is enqueued into queue r exactly once; thus, the body of the While statement (lines 15-24) is executed exactly once for each XML element. In addition, the execution of that body takes constant time when we ignore lines 05 and 06 of procedure loadtupledata(), whose execution time is attributed to XML attributes. In conclusion, the time complexity of OXInsert is O(n). □

Table 1 shows how the XML Tree given in Fig. 13 is mapped to the relational database using our data mapping algorithm OXInsert.

Table 1. The state of the database after xbib.xml is stored.

5.2. SAX-based approach

DOM-based algorithms are popular because W3C adopts DOM as its standard for XML description. For a big XML file, or for multiple XML files processed in a multi-tasking environment, creating DOM trees is expensive. A DOM-based data mapping algorithm processes a document in two passes: in the first pass, the parser reads the document and creates an XML tree in the main memory. In the second pass, the data mapping algorithm accesses this DOM tree and processes it. In contrast, a SAX-based [25] data mapping approach needs to process the document in only one pass.

Our SAX-based data mapping algorithm, called SDM hereafter, is given in Fig. 15. SDM takes an XML document X, a database schema R and a σ-mapping σ as input, as described in Definition 3.3. The event-driven SDM algorithm makes a sequential scan of the whole document from top to bottom. It triggers procedures startelement(), characters() and endelement() for start tags, character data and end tags, respectively.

When a start tag for an element e is encountered, SDM triggers the procedure startelement(), which generates a sequential global ID (GID) for the element e. This global ID helps to maintain XML document order in the relational database. If e is a non-inlinable element, then startelement() creates a new tuple t of table σ(type(e)) and starts to fill the fields of tuple t with the information obtained from e. The element type of e and its GID are pushed onto the stack GST, and the tuple t is pushed onto the stack ST_σ(type(e)), to be completed when all the descendants of e have been processed. If e is an inlinable element, then no new tuple is created. Instead, the tuple on top of the stack ST_σ(type(e)) is updated with the GID and the attribute values of e. Then, the element type of e and its GID are pushed onto the stack GST. Set-valued attributes of e are handled by procedure processsetattr(), as in the DOM-based algorithm OXInsert, since the values of a set-valued attribute are stored in a separate table.

When character data between the start and the end tags are encountered, SDM triggers the procedure characters().
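The event handlers of this section can be sketched with Python's standard `xml.sax` module. This is only an approximation under stated assumptions, not the Fig. 15 listing: the class name `SDMHandler`, the dotted column prefixes, the sample document and the set of non-inlinable types are hypothetical, attributes are ignored, and finished tuples are collected in a dictionary instead of being inserted into a database.

```python
import xml.sax
from collections import defaultdict


class SDMHandler(xml.sax.ContentHandler):
    """One-pass shredder; the root element is assumed to be non-inlinable."""

    def __init__(self, non_inlinable):
        super().__init__()
        self.non_inlinable = non_inlinable
        self.gid = 0                     # last global ID handed out
        self.gst = []                    # global stack of (table, column prefix, GID)
        self.st = defaultdict(list)      # one stack of pending tuples per table
        self.tables = defaultdict(list)  # finished tuples, standing in for INSERTs

    def startElement(self, name, attrs):
        self.gid += 1
        if name in self.non_inlinable:
            table, prefix = name, ""
            parent_id = self.gst[-1][2] if self.gst else None
            self.st[table].append({"ID": self.gid, "parentID": parent_id})
        else:
            # Inlined into the pending tuple of the nearest non-inlinable ancestor.
            table, parent_prefix, _ = self.gst[-1]
            prefix = parent_prefix + name + "."
            self.st[table][-1][prefix + "ID"] = self.gid
        self.gst.append((table, prefix, self.gid))

    def characters(self, content):
        if not content.strip() or not self.gst:
            return  # skip whitespace between tags
        table, prefix, _ = self.gst[-1]
        tp = self.st[table][-1]
        tp[prefix + "value"] = tp.get(prefix + "value", "") + content

    def endElement(self, name):
        table, prefix, _ = self.gst.pop()
        if prefix == "":  # non-inlinable: the tuple is complete
            tp = self.st[table].pop()
            tp["endID"] = self.gid  # the current GID is the largest descendant ID
            self.tables[table].append(tp)
        else:
            self.st[table][-1][prefix + "endID"] = self.gid


doc = b"""<book><title>XML</title>
<author><fname>A</fname><lname>B</lname></author>
<chapter><section><section><paragraph>p1</paragraph></section></section></chapter>
</book>"""
handler = SDMHandler(non_inlinable={"book", "author", "chapter", "section", "paragraph"})
xml.sax.parseString(doc, handler)
```

The nested section elements in the sample document show why a per-table stack is kept: while the inner section is open, the stack for the section table holds two pending tuples, and the inner one is completed and popped first.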
Since the element e on top of GST is the owner of the scanned character data, these data are mapped to the tuple on top of the stack ST_σ(type(e)).

When the end tag for element e is encountered, SDM triggers the procedure endelement(). If e is non-inlinable, then endelement() pops the tuple t from the stack ST_σ(type(e)), assigns the current GID as the endid of tuple t, and inserts t into the table σ(type(e)). Otherwise, it updates the tuple on top of the stack ST_σ(type(e)) by assigning the current GID as the endid of e.

SDM maintains a global stack, GST, and a separate stack, ST_σ(type(e)), for each table σ(type(e)), where σ(type(e)) is the table corresponding to the type of e in the underlying DTD. The global stack GST keeps track of the parent-child relationships. The table stacks are used to fill in the required context information for a particular tuple t of table σ(type(e)). SDM pushes an item onto a table stack ST_σ(type(e)) when a start tag for a non-inlinable element e is encountered, and pops the stack ST_σ(type(e)) when it reads the end tag of e. Hence, a table stack never grows beyond one item unless there exists a descendant element of the same type as one of its ancestors (recursive XML schema). Table stacks allow SDM to process such elements easily without interfering with the context of a pending ancestor element that has the same type as its descendant and for which a tuple has already been created.

Theorem 5.5 (Time complexity). The time complexity of algorithm SDM is O(n), where n is the number of elements and attribute values in the input XML document.

We omit the proof since it is trivial.

6. Experimental study

We implemented the ODTDMap, OXInsert and SDM algorithms in Java. We used a Pentium IV computer with a 2.4 GHz processor and 1 GB of main memory for the experiments. The experiments were run using the Java software development kit. We minimized the usage of system resources during the experiments to get more realistic results.
We ran each program 6 times and took the average value, excluding the first run, to obtain more accurate results.

6.1. The experiment of schema mapping ODTDMap

We applied ODTDMap to a set of DTDs to evaluate the performance of our proposed schema mapping algorithm. We used 6 test DTDs from the XBench XML Benchmark [35] for our experiments. First, we identified the properties of each DTD, such as the number of elements and attributes and the number of * and + operators. Then, we ran ODTDMap and measured the time it took to map the input DTD to the output database schema. The time spent was measured by running the schema mapping procedure 1000 times to obtain significant results. The number of tables generated for each DTD was recorded. The experimental results are shown in Table 2.

Fig. 15. SAX-based data mapping algorithm SDM.

While the total number of elements in the 6 DTDs is 125 and the total number of attributes is 14, the total number of tables generated for those DTDs is 23. The total number of tables is thus around one-sixth of the total number of elements and attributes. We observed that the ODTDMap algorithm produced considerably fewer tables than there are elements. We also observed that the running time of the ODTDMap algorithm is proportional to the number of elements in the input DTD. This is not surprising, since ODTDMap visits each element only once and spends constant time on each element.

Table 2. Experimental results of schema mapping. Columns: DTD file, File size (bytes), # of elements, # of attributes, # of * operators, # of + operators, # of tables, Running time (ms). Rows: country.dtd, address.dtd, customer.dtd, item.dtd, order.dtd, catalog.dtd.

6.2. DOM-based versus SAX-based data mapping

We chose auction.xml of the XMark benchmark [36] as our data set to compare the performance of the DOM-based algorithm OXInsert with the SAX-based algorithm SDM. We generated the test documents in six different sizes ranging from 25 to 125 MB. We constructed the XML Tree for each document using W3C's DOM specification. Our performance metric is the time to map the input XML document to the target relational data. While the time for loading data into the database is not included in this measurement, the time for parsing the input XML documents is included. The chart given in Fig. 16 shows the average time spent for each document using the two data mapping approaches.

Fig. 16. The performance of OXInsert versus SDM (time in seconds versus document size in MB).

As shown in Fig. 16, SDM shows linear performance and scales very well with the size of the input XML documents, while OXInsert shows linear performance up to the 75 MB document. The DOM-based data mapping algorithm OXInsert performs much better than the SAX-based algorithm SDM up to the 75 MB XML document. However, beyond 75 MB, the SAX-based algorithm starts to outperform it, since XML documents larger than 75 MB could no longer be represented as a DOM tree in the main memory in our experiments. For a large XML document whose XML tree does not fit in the main memory, part of the tree is swapped between the disk and the main memory, which spends considerable time on I/O operations and degrades the performance of the DOM-based approach. The event-driven SAX-based approach does not suffer from this problem. We conclude from our experiments that, as long as the document tree fits in the main memory, the DOM-based approach should be chosen for data mapping. Otherwise, the SAX-based approach should be chosen.

6.3. Data mapping across different schema mappings

In order to study the performance of both the DOM-based data mapping algorithm OXInsert and the SAX-based algorithm SDM across various schema mapping schemes, we conducted experiments on the following three classic schema mappings [4]:

Basic, which inlines a child element to its parent if the parent can contain at most one occurrence of the child. Basic creates a separate relation for each element type. Therefore, an element type might be represented in multiple relations. One disadvantage of Basic is that it might generate a large number of relations, causing low performance for some queries.

Shared, which inlines a child element type to its parent if the parent can contain at most one occurrence of the child. However, to avoid the problem of Basic, each element type is represented in exactly one relation. A shared element type is always mapped to a separate table in Shared.

Hybrid, which, in addition to the inlining performed by Shared, inlines the shared element types that are not reached through a *-edge. This approach combines the features of both Basic and Shared.

We added support for set-valued attributes to these three schema mapping algorithms. To see the impact of inlining on data mapping performance, we did not implement the inlining feature of Basic, since we had already implemented the same notion of inlining in Shared. The database schemas generated by Basic, Shared and Hybrid for the DTD given in Fig. 3 are shown in Fig. 17. The database schemas generated by Hybrid and Shared are the same.

We used auction.xml as our data set and generated test documents of sizes from 25 to 125 MB for OXInsert and from 100 MB to 1 GB for SDM. OXInsert does not terminate normally for test documents beyond 125 MB due to its memory space limitation.


More information

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry I-Chen Wu 1 and Shang-Hsien Hsieh 2 Department of Civil Engineering, National Taiwan

More information

SFilter: A Simple and Scalable Filter for XML Streams

SFilter: A Simple and Scalable Filter for XML Streams SFilter: A Simple and Scalable Filter for XML Streams Abdul Nizar M., G. Suresh Babu, P. Sreenivasa Kumar Indian Institute of Technology Madras Chennai - 600 036 INDIA nizar@cse.iitm.ac.in, sureshbabuau@gmail.com,

More information

Security Based Heuristic SAX for XML Parsing

Security Based Heuristic SAX for XML Parsing Security Based Heuristic SAX for XML Parsing Wei Wang Department of Automation Tsinghua University, China Beijing, China Abstract - XML based services integrate information resources running on different

More information
