Efficient schema-based XML-to-Relational data mapping
Mustafa Atay, Artem Chebotko, Dapeng Liu, Shiyong Lu, Farshad Fotouhi
Department of Computer Science, Wayne State University, Detroit, MI 48202, USA
E-mail addresses: matay@wayne.edu (M. Atay), artem@wayne.edu (A. Chebotko), dliu@wayne.edu (D. Liu), shiyong@wayne.edu (S. Lu), fotouhi@wayne.edu (F. Fotouhi).

Information Systems (Elsevier). Received 2 March 2005; received in revised form 4 December 2005; accepted 15 December 2005. Recommended by: Prof. J. Van den Bussche. doi: /j.is

Abstract
Storing and querying XML documents using an RDBMS is a challenging problem, since one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings: schema mapping, data mapping and query mapping. In this paper, we propose: (i) a lossless schema mapping algorithm to generate a database schema from a DTD, which makes several improvements over existing algorithms, and (ii) two linear data mapping algorithms, based on DOM and SAX respectively, to map ordered XML data to relational data. To the best of our knowledge, there is no published linear schema-based data mapping algorithm for mapping ordered XML data to relational data. Experimental results are presented to show that our algorithms are efficient and scalable.
© 2006 Elsevier B.V. All rights reserved.

Keywords: XML; Relational; Schema-based; Ordered; Mapping; Shredding

1. Introduction
XML has emerged as a standard for representing and exchanging data over the World Wide Web. The growing number of XML documents creates a need to store and query them efficiently. Numerous researchers have proposed using relational databases to store and query XML documents [1-9]. The main challenge of this relational approach is that one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the flat, unordered nature of the relational data model. This conflict can be resolved by the following XML-to-Relational mappings:
- Schema mapping, in which either a fixed generic database schema is used (schema-oblivious XML storage) or a database schema is generated from an XML schema or DTD (schema-based XML storage) to store XML documents. To support the ordered nature of the XML data model, an order encoding scheme such as those proposed in [8] can be used, and additional columns are introduced to store the ordinals of XML elements.
- Data mapping, which shreds an input XML document into relational tuples and inserts them into the relational database whose schema was generated in the schema mapping phase.
- Query mapping, which translates an XML query into its relational equivalent (i.e., SQL statements or relational algebra expressions), executes them against the database and returns the query result to the user. If the query result is to be returned as XML documents, a reconstruction algorithm [10] is needed to rebuild the XML subtrees rooted at the matching nodes.

While existing work has focused on the problems of schema mapping [1-7,9] and query mapping [8,11-19], there is no published linear schema-based data mapping algorithm for mapping ordered XML documents to relational data. Firstly, the schema-oblivious storage schemes [1-3,16] use a simple, fixed database schema for XML storage, and the data mapping problem in this context has been addressed by Grust et al. [20]. Secondly, while the schema-based storage schemes [4-7,9] have presented different strategies to generate a good database schema from an XML schema, no published work presents algorithms for mapping XML documents to relational data that fit the generated database schema and preserve the XML document order. Tatarinov et al. [8] investigate three order encoding schemes for storing and querying XML documents. Although their work briefly discusses schema-based, order-preserving schema mapping, it gives no algorithmic details for schema-based data mapping. Thirdly, existing work on query mapping [8,11-15,17-19] assumes that the database has already been populated with XML documents, and no algorithms have been published for shredding XML documents into relational data when the database schema is generated from an XML schema. The data translation algorithm presented in [21] does not support recursive XML schemas and does not consider the ordered nature of XML documents.
The data loading algorithms defined in [16,20] support the schema-oblivious storage scheme and use a SAX-based approach. Finally, our previous data mapping algorithm presented in [22] is not order-preserving and uses only a DOM-based approach. Since the target database schema might be complex and its corresponding XML-to-Relational schema mapping is non-trivial, designing an efficient schema-based data mapping algorithm is challenging. This is one major motivation of our research.

The main contributions of this paper are:
1. We propose a schema mapping algorithm, ODTDMap, which generates a database schema from an XML DTD for storing and querying ordered XML documents. Although the main idea of ODTDMap is similar to the shared inlining algorithm [4,8] and its variant [9], ODTDMap makes several improvements over them, as discussed at the end of Section 4.
2. We propose an efficient, linear DOM-based data mapping algorithm, OXInsert, which shreds input XML documents, composes relational tuples and inserts them into the relational database according to the schema generated by ODTDMap. OXInsert is based on our previous data mapping algorithm XInsert [22], but it takes into account the ordered nature of the input XML documents and set-valued attributes, which XInsert did not consider.
3. We propose an efficient and linear SAX-based data mapping algorithm, SDM, which shreds ordered XML documents, composes relational tuples and inserts them into the relational database according to the schema generated by ODTDMap.

Our experimental study shows that the proposed algorithms ODTDMap, OXInsert and SDM are efficient and scalable, and that OXInsert and SDM remain efficient under schema mapping algorithms other than ODTDMap. Although query mapping is an essential part of a complete mapping scheme, mapping XML queries into their SQL counterparts is not the focus of this paper.
We refer interested readers to recently proposed query mapping algorithms [8,11,12,14,15,17-19]. We assume the reader is familiar with XML [23] and related technologies, such as DTD [23], DOM [24] and SAX [25].

Organization: The rest of the paper is organized as follows. Section 2 presents an overview of related work. Section 3 formalizes a schema-based relational XML storage system. Section 4 gives a brief description of our schema mapping algorithm ODTDMap. Section 5 identifies the main issues for data mapping and describes our proposed data mapping algorithms OXInsert and SDM. Section 6 presents an experimental study of the time performance of ODTDMap, OXInsert and SDM. Finally, Section 7 concludes the paper and points out potential future work.
2. Related work
Three major approaches have been proposed for storing and querying XML data. The first approach is to develop native XML databases that support the XML data model and XML query languages directly. This includes Software AG's Tamino XML Server [26], IXIA's TEXTML Server [27], Sonic Software's eXtensible Information Server [28] (formerly eXcelon's XIS) and MODIS's Sedna Native XML DBMS [29]. The advantage of this native approach is that XML data can be stored and retrieved in their original format, and no additional mappings or translations are needed. Furthermore, most native XML databases can perform sophisticated full-text searches, including full thesaurus support, word stemming (matching all forms of a word: run, ran, running) and proximity searches. The disadvantage is that, due to the document-centric nature of these databases, complex searches or aggregations might be cumbersome.

The second approach is to use existing mature technologies, such as relational or object-oriented DBMSs, to store and query XML data [1-9]. The main challenge of this approach is that one needs to resolve the conflict between the hierarchical, ordered nature of the XML data model and the target data model. This usually requires schema mapping, data mapping and query mapping to be performed between the two data models, so the main issue is to develop efficient algorithms for these mappings. This approach includes two categories of methods: schema-oblivious XML storage [1-3,16], which uses a fixed generic database schema for XML storage, and schema-based XML storage [4-7,9], which uses a database schema generated from an XML schema.

The third approach is to use the XML support provided by commercial database systems.
Currently, most major databases, such as SQL Server [30], Oracle [31] and DB2 [32], provide mechanisms to store and query XML data by extending the existing data model with an additional XML data type (e.g., XMLType in Oracle 10g), so that a column of this type can be defined and used to store XML data. In addition, a set of methods is associated with this new XML data type to process, manipulate and query stored XML data.

As discussed above, each of these approaches has its pros and cons, and the choice should be based on the requirements of the application at hand and on the maturity of each approach at the time the choice is made. Readers are referred to an evaluation study of alternative XML storage strategies [33] for more details.

3. Schema-based relational XML storage system
Our schema-based relational XML storage system contains two major components:
1. Schema mapping, which takes an XML DTD as input and outputs a database schema and a σ-mapping, which assigns each element/attribute in the DTD to the relation in which the element/attribute is going to be stored.
2. Data mapping, which takes a valid XML document and the output of a schema mapping as input, shreds the XML document into relational tuples, and inserts them into the relational database.

In the following, we formalize the notions of σ-mapping, schema mapping and data mapping, respectively.

Definition 3.1 (σ-mapping). Given a DTD D with element-type set E and attribute-type set A, and a database schema R, a σ-mapping is a function $\sigma: (E \cup A) \to R$ such that, for an attribute/element type $e \in (A \cup E)$, $\sigma(e)$ is the relation in which the instances of e will be stored.

Definition 3.2 (Schema mapping). A schema mapping is a function SM that assigns to each DTD D a pair $(R, \sigma)$ to store the XML documents conforming to D, where R is a database schema and σ is a σ-mapping over R.

Definition 3.3 (Data mapping).
A data mapping DM is a function that assigns to each triple $(X, R, \sigma)$ a set of relational tuples T, where X is a valid XML document, R is a database schema, σ is a σ-mapping over R, and T is the result of shredding X into relational tuples according to the layout described by R and σ.

4. Schema mapping algorithm ODTDMap
In this section, we propose our schema mapping algorithm, ODTDMap, which generates a database schema from an XML DTD for storing and querying ordered XML documents. Several approaches exist in the literature. One approach is to map each DTD element to a separate table [4]. The drawback of this approach is that it might result in too many tables and thus in expensive joins of multiple tables for a query. Another approach is to map all DTD elements into a single fixed table [2]. This approach might result in a large table and expensive self-joins of that table for a query. A better approach, which our ODTDMap algorithm takes, is to map a child and its parent to the same table when the child appears at most once under its parent. This operation is called inlining and was first introduced in [4]. Inlining reduces the number of tables in the generated database schema and thus the number of joins for a query.

Our ODTDMap algorithm is shown in Fig. 1. It is inspired by the shared inlining algorithm introduced in [4]; however, we made several improvements over it, which are described in Section 4.4. The ODTDMap algorithm consists of the following three main steps:
1. Simplifying DTD: a DTD expression might be very complex due to its hierarchical nesting capability; this step rewrites it into a canonical form, which greatly simplifies the mapping procedure.
2. Creating and inlining DTD graph: we create the corresponding DTD graph based on the simplified DTD, and then inline as many descendant nodes as possible into a parent node in the DTD graph. Thus, all descendants of an XML element e that occur at most once under e (and are not shared with another parent) will be mapped to the same relation as e.
3. Generating database schema and σ-mapping: after the DTD graph is inlined, we generate a database schema and a σ-mapping based on the inlined DTD graph.

The section ends with a discussion of the improvements we made over existing schema mapping algorithms.
    Algorithm ODTDMap
    Input: DTD D
    Output: database schema R, σ-mapping σ
    Begin
        Simplify the DTD D
        Create the DTD graph G
        IG = Inline(G)          // create the inlined DTD graph
        GenerateRelSigma(IG)    // generate the relations and σ-mapping
    End

Fig. 1. Schema mapping algorithm ODTDMap.

4.1. Simplifying DTDs
DTDs, in general, can be complex, and generating database schemas for them can be an awkward task. The first step in our schema mapping algorithm is to simplify a DTD into a canonical form that can easily be translated into a database schema able to store the XML documents conforming to the original, unsimplified DTD.

The occurrence operators in a DTD can be classified into two groups based on the underlying relationship between parent and child elements: (i) operators that lead to a one-to-one relationship: ? and the absence of an operator; (ii) operators that lead to a one-to-many relationship: + and *. To generate a complete relational schema for a DTD, it is sufficient to distinguish between these two groups. Thus, we can replace the first operator in each group with the second one, which reduces the types of occurrence operators from four to two.

Although the choice operator | might seem problematic in the schema mapping process, we can deal with it easily. Consider the DTD expression <!ELEMENT a (b | c)>. The element a can contain element b or element c, but not both at the same time. However, we can introduce columns b and c together in the table corresponding to element a. During the data mapping phase, if a contains a child b, we assign null to column c, and vice versa. Thus, regarding the target database schema, there is little difference between the given DTD expression and <!ELEMENT a (b, c)>. We define a set of transformation rules in Fig. 2 to transform a DTD into a canonical form.

Example 4.1. Using the simplification rules shown in Fig. 2, one can transform <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)> into the simplified version <!ELEMENT a (b*, c*, d, e, f, g*, h*)>. The following DTD expressions are the ones that change when the simplification rules of Fig. 2 are applied to the DTD shown in Fig. 3:

<!ELEMENT book (title, author*, chapter*, citation)>
<!ELEMENT section (paragraph*, section*)>
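The rule table of Fig. 2 is not reproduced here, so the following Python sketch approximates only its net effect rather than applying the rules one by one. Under our own assumptions (the hypothetical `Elem` and `Group` classes, and treating a child name that occurs twice as starred), a child ends up starred exactly when it, or some group enclosing it, carries + or *; ? is dropped, and a choice group is handled like a sequence, as argued above. It reproduces Example 4.1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Elem:
    """A child element reference inside a content model (hypothetical helper)."""
    name: str
    op: str = ""  # occurrence operator: "", "?", "+" or "*"


@dataclass
class Group:
    """A parenthesized group; choice=True means (x | y), otherwise (x, y)."""
    items: List[Union[Elem, "Group"]]
    op: str = ""
    choice: bool = False


Node = Union[Elem, Group]


def simplify(model: Node) -> List[Tuple[str, bool]]:
    """Flatten a content model into (name, starred) pairs in first-occurrence order."""
    order: List[str] = []
    starred: Dict[str, bool] = {}

    def walk(node: Node, many: bool) -> None:
        # '+' or '*' on the node or on any enclosing group means one-to-many;
        # '?' and "no operator" both collapse to a single, unstarred column.
        many = many or node.op in ("+", "*")
        if isinstance(node, Elem):
            if node.name in starred:
                starred[node.name] = True  # assumption: a repeated name is one-to-many
            else:
                order.append(node.name)
                starred[node.name] = many
        else:
            # Sequence and choice groups are handled alike: every alternative of a
            # choice gets a column, left NULL when that alternative is absent.
            for child in node.items:
                walk(child, many)

    walk(model, False)
    return [(name, starred[name]) for name in order]


def render(parent: str, pairs: List[Tuple[str, bool]]) -> str:
    """Print a simplified content model back in DTD syntax."""
    body = ", ".join(name + ("*" if star else "") for name, star in pairs)
    return f"<!ELEMENT {parent} ({body})>"


# Example 4.1: <!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)>
example = Group([
    Group([Elem("b", "+"), Elem("c", "*"), Elem("d", "?")], op="?"),
    Group([Elem("e", "?"), Elem("f"),
           Group([Elem("g", "*"), Elem("h", "?")], op="+")], op="?"),
])
print(render("a", simplify(example)))  # <!ELEMENT a (b*, c*, d, e, f, g*, h*)>
```

The result keeps only element names and stars; the order among siblings is recovered later from the ordinal ID columns, not from the simplified DTD.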
Fig. 2. DTD simplification rules.

Our simplification rules transform complex DTD expressions into a flat canonical form, loosening some DTD constraints in the process. However, the DTD simplification procedure preserves sufficient information to generate a database schema with the tables and columns necessary to store XML data. The actual constraint information can be derived from the original DTD and introduced into the database schema by revisiting the original DTD later. Interested readers are referred to [5,34], where capturing semantic knowledge from a DTD and introducing it into a database schema through semantic constraints are discussed in detail.

Two pieces of information are essential for reconstructing an XML document from its relational representation and for answering XML queries against the relational storage of an XML document: (1) the parent-child relationships between XML elements and (2) the document order. While the above simplification procedure maintains the parent-child relationships, it does not maintain the document order. Therefore, we introduce additional ordinal attributes to record the document order. Thus, any XML query, including those that require document order information, can be evaluated over the generated database schema.

Our set of rules is essentially an improvement of the transformation rules defined in the shared inlining algorithm [4]. Our set of rules is complete, since we consider all possible combinations of operators and XML elements, whereas the shared inlining algorithm lists only some important combinations. For example, there is no rule corresponding to $(e_1 | \ldots | e_n)?$ in the shared inlining algorithm.

Fig. 3. Sample XML DTD xbib.dtd.

4.2. Creating and inlining DTD graphs
In this step, we create the corresponding DTD graph based on the simplified DTD and perform the inlining operation on the DTD graph. The notion of the DTD graph is defined as follows:
Definition 4.2 (DTD graph). The structure of a DTD D can be represented by a directed graph $G = (V, E)$, where V is the set of vertices and E is the set of edges. The vertices represent the element and attribute types in D, and the edges represent their parent-child relationships. Each vertex is labeled with the name of the corresponding element or attribute type. An edge is labeled with * if it is incident to a vertex that can appear more than once under its parent in the corresponding XML document; otherwise, no label is used.

For example, the DTD graph shown in Fig. 4 corresponds to the simplified form of the DTD given in Fig. 3. While each element appears only once in the DTD graph, an attribute appears as many times as it appears in the DTD. Node identifiers for attributes in the DTD graph are preceded by @.¹ For a set-valued attribute such as IDREFS or NMTOKENS, the edge between the set-valued attribute and its parent (the owner element) is labeled with * in the DTD graph. Thus, we can manage set-valued attributes easily in further steps. In Fig. 4, cites is a set-valued attribute; therefore, its incoming edge is labeled with *.

¹ In an implementation, to ensure the uniqueness of attribute names, we can use the concatenation of the owner element name and the attribute name as the attribute identifier. For example, an attribute id of element book can be labeled book.id.

Fig. 4. DTD graph of xbib.dtd.

After we create the DTD graph for the simplified DTD, we inline as many descendant elements into an element as possible. The rationale is that these inlined elements will eventually produce a single relation. Therefore, we only inline a child c into a parent p when p can contain at most one occurrence of c, in order to avoid introducing redundancy into the generated relation. After the simplification procedure, any input DTD is in a canonical form, i.e., each DTD expression is a tuple of distinct element names, each optionally followed by a star (*). As a result, in the corresponding DTD graph, an edge is labeled with a star (*) if it leads to an element marked with * in its parent's content model, and carries no label otherwise. If an edge has * as its label, we call it a star edge; otherwise, we call it a normal edge. We define the notions of an inlinable node, an inlinable subtree and a shared node in a DTD graph as follows:

Definition 4.3 (Inlinable node). Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.

Definition 4.4 (Inlinable subtree). Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree rooted at e. This subtree is called the inlinable subtree for the node e.

Definition 4.5 (Shared node). Given a DTD graph, a node is called a shared node if it has more than one incoming edge.

Using our inlining procedure, the DTD graph shown in Fig. 4 will be transformed into the inlined DTD graph shown in Fig. 5.

Fig. 5. Inlined DTD graph of xbib.dtd.
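Definitions 4.3 to 4.5 can be checked mechanically. The Python sketch below is our own illustration and not the paper's inlining pseudocode: it represents a DTD graph as a list of (parent, child, is_star_edge) triples, classifies the nodes, and collects for every non-inlinable node the inlinable subtree of Definition 4.4, which is exactly what inlining folds into that node's relation. Each node is pushed at most once, so the work is linear in the size of the graph. The function names and the edge set in the usage example (our assumed reading of the graph in Fig. 7C) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, bool]  # (parent, child, is_star_edge)


def classify(nodes: List[str], edges: List[Edge]) -> Tuple[Set[str], Set[str]]:
    """Return (inlinable nodes, shared nodes) per Definitions 4.3 and 4.5."""
    incoming: Dict[str, List[bool]] = defaultdict(list)
    for _, child, star in edges:
        incoming[child].append(star)
    inlinable = {n for n in nodes if len(incoming[n]) == 1 and not incoming[n][0]}
    shared = {n for n in nodes if len(incoming[n]) > 1}
    return inlinable, shared


def inline(nodes: List[str], edges: List[Edge]) -> Dict[str, Set[str]]:
    """Map every non-inlinable node e to e.inlinedset (its inlinable subtree minus e)."""
    inlinable, _ = classify(nodes, edges)
    normal_children: Dict[str, List[str]] = defaultdict(list)
    for parent, child, star in edges:
        if not star:
            normal_children[parent].append(child)
    inlined_sets: Dict[str, Set[str]] = {}
    for root in nodes:
        if root in inlinable:
            continue  # absorbed into the relation of some ancestor
        members: Set[str] = set()
        stack = [root]
        while stack:  # depth-first search along normal edges only
            for child in normal_children[stack.pop()]:
                if child in inlinable and child not in members:
                    members.add(child)
                    stack.append(child)
        inlined_sets[root] = members
    return inlined_sets


# A graph shaped like our reading of Fig. 7C (edge set assumed): e is shared.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b", False), ("a", "c", False), ("c", "d", False),
         ("a", "g", True), ("d", "e", False), ("g", "e", False),
         ("e", "f", False)]
print({k: sorted(v) for k, v in inline(nodes, edges).items()})
# {'a': ['b', 'c', 'd'], 'e': ['f'], 'g': []}
```

Cycles need no special treatment here: a node on a cycle either has a star edge or a second incoming edge, so it is simply not inlinable and becomes the root of its own relation.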
Our inlining procedure considers the following three cases, which are illustrated in Fig. 6:
1. Case 1: Node a is connected to node b by a normal edge, and b has no other incoming edges. In this case, a can contain at most one occurrence of b, and we combine node b into a while maintaining the parent-child relationships between b and its children.
2. Case 2: Node a is connected to node b by a normal edge, and b has other incoming edges. In this case, we do not combine b into a, since b has multiple parents.
3. Case 3: Node a is connected to node b by a star edge. In this case, each a can contain multiple occurrences of b, and we do not combine b into a.

Fig. 6. Three cases for inlining.

Only Case 1 allows us to inline an element into its parent. Case 2 does not allow inlining because b is a shared node, and Case 3 does not allow inlining, to avoid the redundancy caused by multiple occurrences of a child element in its parent due to the * operator.

Example 4.6. In Fig. 7A, nodes b and d are inlinable, but nodes a and c are not. The inlinable subtree for a contains nodes a and b, whereas the inlinable subtree for c contains nodes c and d. In Fig. 7C, nodes b, c, d and f are inlinable, but nodes a, e and g are not. The inlinable subtree for a contains nodes a, b, c and d, and the inlinable subtree for node e contains nodes e and f. While there is no shared node in Fig. 7A, the only shared node in Fig. 7C is node e. The DTD graph shown in Fig. 7A will be inlined into the one shown in Fig. 7B, and the DTD graph shown in Fig. 7C will be inlined into the one shown in Fig. 7D.

Fig. 7. Inlining DTD graphs.

The notion of the inlinable subtree formalizes the intuition of inlining as many descendant elements as possible into an element. We give our inlining algorithm in pseudocode in Fig. 8. Essentially, it uses a depth-first search strategy to identify the inlinable subtree for each node and then inlines that subtree into its root. A field inlinedset of set type is introduced for each node e to represent the set of nodes that have been inlined into e (initially, $e.inlinedset = \{\}$). For example, in Fig. 7C, after the inlining procedure, $a.inlinedset = \{b, c, d\}$. The algorithm is efficient, as stated by the following theorem.

Theorem 4.7 (Time complexity). The time complexity of our inlining algorithm is $O(n)$, where n is the number of nodes in the input DTD graph.

Proof. This follows since each node of the DTD graph is visited at most once. □

Fig. 8. The inlining procedure.

4.3. Generating database schema and σ-mapping
After a simplified DTD graph is inlined, the last step is to generate a database schema based on this inlined DTD graph, together with the schema mapping information that will be used later in the data mapping process. The procedure to generate the database schema and σ-mapping is given in Fig. 9. For each node e in the inlined DTD graph, a relation e is generated. Basically, in the generated database schema, we associate each element e with a unique ID.
We also introduce a unique f.ID for each element type f in the inlined set of e. The rationale behind introducing an ID or f.ID for each element is to be able to store the order of XML elements in the relational tables. It is stated in [8] that no ordinal ID is required for inlined elements. However, as we show in Section 4.4, such a mapping scheme is lossy. Our mapping scheme is lossless and stores sufficient information in the relational database to reconstruct the original XML document; a complete proof of losslessness is given in [10]. Attribute parentid is introduced for each non-inlinable element to preserve the parent-child relationships and, thus, the tree structure of an XML document. We do not need to introduce an
attribute parentid for inlinable elements, since they are stored in the same tuple as their parent. To facilitate the processing of recursive XML queries (queries with the // axis), each element e is associated with an attribute endid, which stores the maximum ID of the descendants of e.² We introduce f.endid for each element type f in the inlined set of e for the same purpose. We introduce attribute parenttype if the node in the inlined DTD graph has more than one parent (a shared node); parenttype thus facilitates the efficient selection of the descendants of a particular parent. A column e is introduced in the database schema for each non-inlinable leaf element type to store its textual content. Similarly, a column f is introduced for each leaf element or attribute type f in the inlined set of e. Obviously, if the element type is EMPTY, no such column is introduced.

² Leaf elements have the same ID and endid values, so the endid can be omitted for them to save space.

Fig. 9. The database schema and σ-mapping generation procedure.
Fig. 10. Database schema for xbib.dtd.

The database schema shown in Fig. 10 is generated for the inlined DTD graph given in Fig. 5 by the schema generation procedure explained above. After the database schema is generated, both the database schema and the σ-mapping, which maps element and attribute types to the relations in which they should be stored, are output. This output is used by our data mapping algorithms in Section 5 to actually shred XML documents into relational tuples.

4.4. Discussion
Although the main idea of ODTDMap is similar to existing algorithms [4,8,9], ODTDMap makes
several improvements over them:
- Recursion: The standard shared-inlining algorithm [4] defines a rule to deal with two mutually recursive elements, but it is not clear how to handle a DTD with a cycle of more than two elements. ODTDMap can handle arbitrary cycles in DTDs by checking only the incoming edges of the current node, irrespective of other nodes in the DTD graph. Therefore, ODTDMap deals with cycles naturally, without the explicit check for the existence of cycles that the standard shared-inlining algorithm requires.
- Losslessness: ODTDMap is lossless in the sense that the generated database schema stores enough structural information to reconstruct the original XML documents and to support the storage and querying of ordered XML documents; we are able to reconstruct the original XML document in the given document order. In contrast, the shared-inlining algorithm [4] and its variant [9] do not support the ordered nature of XML documents. Although [8] proposes the Global, Local and Dewey Order schemes and discusses their applications to the schema-less case, no details are presented for the schema-based case. The authors suggest that there is no need for a separate column storing the order information of inlined elements, since the position of such elements can be determined from the position of their parent element and the document schema. This does not hold in general. For example, consider the DTD and the sample XML document shown in Fig. 11A and B, respectively. Ordered shared inlining will create the database shown in Fig. 11C, in which the order information of the inlined element C is lost; there is no way to determine whether element B comes before or after element C, and therefore the original XML document cannot be reconstructed. In contrast, ODTDMap will create the database shown in Fig. 11D, where an ID is associated with the inlined element C as well.
Thus, it supports the reconstruction of the original XML document.
- Efficient support for XML queries: To facilitate the processing of XML queries, each non-leaf element e is associated with an endid, which stores the maximum ID of the descendants of e. In this way, one can efficiently identify the descendants of a given element, as well as all elements following and preceding it.
- Set-valued attributes: Existing schema mapping algorithms [4,8,9] have not considered set-valued attributes such as IDREFS and NMTOKENS. In ODTDMap, we connect a set-valued attribute to its owner element with a star edge in the DTD graph and map it to a separate relation (see how the cites attribute in Fig. 3 is mapped).

Fig. 11. A lossy versus a lossless mapping.

5. Data mapping
As the target database schema might be complex and its corresponding XML-to-Relational schema mapping is non-trivial, designing an efficient schema-based data mapping algorithm is challenging. The main challenges include the following:
- Varying document structure: unlike relational tables, which always have a fixed structure, XML documents have varying structures due to the optional occurrence operators ? and * and the choice operator | used in the underlying DTD. For example, in the XML document tree given in Fig. 13, which corresponds to the sample XML document shown in Fig. 12, the nodes with ordinal numbers 10 and 16 are of the same element type, but their subtrees are quite different: there is no paragraph node among the children of node 10, and no section node among the children of node 16. A data mapping algorithm should keep track of missing child nodes and, using efficient data structures, handle the structural differences between element nodes of the same type that the optional operators cause.
- Scalability: In an online environment, where new XML documents might be inserted into the database on the fly, a data mapping algorithm
will be used frequently. It is therefore critical that a data mapping algorithm be efficient and scale well with the size of XML documents; a linear data mapping algorithm fulfills this requirement best.

Fig. 12. A sample XML document xbib.xml.

In the following sections, we present two data mapping algorithms, the DOM-based OXInsert and the SAX-based SDM, that address these issues. An appropriate ordering technique is needed to keep ordered XML documents in the unordered structure of relational tables. Several order encoding methods are proposed in [8]; their experimental results show that global order encoding performs best on query-intensive workloads. Our data mapping algorithms use global order encoding, but they can easily be adapted to the other order encoding schemes proposed in [8].

5.1. DOM-based approach
We use a tree data model to represent XML documents, since each valid XML document is rooted at a unique element, which is specified by the DOCTYPE declaration in the DTD. We first introduce our XML Tree data model, which is based on the W3C's Document Object Model (DOM) [24]. The details of our XML document model are given in Definition 5.1.

Definition 5.1 (XML Tree). We model an XML document D as an XML element tree (XML Tree) T, in which nodes represent XML elements and edges represent parent-child relationships between XML elements. The XML Tree T is an ordered tree, and its nodes can have attributes and values associated with them. The root of XML Tree T is denoted by T.root. For each element node e in T, we use the following notation:
- e.name, the name of XML element e.
- e.eid, the global ID of XML element e, which is assigned based on the pre-order tree traversal.
- e.endid, the largest descendant ID of node e; e.eid = e.endid if e is a leaf node in T.
- e.attributes, the set of XML attributes of e.
We also denote the attributes of e by e:a 1 ;...; e:a n and the names and values of these attributes by e:a i :name and e:a i :value, respectively (i ¼ 1;...; n). e:value, the value of e, where e:value ¼ NULL if e is a non-leaf node. e:parent, the parent node of e, where e:parent ¼ NULL if e is the root node of T. e:children, the ordered sequence of child nodes of e, and e:children ¼ NULL if e is a leaf n ode of T. We also denote the children of e by e:c 1 ;...; e:c m.
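The following Python sketch (not the authors' code) illustrates Definition 5.1 and the endid-based tests mentioned at the start of this section. It parses a small hypothetical document whose elements, in document order, carry the same names as the nodes of Fig. 13, stores PCDATA as a field rather than as a node, assigns pre-order eids with an explicit stack instead of recursion, and sets each endid once the node's subtree is finished. The class XNode and all helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)
class XNode:
    """Hypothetical XML Tree node: PCDATA is kept in `value`, not as a child node."""
    name: str
    eid: int = 0                      # global pre-order ID
    endid: int = 0                    # largest descendant ID (== eid for a leaf)
    value: Optional[str] = None       # None (NULL) for non-leaf nodes
    attributes: Dict[str, str] = field(default_factory=dict)
    parent: Optional["XNode"] = field(default=None, repr=False)
    children: List["XNode"] = field(default_factory=list)

def build_xml_tree(xml_text: str) -> XNode:
    """Build an XML Tree with pre-order eids and endids, using an explicit stack."""
    counter = 0
    root: Optional[XNode] = None
    # Entries are (element, parent) pairs to visit, or an XNode whose subtree is done.
    stack: list = [(ET.fromstring(xml_text), None)]
    while stack:
        item = stack.pop()
        if isinstance(item, XNode):
            item.endid = counter      # last ID issued inside the finished subtree
            continue
        elem, parent = item
        counter += 1
        node = XNode(elem.tag, counter, attributes=dict(elem.attrib), parent=parent)
        if parent is None:
            root = node
        else:
            parent.children.append(node)
        kids = list(elem)
        if not kids:
            node.value = (elem.text or "").strip() or None
        stack.append(node)            # popped again after all descendants
        stack.extend((k, node) for k in reversed(kids))  # first child ends on top
    return root

def index_by_eid(root: XNode) -> Dict[int, XNode]:
    out, stack = {}, [root]
    while stack:
        n = stack.pop()
        out[n.eid] = n
        stack.extend(n.children)
    return out

def is_descendant(d: XNode, a: XNode) -> bool:
    return a.eid < d.eid <= a.endid   # d lies inside a's ID interval

def is_following(f: XNode, e: XNode) -> bool:
    return f.eid > e.endid            # f starts after e's subtree ends

def is_preceding(p: XNode, e: XNode) -> bool:
    return p.endid < e.eid            # p's subtree ends before e (so p is not an ancestor)

# Hypothetical sample document (not the paper's Fig. 12).
XBIB = """<xbib><book>
<title>XML</title>
<author><fname>A</fname><lname>B</lname></author>
<author><fname>C</fname><lname>D</lname></author>
<chapter><section><section><paragraph>p1</paragraph></section>
<section><paragraph>p2</paragraph></section></section></chapter>
<chapter><paragraph>p3</paragraph><paragraph>p4</paragraph></chapter>
</book></xbib>"""

if __name__ == "__main__":
    nodes = index_by_eid(build_xml_tree(XBIB))
    ch10, ch16 = nodes[10], nodes[16]
    print(ch10.name, ch10.eid, ch10.endid)   # chapter 10 15
    print(is_descendant(nodes[13], ch10))    # True
    print(is_following(ch16, ch10))          # True
    print(is_preceding(nodes[5], ch10))      # True
```

With eid and endid in place, each of these axis tests reduces to one or two integer comparisons, which is why the pair is cheap to store and query relationally.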
Fig. 13. XML Tree of xbib.xml. Nodes in pre-order: xbib(1), book(2), title(3), author(4), fname(5), lname(6), author(7), fname(8), lname(9), chapter(10), section(11), section(12), paragraph(13), section(14), paragraph(15), chapter(16), paragraph(17), paragraph(18).

The XML Tree data model differs from W3C's DOM specification in a few ways. In contrast to a traditional XML DOM tree, the XML Tree does not treat XML PCDATA values as nodes but as data fields of XML element nodes. Each node has an ID field, which is assigned based on a pre-order traversal as the XML document is being parsed. Besides the ID field, each node is assigned an endid field, which denotes the largest descendant ID of that node. These distinctions are only for convenience of presentation; the algorithm proposed in this paper can be implemented directly on the standard DOM model. The XML Tree for the XML document shown in Fig. 12 is illustrated in Fig. 13. In an XML Tree, each node e is labeled by e.name(e.eid, e.endid, e.value, e.a_1.name = e.a_1.value, ..., e.a_n.name = e.a_n.value), where e.value is omitted when e is a non-leaf node (e.value = NULL). For simplicity, however, Fig. 13 shows only e.name and e.eid. We distinguish an element node e in an XML Tree from its corresponding type in the DTD, which is denoted by type(e). For example, we use the expression σ(type(e)) to find the table corresponding to e.

Our DOM-based data mapping algorithm OXInsert is shown in Fig. 14. We design OXInsert as an iterative algorithm. If the input DTD is recursive (a cyclic XML schema), the documents conforming to it might be nested to arbitrary depth, and a recursive data mapping algorithm could then require considerable memory for its numerous recursive calls. Therefore, we avoid a recursive design for OXInsert.
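The iterative, two-queue strategy of OXInsert, explained in the following paragraphs, can be sketched in Python as below. This is an illustration under stated assumptions rather than a transcription of Fig. 14: the hypothetical COLUMNS dictionary stands in for the schema R and the σ-mapping (its keys are the non-inlinable element types, each mapped to a table of the same name), every other element type is treated as inlinable, inlined columns are named by their path from the tuple's owner, and the hypothetical SET_VALUED entry sends the whitespace-separated values of a set-valued attribute to a separate table with its own index sequence. The sample document is hypothetical.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for R and the σ-mapping (not taken from the paper):
# non-inlinable element type -> extra columns for the elements inlined into it.
COLUMNS = {
    "xbib": [], "book": ["title_id", "title"], "chapter": [], "section": [],
    "paragraph": [], "author": ["fname_id", "fname", "lname_id", "lname"],
}
BASE = ["id", "parent_id", "endid", "value"]
# Set-valued attribute (owner type, attribute name) -> its separate table.
SET_VALUED = {("book", "cites"): "book_cites"}

def number_tree(root):
    """Pre-order eids and endids, keyed by id(element), in O(n)."""
    order = list(root.iter())                     # document (pre-order) order
    eid = {id(el): i for i, el in enumerate(order, 1)}
    size = {}
    for el in reversed(order):                    # children are seen before parents
        size[id(el)] = 1 + sum(size[id(c)] for c in el)
    return eid, {k: eid[k] + size[k] - 1 for k in eid}

def ox_insert(root):
    """Shred an ElementTree into {table: [row, ...]} using two queues, no recursion."""
    eid, endid = number_tree(root)
    db = {t: [] for t in list(COLUMNS) + list(SET_VALUED.values())}
    seq = [0]                                     # index IDs for set-valued rows

    def process_set_attr(n, name):
        for token in n.get(name, "").split():
            seq[0] += 1
            db[SET_VALUED[(n.tag, name)]].append(
                {"id": seq[0], "parent_id": eid[id(n)], "value": token})

    def load_tuple_data(n, prefix, tp):
        if prefix:                                # inlined element: keep its global ID
            tp[prefix + "_id"] = eid[id(n)]
        if len(n) == 0:                           # only leaf elements carry a value
            tp[prefix or "value"] = (n.text or "").strip() or None
        for name, val in n.attrib.items():
            if (n.tag, name) in SET_VALUED:
                process_set_attr(n, name)
            else:
                tp[f"{prefix}@{name}"] = val

    q = deque([(root, None)])                     # non-inlinable elements, parent eid
    while q:
        e, parent_id = q.popleft()
        tp = dict.fromkeys(BASE + COLUMNS[e.tag])  # every column starts as NULL
        tp.update(id=eid[id(e)], parent_id=parent_id, endid=endid[id(e)])
        load_tuple_data(e, "", tp)
        r = deque((c, c.tag, e) for c in e)       # (element, column prefix, XML parent)
        while r:
            f, prefix, parent = r.popleft()
            if f.tag in COLUMNS:                  # non-inlinable: gets its own tuple
                q.append((f, eid[id(parent)]))
                continue
            load_tuple_data(f, prefix, tp)        # inlinable: fill e's tuple
            r.extend((c, f"{prefix}_{c.tag}", f) for c in f)
        db[e.tag].append(tp)                      # here, table name == element type
    return db

XBIB = """<xbib><book cites="x1 x2">
<title>XML</title>
<author><fname>A</fname><lname>B</lname></author>
<author><fname>C</fname><lname>D</lname></author>
<chapter><section><section><paragraph>p1</paragraph></section>
<section><paragraph>p2</paragraph></section></section></chapter>
<chapter><paragraph>p3</paragraph><paragraph>p4</paragraph></chapter>
</book></xbib>"""

if __name__ == "__main__":
    for table, rows in ox_insert(ET.fromstring(XBIB)).items():
        print(table, rows)
```

Both chapter tuples receive the same set of columns even though their subtrees differ, and every element enters r at most once, so apart from attribute handling the sketch does constant work per element.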
The main idea of OXInsert is that it uses queue q to process all non-inlinable XML elements and, for each such element e, uses queue r to process all XML elements that are inlinable to e. Lines 7–11 process each non-inlinable XML element e dequeued from q. In particular, a tuple tp is created in the table corresponding to type(e), denoted by σ(type(e)). The data values of node e are retrieved and loaded into the corresponding fields of tuple tp by procedure loadtupledata() (line 09). Set-valued attributes are handled by procedure processsetattr(), which stores the values of a set-valued attribute in a separate table. Note that this design handles varying document structure naturally. On the one hand, all missing nodes have NULL values in their corresponding columns, since all columns are initialized to NULL and the column of a node is filled with a value only when the node is present. On the other hand, for two elements of the same type, even though the structures of their subtrees might vary, we process each of their descendants consistently and correctly using the σ-mapping.

Since the information of inlinable elements is stored in the same tuple as that of their parents, for each non-inlinable element e we need to retrieve the data values of the elements that are inlinable to e. This is achieved by using the second queue r to process the descendants of e that are inlinable to e, in the While statement (lines 15–24). During this process, any non-inlinable element encountered is enqueued into q for further processing (line 17). For each element f that is inlinable to e, procedure loadtupledata() fills the appropriate fields of the tuple tp corresponding to e with the data values retrieved from node f. The set-valued attributes of f are handled by procedure processsetattr(), which stores the values of a set-valued attribute in a separate table. The descendants of f are enqueued into r for further processing (lines 20–22).

Fig. 14. DOM-based data mapping algorithm OXInsert.

Procedure loadtupledata() retrieves the data of any node n and loads it into tuple tp. Parameter prefix accounts for the difference between the relational attribute names used for non-inlinable and inlinable nodes in XML Tree T. The set-valued attributes of node n are shredded by procedure processsetattr(), which processes a set-valued attribute e.a of a particular element e. Unlike a single-valued attribute, which is mapped to the same table as its owner element, each such attribute is mapped to a separate table, denoted by σ(type(e.a)). For each value of e.a, a tuple consisting of a sequential index ID (disjoint from the IDs in the XML Tree), a parent ID and the value is inserted into table σ(type(e.a)) (lines 3–7).

To analyze the time complexity of algorithm OXInsert, we first present some properties of the algorithm in the following lemmas.

Lemma 5.2. Each non-inlinable element e in XML Tree T is enqueued into queue q exactly once, and q only contains non-inlinable elements.

Proof. Elements are enqueued into q only at line 5 and at line 17. Line 5 enqueues the root element, which is non-inlinable. Line 17 is in the body of the If statement whose condition ensures that the element f being enqueued into q is non-inlinable. Therefore, q only contains non-inlinable elements. The acyclicity of T implies that each non-inlinable element of T can be enqueued into q at most once. In addition, the While statement (lines 15–24) ensures that each non-inlinable element other than the root is enqueued into q at least once, at line 17. Finally, the root element is enqueued into q exactly once. Therefore, each non-inlinable element e is enqueued into q exactly once. □

Lemma 5.3. Each XML element e in XML Tree T, except the root element, is enqueued into queue r exactly once.

Proof. Lemma 5.2 implies that each non-inlinable element e is dequeued from q exactly once (line 7). For each such e, each descendant f of e is enqueued into queue r exactly once if (1) f is a child of e (line 14), or (2) f is a descendant of e whose parent is inlinable to e (line 21). Every element of T except the root satisfies one of these two cases for some e and is thus enqueued into r at least once. The acyclicity of T implies that each element of T can be enqueued into r at most once. Therefore, each XML element in T other than the root is enqueued into r exactly once. □

The following theorem shows that OXInsert is an efficient, linear algorithm.

Theorem 5.4 (Time complexity). The time complexity of algorithm OXInsert is O(n), where the database schema R and the σ-mapping σ are fixed and n is the total number of XML elements and attribute values in XML Tree T.

Proof (Sketch). From Lemma 5.2, each non-inlinable element e in XML Tree T is enqueued into queue q exactly once, and q only contains non-inlinable elements. Therefore, lines 7–11 are executed exactly once for each non-inlinable element.
In addition, the execution time of lines 7–11 is constant if we ignore lines 05 and 06 of procedure loadtupledata(), whose execution time is attributed to XML attributes. From Lemma 5.3, each XML element other than the root is enqueued into queue r exactly once; thus, the body of the While statement is executed exactly once for each such element, and its execution time is likewise constant if we ignore lines 05 and 06 of loadtupledata(). The ignored attribute work is constant per attribute value, so its total cost is proportional to the number of attribute values, which n also counts. In conclusion, the time complexity of OXInsert is O(n). □

Table 1 shows how the XML Tree given in Fig. 13 is mapped to the relational database by our data mapping algorithm OXInsert.

Table 1. The state of the database after xbib.xml is stored.

5.2. SAX-based approach

DOM-based algorithms are popular because DOM has been adopted by the W3C as a standard. However, for a large XML file, or for multiple XML files processed in a multi-tasking environment, creating DOM trees is expensive. A DOM-based data mapping algorithm processes a document in two runs: in the first run, the parser scans the document and creates an XML tree in main memory; in the second run, the data mapping algorithm traverses this DOM tree and processes it. In contrast, a SAX-based [25] data mapping approach needs to process the document in only one run.

Our SAX-based data mapping algorithm, called SDM hereafter, is given in Fig. 15. SDM takes an XML document X, a database schema R and a σ-mapping σ as input, as described in Definition 3.3. The event-driven SDM algorithm makes a sequential scan of the whole document from top to bottom. It triggers procedures startelement(), characters() and endelement() for start tags, character data and end tags, respectively.

When a start tag for an element e is encountered, SDM triggers procedure startelement(), which generates a sequential global ID (GID) for e. This global ID helps to maintain XML document order in the relational database. If e is a non-inlinable element, startelement() creates a new tuple t of table σ(type(e)) and starts to fill out the fields of t with the information obtained from e. It pushes the element type of e and its GID onto stack GST, and pushes t onto stack ST_σ(type(e)), where t remains until it has been completely filled out after all the descendants of e are processed. If e is an inlinable element, no new tuple is created; instead, the tuple on top of stack ST_σ(type(e)) is updated with the GID and the attribute values of e, and the element type of e and its GID are pushed onto stack GST. As in the DOM-based algorithm OXInsert, set-valued attributes of e are handled by procedure processsetattr(), since the values of a set-valued attribute are stored in a separate table.

When character data between a start tag and an end tag are encountered, SDM triggers procedure characters().
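The start-tag handling above, together with the character-data and end-tag handling described next, can be sketched with Python's standard xml.sax module, whose ContentHandler callbacks startElement, characters and endElement play the roles of SDM's startelement(), characters() and endelement(). This is a simplified illustration, not Fig. 15: the hypothetical TABLES set and HOST map stand in for the σ-mapping, table names equal element type names, set-valued attributes are omitted, and character data are buffered and trimmed at the end tag because SAX parsers may deliver text in several chunks. The sample document is hypothetical.

```python
import xml.sax

# Hypothetical σ-mapping: non-inlinable types own a table; inlinable types map to a host table.
TABLES = {"xbib", "book", "author", "chapter", "section", "paragraph"}
HOST = {"title": "book", "fname": "author", "lname": "author"}

class SDMHandler(xml.sax.ContentHandler):
    """Single-pass, stack-based shredding in the spirit of SDM (set-valued attributes omitted)."""

    def __init__(self):
        super().__init__()
        self.gid = 0                              # sequential global ID (GID)
        self.gst = []                             # global stack GST of (type, GID)
        self.st = {t: [] for t in TABLES}         # one tuple stack per table
        self.db = {t: [] for t in TABLES}         # inserted tuples per table

    def _table(self, name):
        return name if name in TABLES else HOST[name]

    def startElement(self, name, attrs):
        self.gid += 1
        parent_gid = self.gst[-1][1] if self.gst else None
        if name in TABLES:                        # non-inlinable: start a new tuple
            t = {"id": self.gid, "parent_id": parent_gid, "endid": None, "value": None}
            t.update({f"@{k}": v for k, v in attrs.items()})
            self.st[name].append(t)
        else:                                     # inlinable: update the host tuple
            t = self.st[HOST[name]][-1]
            t[f"{name}_id"] = self.gid
            t.update({f"{name}@{k}": v for k, v in attrs.items()})
        self.gst.append((name, self.gid))

    def characters(self, content):
        if not self.gst:
            return
        name = self.gst[-1][0]                    # element on top of GST owns the text
        t = self.st[self._table(name)][-1]
        col = "value" if name in TABLES else name
        t[col] = (t.get(col) or "") + content     # text may arrive in several chunks

    def endElement(self, name):
        self.gst.pop()
        stack = self.st[self._table(name)]
        t = stack[-1]
        col = "value" if name in TABLES else name
        t[col] = (t.get(col) or "").strip() or None
        if name in TABLES:
            stack.pop()                           # the tuple is now complete
            t["endid"] = self.gid                 # last GID issued = largest descendant ID
            self.db[name].append(t)               # stands in for inserting into the table
        else:
            t[f"{name}_endid"] = self.gid

def sdm(xml_text: str) -> dict:
    handler = SDMHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.db

XBIB = """<xbib><book>
<title>XML</title>
<author><fname>A</fname><lname>B</lname></author>
<author><fname>C</fname><lname>D</lname></author>
<chapter><section><section><paragraph>p1</paragraph></section>
<section><paragraph>p2</paragraph></section></section></chapter>
<chapter><paragraph>p3</paragraph><paragraph>p4</paragraph></chapter>
</book></xbib>"""

if __name__ == "__main__":
    for table, rows in sdm(XBIB).items():
        print(table, rows)
```

In this sample, the nested section elements push a second tuple onto the section stack while the outer section's tuple is still pending, which is the recursive-schema case that per-table stacks are meant to handle.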
Since element e on top of GST is the owner of the scanned character data, these data are mapped to the tuple on top of stack ST_σ(type(e)).

When the end tag for element e is encountered, SDM triggers procedure endelement(). If e is non-inlinable, endelement() pops tuple t from stack ST_σ(type(e)), assigns the current GID as the endid of t, and inserts t into table σ(type(e)). Otherwise, it updates the tuple on top of stack ST_σ(type(e)), assigning the current GID as the endid of e.

SDM maintains a global stack, GST, and a separate stack, ST_σ(type(e)), for each table σ(type(e)), where σ(type(e)) is the table corresponding to the type of e in the underlying DTD. The global stack GST keeps track of parent-child relationships. The table stacks are used to fill in the required context information for a particular tuple t of table σ(type(e)). SDM pushes an item onto table stack ST_σ(type(e)) when it encounters a start tag for a non-inlinable element e, and pops that stack when it reads the end tag of e. Hence, a table stack never holds more than one item unless a descendant element has the same type as one of its ancestors (recursive XML schema). In that case, table stacks allow SDM to process such elements easily, without interfering with the context of a pending ancestor element of the same type for which a tuple has already been created.

Theorem 5.5 (Time complexity). The time complexity of algorithm SDM is O(n), where n is the number of elements and attribute values in the input XML document.

We omit the proof, since it is straightforward.

6. Experimental study

We implemented the ODTDMap, OXInsert and SDM algorithms in Java. The experiments were run on a Pentium IV computer with a 2.4 GHz processor and 1 GB of main memory, using the Java software development kit. We minimized the usage of other system resources during the experiments to obtain more reliable results.
We ran each program six times and report the average, excluding the first run, to obtain more accurate results.

6.1. Schema mapping experiment with ODTDMap

We applied ODTDMap to a set of DTDs to evaluate the performance of our proposed schema mapping algorithm. We used six test DTDs from the XBench XML Benchmark [35]. First, we identified the properties of each DTD, such as the number of elements and attributes and the number of * and + operators. Then, we ran ODTDMap and measured the time it takes to map the input DTD to the output database schema. To obtain significant results, this time is measured by running the schema mapping procedure 1000 times. We also recorded the number of tables generated for each DTD. The experimental results are shown in Table 2.
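The timing protocol described above (discard the first of six runs and average the rest, and repeat a fast operation such as schema mapping many times so that the total is measurable) can be captured by a small Python helper. The function average_time and its parameters are hypothetical; the default clock, time.perf_counter, is the standard library's high-resolution timer, and the clock parameter exists only so the helper can be tested deterministically.

```python
import time
from typing import Callable

def average_time(task: Callable[[], object], runs: int = 6, repeat: int = 1,
                 clock: Callable[[], float] = time.perf_counter) -> float:
    """Average seconds per run (each run = `repeat` calls to task), excluding run 1.

    The first (warm-up) run is timed but discarded, mirroring the protocol of
    running each program six times and averaging the remaining runs.
    """
    if runs < 2:
        raise ValueError("need at least two runs to discard the first")
    timings = []
    for _ in range(runs):
        start = clock()
        for _ in range(repeat):
            task()
        timings.append(clock() - start)
    kept = timings[1:]                 # drop the warm-up run
    return sum(kept) / len(kept)

if __name__ == "__main__":
    # Stand-in workload; a real study would call the schema mapping procedure here.
    print(average_time(lambda: sorted(range(10_000)), runs=6, repeat=1000))
```

With repeat=1000, each reported value is the time of 1000 consecutive calls, so dividing by 1000 gives the per-mapping cost.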
Fig. 15. SAX-based data mapping algorithm SDM.

The six DTDs contain 125 elements and 14 attributes in total, yet only 23 tables are generated for them, which is about one-sixth of the total number of elements and attributes. Thus, ODTDMap considerably reduces the number of tables compared with the number of elements. We also observed that the running time of ODTDMap is proportional to the number of elements in the input DTD. This is not surprising, since ODTDMap visits each element only once and spends constant time on each element.

6.2. DOM-based versus SAX-based data mapping

We chose auction.xml of the XMark benchmark [36] as our data set to compare the performance of the DOM-based algorithm OXInsert with that of the SAX-based algorithm SDM. We generated test documents in six different sizes ranging from 25 to 125 MB. We constructed the XML Tree for each document using W3C's DOM specification. Our performance metric is the time to map the input XML document to the target relational data. The time for parsing the input XML documents is included in this measurement, whereas the time for loading the data into the database is not. Fig. 16 shows the average time spent on each document by the two data mapping approaches. As shown in Fig. 16, SDM shows linear performance and scales very well with the size of the input XML documents, whereas OXInsert shows linear performance only up to the 75 MB document. Up to the 75 MB document, the DOM-based algorithm OXInsert performs much better than the SAX-based algorithm SDM.

Table 2. Experimental results of schema mapping. Columns: DTD file, file size (bytes), # of elements, # of attributes, # of * operators, # of + operators, # of tables, running time (ms). Rows: country.dtd, address.dtd, customer.dtd, item.dtd, order.dtd, catalog.dtd.

Fig. 16. The performance of OXInsert versus SDM (time in seconds versus document size in MB).

Beyond 75 MB, however, the SAX-based algorithm starts to outperform OXInsert, because in our experiments XML documents larger than 75 MB could no longer be represented as a DOM tree in main memory. For a large XML document whose XML tree does not fit in main memory, part of the tree is swapped between disk and main memory, which causes considerable I/O time and degrades the performance of the DOM-based approach. The event-driven SAX-based approach does not suffer from this problem. Our experiments suggest that the DOM-based approach should be chosen for data mapping as long as the document tree fits in main memory; otherwise, the SAX-based approach should be chosen.

6.3. Data mapping across different schema mappings

To study the performance of both the DOM-based algorithm OXInsert and the SAX-based algorithm SDM across various schema mapping schemes, we conducted experiments on the following three classic schema mappings [4]:
- Basic, which inlines a child element into its parent if the parent can contain at most one occurrence of the child. Basic creates a separate relation for each element type; therefore, an element type might be represented in multiple relations. One disadvantage of Basic is that it might generate a large number of relations, causing low performance for some queries.
- Shared, which also inlines a child element type into its parent if the parent can contain at most one occurrence of the child. However, to avoid the problem of Basic, each element type is represented in exactly one relation; in particular, a shared element type is always mapped to a separate table.
- Hybrid, which, in addition to the inlining performed by Shared, inlines the shared element types that are not reached through a star edge. This approach combines features of both Basic and Shared.

We added support for set-valued attributes to all three schema mapping algorithms. To see the impact of inlining on data mapping performance, we did not implement the inlining feature of Basic, since the same notion of inlining is already implemented in Shared. The database schemas generated by Basic, Shared and Hybrid for the DTD given in Fig. 3 are shown in Fig. 17; for this DTD, the schemas generated by Hybrid and Shared are the same. We used auction.xml as our data set and generated test documents ranging from 25 to 125 MB for OXInsert and from 100 MB to 1 GB for SDM. OXInsert does not terminate normally for test documents larger than 125 MB because of its memory requirements.
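The inlining rules of Shared and Hybrid summarized above can be illustrated with a short Python sketch that decides which element types receive their own relation. The DTD graph representation (each element type mapped to a list of (child type, starred) pairs, where starred marks an edge under * or +), the sample graph and the function name are assumptions. Following the descriptions above, Shared gives a relation to the root, to every type reached through a starred edge, and to every shared type (one with more than one parent); Hybrid additionally inlines shared types that are never reached through a starred edge. The sketch models neither Basic nor the handling of recursive element types.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DTD graph: element type -> [(child type, reached through * or +?)].
DTD: Dict[str, List[Tuple[str, bool]]] = {
    "bib": [("book", True), ("article", True)],
    "book": [("title", False), ("author", True), ("publisher", False)],
    "article": [("title", False), ("author", True)],
    "author": [("fname", False), ("lname", False)],
}

def relations(dtd: Dict[str, List[Tuple[str, bool]]], root: str,
              hybrid: bool = False) -> Set[str]:
    """Element types that get their own relation; all other types are inlined."""
    parents: Dict[str, Set[str]] = {}
    starred: Set[str] = set()
    for parent, edges in dtd.items():
        for child, star in edges:
            parents.setdefault(child, set()).add(parent)
            if star:                      # parent may contain many occurrences
                starred.add(child)
    own = {root} | starred
    if not hybrid:                        # Shared: every shared type gets a table
        own |= {t for t, ps in parents.items() if len(ps) > 1}
    return own

if __name__ == "__main__":
    print(sorted(relations(DTD, "bib")))               # Shared
    print(sorted(relations(DTD, "bib", hybrid=True)))  # Hybrid
```

Here title is shared by book and article but never starred, so Shared gives it a table while Hybrid inlines it into both parents; publisher, fname and lname are inlined under both schemes.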
More informationLecture Notes. char myarray [ ] = {0, 0, 0, 0, 0 } ; The memory diagram associated with the array can be drawn like this
Lecture Notes Array Review An array in C++ is a contiguous block of memory. Since a char is 1 byte, then an array of 5 chars is 5 bytes. For example, if you execute the following C++ code you will allocate
More informationAccelerating XML Structural Matching Using Suffix Bitmaps
Accelerating XML Structural Matching Using Suffix Bitmaps Feng Shao, Gang Chen, and Jinxiang Dong Dept. of Computer Science, Zhejiang University, Hangzhou, P.R. China microf_shao@msn.com, cg@zju.edu.cn,
More informationPathfinder/MonetDB: A High-Performance Relational Runtime for XQuery
Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information
More informationEfficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases
Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases Boon-Siew Seah 1, Klarinda G. Widjanarko 1, Sourav S. Bhowmick 1, Byron Choi 1 Erwin Leonardi 1, 1 School
More informationSemistructured Data and XML
Semistructured Data and XML Computer Science E-66 Harvard University David G. Sullivan, Ph.D. Structured Data The logical models we've covered thus far all use some type of schema to define the structure
More informationGraphs. Part I: Basic algorithms. Laura Toma Algorithms (csci2200), Bowdoin College
Laura Toma Algorithms (csci2200), Bowdoin College Undirected graphs Concepts: connectivity, connected components paths (undirected) cycles Basic problems, given undirected graph G: is G connected how many
More information2.2 Syntax Definition
42 CHAPTER 2. A SIMPLE SYNTAX-DIRECTED TRANSLATOR sequence of "three-address" instructions; a more complete example appears in Fig. 2.2. This form of intermediate code takes its name from instructions
More informationCHAPTER 5 GENERATING TEST SCENARIOS AND TEST CASES FROM AN EVENT-FLOW MODEL
CHAPTER 5 GENERATING TEST SCENARIOS AND TEST CASES FROM AN EVENT-FLOW MODEL 5.1 INTRODUCTION The survey presented in Chapter 1 has shown that Model based testing approach for automatic generation of test
More informationTwig 2 Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents
Twig Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents Songting Chen, Hua-Gang Li, Junichi Tatemura Wang-Pin Hsiung, Divyakant Agrawal, K. Selçuk Candan NEC Laboratories
More information4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests
4 Basics of Trees Trees, actually acyclic connected simple graphs, are among the simplest graph classes. Despite their simplicity, they still have rich structure and many useful application, such as in
More informationSelectively Storing XML Data in Relations
Selectively Storing XML Data in Relations Wenfei Fan 1 and Lisha Ma 2 1 University of Edinburgh and Bell Laboratories 2 Heriot-Watt University Abstract. This paper presents a new framework for users to
More informationRelational Storage for XML Rules
Relational Storage for XML Rules A. A. Abd El-Aziz Research Scholar Dept. of Information Science & Technology Anna University Email: abdelazizahmed@auist.net A. Kannan Professor Dept. of Information Science
More informationA new generation of tools for SGML
Article A new generation of tools for SGML R. W. Matzen Oklahoma State University Department of Computer Science EMAIL rmatzen@acm.org Exceptions are used in many standard DTDs, including HTML, because
More informationOutline. Depth-first Binary Tree Traversal. Gerênciade Dados daweb -DCC922 - XML Query Processing. Motivation 24/03/2014
Outline Gerênciade Dados daweb -DCC922 - XML Query Processing ( Apresentação basedaem material do livro-texto [Abiteboul et al., 2012]) 2014 Motivation Deep-first Tree Traversal Naïve Page-based Storage
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationXML/Relational mapping Introduction of the Main Challenges
HELSINKI UNIVERSITY OF TECHNOLOGY November 30, 2004 Telecommunications Software and Multimedia Laboratory T-111.590 Research Seminar on Digital Media (2-5 cr.): Autumn 2004: Web Service Technologies XML/Relational
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationBlossomTree: Evaluating XPaths in FLWOR Expressions
BlossomTree: Evaluating XPaths in FLWOR Expressions Ning Zhang University of Waterloo School of Computer Science nzhang@uwaterloo.ca Shishir K. Agrawal Indian Institute of Technology, Bombay Department
More informationIndex-Driven XQuery Processing in the exist XML Database
Index-Driven XQuery Processing in the exist XML Database Wolfgang Meier wolfgang@exist-db.org The exist Project XML Prague, June 17, 2006 Outline 1 Introducing exist 2 Node Identification Schemes and Indexing
More informationIan Kenny. November 28, 2017
Ian Kenny November 28, 2017 Introductory Databases Relational Algebra Introduction In this lecture we will cover Relational Algebra. Relational Algebra is the foundation upon which SQL is built and is
More informationAn Improvement of an Approach for Representation of Tree Structures in Relational Tables
An Improvement of an Approach for Representation of Tree Structures in Relational Tables Ivaylo Atanassov Abstract: The paper introduces an improvement of an approach for tree representation in relational
More informationXEM: XML Evolution Management
Worcester Polytechnic Institute Digital WPI Computer Science Faculty Publications Department of Computer Science 1-2002 XEM: XML Evolution Management Hong Su Worcester Polytechnic Institute Diane K. Kramer
More information8. Write an example for expression tree. [A/M 10] (A+B)*((C-D)/(E^F))
DHANALAKSHMI COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING EC6301 OBJECT ORIENTED PROGRAMMING AND DATA STRUCTURES UNIT IV NONLINEAR DATA STRUCTURES Part A 1. Define Tree [N/D 08]
More informationACONCURRENT system may be viewed as a collection of
252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 3, MARCH 1999 Constructing a Reliable Test&Set Bit Frank Stomp and Gadi Taubenfeld AbstractÐThe problem of computing with faulty
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 4 Graphs Definitions Traversals Adam Smith 9/8/10 Exercise How can you simulate an array with two unbounded stacks and a small amount of memory? (Hint: think of a
More informationAn Appropriate Search Algorithm for Finding Grid Resources
An Appropriate Search Algorithm for Finding Grid Resources Olusegun O. A. 1, Babatunde A. N. 2, Omotehinwa T. O. 3,Aremu D. R. 4, Balogun B. F. 5 1,4 Department of Computer Science University of Ilorin,
More informationData Structures Question Bank Multiple Choice
Section 1. Fundamentals: Complexity, Algorthm Analysis 1. An algorithm solves A single problem or function Multiple problems or functions Has a single programming language implementation 2. A solution
More informationModule 4. Implementation of XQuery. Part 2: Data Storage
Module 4 Implementation of XQuery Part 2: Data Storage Aspects of XQuery Implementation Compile Time + Optimizations Operator Models Query Rewrite Runtime + Query Execution XML Data Representation XML
More informationTrees Rooted Trees Spanning trees and Shortest Paths. 12. Graphs and Trees 2. Aaron Tan November 2017
12. Graphs and Trees 2 Aaron Tan 6 10 November 2017 1 10.5 Trees 2 Definition Definition Definition: Tree A graph is said to be circuit-free if, and only if, it has no circuits. A graph is called a tree
More informationXML publishing. Querying and storing XML. From relations to XML Views. From relations to XML Views
Querying and storing XML Week 5 Publishing relational data as XML XML publishing XML DB Exporting and importing XML data shared over Web Key problem: defining relational-xml views specifying mappings from
More informationGraph Representations and Traversal
COMPSCI 330: Design and Analysis of Algorithms 02/08/2017-02/20/2017 Graph Representations and Traversal Lecturer: Debmalya Panigrahi Scribe: Tianqi Song, Fred Zhang, Tianyu Wang 1 Overview This lecture
More informationAn undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.
Trees Trees form the most widely used subclasses of graphs. In CS, we make extensive use of trees. Trees are useful in organizing and relating data in databases, file systems and other applications. Formal
More informationA Connection between Network Coding and. Convolutional Codes
A Connection between Network Coding and 1 Convolutional Codes Christina Fragouli, Emina Soljanin christina.fragouli@epfl.ch, emina@lucent.com Abstract The min-cut, max-flow theorem states that a source
More informationQuery Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems
Query Processing: A Systems View CPS 216 Advanced Database Systems Announcements (March 1) 2 Reading assignment due Wednesday Buffer management Homework #2 due this Thursday Course project proposal due
More informationDepth-First Search Depth-first search (DFS) is another way to traverse the graph.
Depth-First Search Depth-first search (DFS) is another way to traverse the graph. Motivating example: In a video game, you are searching for a path from a point in a maze to the exit. The maze can be modeled
More informationTrees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.
Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial
More informationTrees. Carlos Moreno uwaterloo.ca EIT https://ece.uwaterloo.ca/~cmoreno/ece250
Carlos Moreno cmoreno @ uwaterloo.ca EIT-4103 https://ece.uwaterloo.ca/~cmoreno/ece250 Today's class: We'll discuss one possible implementation for trees (the general type of trees) We'll look at tree
More informationEcient XPath Axis Evaluation for DOM Data Structures
Ecient XPath Axis Evaluation for DOM Data Structures Jan Hidders Philippe Michiels University of Antwerp Dept. of Math. and Comp. Science Middelheimlaan 1, BE-2020 Antwerp, Belgium, fjan.hidders,philippe.michielsg@ua.ac.be
More informationTrees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.
Trees Q: Why study trees? : Many advance DTs are implemented using tree-based data structures. Recursive Definition of (Rooted) Tree: Let T be a set with n 0 elements. (i) If n = 0, T is an empty tree,
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK CONVERTING XML DOCUMENT TO SQL QUERY MISS. ANUPAMA V. ZAKARDE 1, DR. H. R. DESHMUKH
More information[ DATA STRUCTURES ] Fig. (1) : A Tree
[ DATA STRUCTURES ] Chapter - 07 : Trees A Tree is a non-linear data structure in which items are arranged in a sorted sequence. It is used to represent hierarchical relationship existing amongst several
More informationXML-Relational Mapping. Introduction to Databases CompSci 316 Fall 2014
XML-Relational Mapping Introduction to Databases CompSci 316 Fall 2014 2 Approaches to XML processing Text files/messages Specialized XML DBMS Tamino(Software AG), BaseX, exist, Sedna, Not as mature as
More informationA FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS
A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:
More information