A Foundation for XML Document Databases: DTD Modeling


Chutiporn Anutariya 1, Vilas Wuwongse 1, Kiyoshi Akama 2 and Ekawit Nantajeewarawat 3

1 Computer Science & Information Management Program, Asian Institute of Technology, Pathumtani 12120, Thailand
{ca, vw}@cs.ait.ac.th
2 Department of Information Engineering, Faculty of Engineering, Hokkaido University, Sapporo 060, Japan
akama@complex.eng.hokudai.ac.jp
3 Information Technology Program, Sirindhorn International Institute of Technology, Thammasat University, Pathumtani 12120, Thailand
ekawit@siit.tu.ac.th

Abstract. XML Document Type Declarations (DTDs) and document validity are crucial notions in the implementation of a variety of XML-based applications. Employing XML Declarative Description theory, this paper develops a formalism for specifying the grammatical constraints that a DTD imposes on an XML document, and presents a mechanism for validating XML documents with respect to a given DTD. The proposed formalism can be applied similarly to the specification and validation of other forms of constraints, e.g., type constraints. For ease of understanding, examples of the formalized concepts are given throughout the paper.

Keywords. specialization system, XML document, XML declarative description, XML element, XML expression, XML DTD, document validation.

1 Introduction

Although XML documents are only required to be well-formed, many applications that communicate and exchange XML data with other applications often demand support for verifying the conformance of a document to a particular Document Type Definition (DTD), in order to ensure the correctness of the logical structure of the exchanged data. However, several proposals to model and manage XML documents, e.g., graph-based [8,10] and functional programming approaches [13], have failed to model XML DTDs directly and to accommodate this key functionality.
Hence, those models require significant extensions. For example, applying schema graphs, first-order logic or Datalog rules to the graph model, as presented in [1], [1,11] and [1], respectively, adds a facility for describing document structure subject to certain kinds of constraints; however, integrating two different formalisms can make the model complicated and difficult to understand.

Important formalisms for specifying the structure imposed by a DTD are based on hedge automaton theory [14,15,16] and Datalog rules [1]. In the hedge automaton approach, an XML document is represented by a hedge (also known as a forest, or a sequence of trees corresponding to the document's text representation) and a DTD is expressed by a regular hedge grammar (RHG), an extension of context-free grammar in string automaton theory. However, such a formalism only provides a way to regulate elements' content models, defined in terms of element type declarations; it cannot capture attribute-list declarations, which specify the names, data types and default values of the attributes associated with particular element types. Thus, the hedge automaton approach is insufficient to model XML DTDs, as it cannot express the uniqueness and referential constraints defined by attributes of types ID and IDREF(S), respectively. Moreover, other forms of integrity constraints are not readily devised for it.

The Datalog approach to the modeling of XML DTDs [1] requires a specific definition of how to represent an XML document, a hierarchically and recursively structured object, in the inexpressive flat structure provided by Datalog. If a document is merely viewed as a directed, edge-labeled graph without a focus on the semantics attached to elements' tag names and attribute names, only a relational representation scheme for a data graph (a

set of directed edges and vertices) needs to be defined. On the other hand, if a document's semantics is taken into account, the document is represented as a set of related tuples contained in their corresponding relations; different documents are then likely to be modeled differently, resulting in very large and complex database schemas. A DTD is then formalized as a set of Datalog rules, where the predicates in each rule have structures corresponding to the selected relational representation of the document.

This paper presents a new approach to the modeling of XML DTDs which employs XML Declarative Description (XML-DD) theory [6,17], a foundation for the representation of, computation with and reasoning about XML data. In this approach, an XML DTD is represented as an XML-DD comprising a set of clauses, referred to specifically as DTD clauses. Such an XML-DD is obtained directly by translating each of the element type and attribute-list declarations contained in the DTD into a corresponding set of DTD clauses and then combining these sets. This formalism also facilitates the development of a simple mechanism for determining whether a given XML element or document conforms to the grammar imposed by the DTD. Besides expressing a document's syntactic constraints, the formalism can also enforce various kinds of integrity constraints which are not expressible in DTDs but are extremely important in query evaluation [5] and optimization, e.g., atomic typing (char, integer, float, etc.) and restrictions on the type of IDREF(S).
Section 2 summarizes the XML-DD theory developed in [6,17] and presents its extension for dealing with references, Section 3 develops a formalism for modeling XML DTDs, Section 4 presents an approach to the validation of an element or document against a particular DTD, and Section 5 concludes and outlines future research.

2 XML Declarative Description Theory

2.1 Declarative Description Data Model for XML Documents

In the declarative description data model for XML documents [17], developed by employing Declarative Description (DD) theory [1,3,4], the definition of an XML element is formally extended by the incorporation of variables in order to represent inherent implicit information and enhance its expressive power. Such extended XML elements, referred to as XML expressions, have a form similar to that of XML elements except that they can carry variables. XML expressions without variables are called ground XML expressions or XML elements; those with variables are called non-ground XML expressions.

There are several kinds of variables useful for the representation of implicit information contained in XML expressions: name-variables (N-variables), string-variables (S-variables), attribute-value-pair-variables (P-variables), XML-expression-variables (E-variables) and intermediate-expression-variables (I-variables). Every variable is preceded by '$' together with a character specifying its type, i.e., 'N', 'S', 'P', 'E' or 'I'.

An XML expression alphabet \(\Omega_X\) comprises the symbols in the following sets: \(\Sigma\) (a set of characters), \(N\) (a set of names), \(NVAR\) (a set of N-variables), \(SVAR\) (a set of S-variables), \(PVAR\) (a set of P-variables), \(EVAR\) (a set of E-variables) and \(IVAR\) (a set of I-variables). Intuitively, an N-variable will be instantiated to an element type or an attribute name, an S-variable to a string in \(\Sigma^*\), a P-variable to a sequence of attribute-value pairs, an E-variable to a sequence of XML expressions, and an I-variable to a part of an XML expression.
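As a small illustration of the variable kinds, the following Python sketch scans an XML expression for variables and decides whether the expression is ground. It assumes the concrete syntax `$T:name`, where `T` is one of the type letters N, S, P, E or I; the function names and the sample expressions are hypothetical and are not part of the formalism.

```python
import re

# Assumed concrete syntax: '$', a type letter, ':' and a variable name.
VARIABLE = re.compile(r"\$([NSPEI]):(\w+)")

KIND = {
    "N": "name (element type or attribute name)",
    "S": "string",
    "P": "sequence of attribute-value pairs",
    "E": "sequence of XML expressions",
    "I": "part of an XML expression",
}

def variables(expression):
    """Return (type letter, name) pairs for every variable occurrence."""
    return VARIABLE.findall(expression)

def is_ground(expression):
    """An XML expression is ground iff it carries no variable."""
    return VARIABLE.search(expression) is None

# Hypothetical sample expressions.
non_ground = '<Person ssn=$S:id $P:rest>$E:content</Person>'
ground = '<Person ssn="p1">Ann</Person>'

for letter, name in variables(non_ground):
    print(f"${letter}:{name} ranges over a {KIND[letter]}")
print(is_ground(non_ground), is_ground(ground))
```

The sketch only recognizes variables syntactically; it does not perform the instantiations described next.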
Such variable instantiations are defined by means of basic specializations, each of which is a pair of the form (var, val), where var is the variable to be specialized and val a value or tuple of values describing the resulting structure. There are four types of basic specializations:
1. rename variables,
2. expand a P- or an E-variable into a sequence of variables of their respective types,
3. remove P-, E- or I-variables, and
4. instantiate variables to some values which correspond to the types of the variables.

Let \(\mathcal{A}_X\) denote the set of all XML expressions on \(\Omega_X\), \(\mathcal{G}_X\) the subset of \(\mathcal{A}_X\) which comprises all ground XML expressions in \(\mathcal{A}_X\), \(\mathcal{C}_X\) the set of basic specializations and \(\nu_X : \mathcal{C}_X \to \mathit{partial\_map}(\mathcal{A}_X)\) the mapping from \(\mathcal{C}_X\) to the set of all partial mappings on \(\mathcal{A}_X\) which determines for each basic specialization \(c\) in \(\mathcal{C}_X\) the change of elements in \(\mathcal{A}_X\) caused by \(c\). Let \(\Delta_X = \langle \mathcal{A}_X, \mathcal{G}_X, \mathcal{C}_X, \nu_X \rangle\) be a specialization generation system, which will be used to define a specialization system characterizing the data structure of XML expressions and sets of XML expressions.

Let \(V\) be a set of set variables, \(\mathcal{A} = \mathcal{A}_X \cup 2^{(\mathcal{A}_X \cup V)}\), \(\mathcal{G} = \mathcal{G}_X \cup 2^{\mathcal{G}_X}\), \(\mathcal{C} = \mathcal{C}_X \cup (V \times 2^{(\mathcal{A}_X \cup V)})\), and

\(\nu : \mathcal{C} \to \mathit{partial\_map}(\mathcal{A})\) the mapping from \(\mathcal{C}\) to the set of all partial mappings on \(\mathcal{A}\) which determines for each basic specialization \(c\) in \(\mathcal{C}\) the change of objects in \(\mathcal{A}\) caused by \(c\) such that
1. If \(c \in \mathcal{C}_X\) and \(a \in \mathcal{A}_X\), then \(\nu(c)(a) = \nu_X(c)(a)\).
2. If \(c \in (\mathcal{C} - \mathcal{C}_X)\) and \(a \in \mathcal{A}_X\), then \(\nu(c)(a) = a\).
3. If \(c \in \mathcal{C}_X\), \(S = \{a_1, \ldots, a_m, v_1, \ldots, v_n\} \in (\mathcal{A} - \mathcal{A}_X)\), \(a_i \in \mathcal{A}_X\) and \(v_j \in V\), then \(\nu(c)(S) = \{\nu_X(c)(a_1), \ldots, \nu_X(c)(a_m), v_1, \ldots, v_n\}\).
4. If \(c = (v, R) \in (\mathcal{C} - \mathcal{C}_X)\) and \(S = \{x_1, \ldots, x_n, v\} \in (\mathcal{A} - \mathcal{A}_X)\), then \(\nu(c)(S) = \{x_1, \ldots, x_n\} \cup R\).

In order to distinguish a set variable from other types of variables, every set variable in \(V\) will be preceded by '$V:'.

In the sequel, let

\[ \Gamma = \langle \mathcal{A}, \mathcal{G}, \mathcal{S}, \mu \rangle \tag{1} \]

be a specialization system for XML expressions with flat sets, where \(\mathcal{S} = \mathcal{C}^*\) and \(\mu : \mathcal{S} \to \mathit{partial\_map}(\mathcal{A})\) such that, for \(a \in \mathcal{A}\),
\(\mu(\lambda)(a) = a\), where \(\lambda\) denotes the null sequence,
\(\mu(c \cdot s)(a) = \mu(s)(\nu(c)(a))\), where \(c \in \mathcal{C}\) and \(s \in \mathcal{S}\).
Elements of \(\mathcal{A}\), \(\mathcal{G}\) and \(\mathcal{S}\) are called objects, ground objects and specializations, respectively. The mapping \(\mu\) is called the specialization mapping. Note that when \(\mu\) is clear from the context, for \(\theta \in \mathcal{S}\), \(\mu(\theta)(a)\) will be written simply as \(a\theta\), and, for \(X \in V\), a singleton \(\{X\}\) will be written as \(X\).

The definition of XML declarative description with references, together with its related concepts, can be given in terms of \(\Gamma = \langle \mathcal{A}, \mathcal{G}, \mathcal{S}, \mu \rangle\).

2.2 XML Declarative Description with References

An XML declarative description on \(\Gamma\), simply called an XML-DD or a description, is a (possibly infinite) set of clauses on \(\Gamma\), each in the form

\[ H \leftarrow B_1, B_2, \ldots, B_n. \tag{2} \]

where \(n \ge 0\), \(H\) is an XML expression in \(\mathcal{A}_X\), and \(B_i\) is an XML expression in \(\mathcal{A}_X\), a constraint or a reference on \(\Gamma\). \(H\) is called the head and \((B_1, B_2, \ldots, B_n)\) the body of the clause. Such a clause is called a unit clause if \(n = 0\) and a non-unit clause if \(n > 0\).

Let \(\mathcal{K}\) be a set of constraint predicates. A constraint on \(\Gamma\) is a formula \(q(a_1, \ldots, a_n)\), where \(n > 0\), \(q\) is a constraint predicate in \(\mathcal{K}\) and \(a_i\) an object in \(\mathcal{A}\).
Given a ground constraint \(q(g_1, \ldots, g_n)\), \(g_i \in \mathcal{G}\), its truth or falsity is assumed to be predetermined. Denote the set of all true ground constraints by \(\mathit{Tcon}\). The notion of constraints introduced here is useful for defining restrictions on objects in \(\mathcal{A}\), i.e., both on XML expressions in \(\mathcal{A}_X\) and on sets of XML expressions in \(2^{(\mathcal{A}_X \cup V)}\).

Let \(\mathcal{F}\) be the set of all mappings \(2^{\mathcal{G}} \to 2^{\mathcal{G}}\), the elements of which are called reference functions. A reference on \(\Gamma\) is a triple \(r = \langle a, f, P \rangle\) of an object \(a\) in \(\mathcal{A}\), a reference function \(f\) in \(\mathcal{F}\) and a description \(P\), which will be called the referred description of \(r\). A reference \(\langle g, f, P \rangle\) is a ground reference iff \(g \in \mathcal{G}\). This notion of references, together with appropriate definitions of id-, idref- and idrefs-reference functions in \(\mathcal{F}\) (cf. Definition 9, Subsection 3.2), will be employed to express the uniqueness and referential constraints imposed by attributes of types ID and IDREF(S), respectively (cf. Definition 11, Subsection 3.2). For instance, given an XML element identified by x, in order to ensure the uniqueness of the identifier x with respect to a particular XML document represented by a description \(P\), an id reference \(\langle \texttt{<id value=x/>}, id_{dtd}, P \rangle\) is formulated.

Given a specialization \(\theta \in \mathcal{S}\), application of \(\theta\) to a constraint \(q(a_1, \ldots, a_n)\) yields the constraint \(q(a_1\theta, \ldots, a_n\theta)\), to a reference \(\langle a, f, P \rangle\) the reference \(\langle a, f, P \rangle\theta = \langle a\theta, f, P \rangle\) and to a clause \((H \leftarrow B_1, B_2, \ldots, B_n)\) the clause \((H\theta \leftarrow B_1\theta, B_2\theta, \ldots, B_n\theta)\). The head of a clause \(C\) will be denoted by \(head(C)\) and the sets of all objects (XML expressions), constraints and references in the body of \(C\) by \(object(C)\), \(con(C)\) and \(ref(C)\), respectively. Let \(body(C) = object(C) \cup con(C) \cup ref(C)\). A clause \(C\) is a ground clause iff \(C\) comprises only ground objects, ground constraints and ground references.

Let \(C\) be a clause (either a unit or a non-unit clause) and \(P\) a description on \(\Gamma\). The heights of \(C\) and \(P\), denoted by \(hgt(C)\) and \(hgt(P)\), respectively, are defined as follows:
1. The height of the clause \(C\) is zero if \(C\) contains no reference, i.e., if \(ref(C) = \emptyset\).
2. If the clause \(C\) contains references, its height is equal to the maximum height of all the referred descriptions contained in its body plus one.
3. The height of the description \(P\) is the maximum height of all the clauses in \(P\).

Let \(P\) be a description on \(\Gamma\). The meaning of \(P\), denoted by \(\mathcal{M}(P)\), is defined inductively as follows:

1. Given the meaning, \(\mathcal{M}(Q)\), of a description \(Q\) with the height \(m\), a reference \(r = \langle g, f, Q \rangle\) is a true reference iff \(g \in f(\mathcal{M}(Q))\). For any \(m \ge 0\), define \(\mathit{Tref}(m)\) as the set of all true references the heights of the referred descriptions of which are smaller than or equal to \(m\), i.e.:

\[ \mathit{Tref}(m) = \{\, \langle g, f, R \rangle \mid g \in \mathcal{G},\ f \in \mathcal{F},\ hgt(R) \le m,\ g \in f(\mathcal{M}(R)) \,\} \tag{3} \]

2. The meaning, \(\mathcal{M}(P)\), of the description \(P\) with the height \(m + 1\) is a set of ground XML expressions defined by

\[ \mathcal{M}(P) = \bigcup_{n=1}^{\infty} [T_P]^n(\emptyset) \tag{4} \]

where \(\emptyset\) is the empty set, \([T_P]^n(\emptyset) = T_P([T_P]^{n-1}(\emptyset))\) and the mapping \(T_P : 2^{\mathcal{G}} \to 2^{\mathcal{G}}\) is defined as follows: for each \(X \subseteq \mathcal{G}\), \(g \in T_P(X)\) iff there exist a clause \(C \in P\) and a specialization \(\theta \in \mathcal{S}\) such that \(C\theta\) is a ground clause the head of which is \(g\) and all the objects, constraints and references in the body of which belong to \(X\), \(\mathit{Tcon}\) and \(\mathit{Tref}(n)\), for some \(n \le m\), respectively, i.e.:

\[ T_P(X) = \{\, head(C\theta) \mid C \in P,\ \theta \in \mathcal{S},\ C\theta \text{ is a ground clause},\ object(C\theta) \subseteq X,\ con(C\theta) \subseteq \mathit{Tcon},\ ref(C\theta) \subseteq \mathit{Tref}(n),\ n \le m \,\} \tag{5} \]

Intuitively, given a description \(P\), its meaning, \(\mathcal{M}(P)\), is the set of all the ground XML expressions which can be derived from the clauses in \(P\). In other words, given a clause \(C = (H \leftarrow B_1, B_2, \ldots, B_n)\), \(n \ge 0\), in \(P\), for every \(\theta \in \mathcal{S}\) that makes \(B_1\theta, B_2\theta, \ldots, B_n\theta\) true with respect to the meaning of \(P\), the expression \(H\theta\) will be derived and included in the meaning of \(P\).
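The iteration in (4) and (5) can be illustrated with a deliberately simplified Python sketch. It assumes a finite description whose clauses are already ground and contain neither constraints nor references (height zero), so no search for specializations is needed; the class and function names and the sample clauses are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundClause:
    """A ground clause head <- body, with body objects only (an assumption)."""
    head: str
    body: frozenset = frozenset()

def t_p(clauses, derived):
    """One application of T_P: heads of clauses whose body lies within `derived`."""
    return {c.head for c in clauses if c.body <= derived}

def meaning(clauses):
    """Iterate T_P from the empty set until nothing new is derived.

    Because T_P is monotone, the iterates form an increasing chain, so the
    first repeated iterate equals the union in (4) for a finite description.
    """
    derived = set()
    while True:
        nxt = t_p(clauses, derived)
        if nxt == derived:
            return derived
        derived = nxt

# Hypothetical description: <a/> is a fact, <b/> follows from <a/>,
# and <c/> depends on <d/>, which is never derived.
description = [
    GroundClause("<a/>"),
    GroundClause("<b/>", frozenset({"<a/>"})),
    GroundClause("<c/>", frozenset({"<d/>"})),
]
print(sorted(meaning(description)))
```

In the general setting the clauses are non-ground, so each step would additionally have to enumerate specializations and check constraints against Tcon and references against Tref.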
3 XML DTD Modeling

This section employs the XML-DD theory, formulated in Section 2, to model XML DTDs.

3.1 Element Type and Attribute-List Declarations

Element type and attribute-list declarations are two essential declarations contained in an XML DTD, used to define the ordering and structuring of elements in a document. An element type declaration typically specifies the element's content model. In other words, it provides a grammar regulating the structure of the element's content, which could be empty, character data or a valid sequence of the allowed types of child elements. An attribute-list declaration specifies the names, data types and default values (if any) of the attributes associated with a given element type.

XML elements' content models can be categorized into three classes: empty, simple and complex (or nested) content models. An element type has empty content if elements of that type are empty, i.e., they are encoded by empty-element tags only; it has simple content if elements of that type contain merely character data; and it has complex content if elements of that type contain a sequence of one or more child elements. For these three classes of content models, there are three corresponding forms of element type declarations: empty, simple and complex forms. Each form is used to declare element types with the respective content model, i.e., an empty-content element type is declared by an empty-formed declaration, a simple-content element type by a simple-formed declaration and a complex-content element type by a complex-formed declaration which employs content particles (simply referred to as particles) to constrain the element's content.

Given below is the formal definition of content particles, which will be used in the definition of complex-formed element type declarations (cf. Definition 2-3).

Definition 1 [Content particles] A content particle on a set of names \(N\) takes one of the forms:
1. Unqualified content particle
   1.1. atomic form: elem-type
   1.2. choice-list form: ( cp_1 | … | cp_n )
   1.3. sequence-list form: ( cp_1, …, cp_n )
2. Qualified content particle
   2.1. ?-form: cp?
   2.2. +-form: cp+
   2.3. *-form: cp*
where elem-type is an element type in \(N\), \(n > 1\), cp_i is a content particle and cp is an unqualified content particle. Let \(CP\) be the set of all content particles on \(N\).

Evidently, a content particle is simply a regular expression over element types in \(N\).

Definition 2 [Element type declarations] An element type declaration on \(N\) assumes one of the forms:
1. empty form: <!ELEMENT elem-type EMPTY>
2. simple form: <!ELEMENT elem-type (#PCDATA)>
3. complex form: <!ELEMENT elem-type content-particle>
where elem-type \(\in N\) specifies the element type being declared and content-particle \(\in CP\) describes the element's content model. Let \(ETD\) be the set of all element type declarations on \(N\).

From the definition of element type declarations, an element type having a very complex content model can be described simply by a content particle formed by combinations of nested content particles and the occurrence qualifiers ?, + or *.

Definition 3 [Attribute-list declarations] An attribute-list declaration has the form:
<!ATTLIST elem-type attr-name_1 attr-type_1 attr-default_1 … attr-name_n attr-type_n attr-default_n>
where \(n \ge 1\), elem-type \(\in N\) specifies the type of element with which the specified set of attributes will be associated, the attr-name_i \(\in N\) are distinct attribute names, attr-type_i \(\in\) {CDATA, ID, IDREF, IDREFS} \(\cup\) {(value_1 | … | value_m) | value_j \(\in \Sigma^*\) are distinct enumerated values}, and attr-default_i \(\in\) {#REQUIRED, #IMPLIED} \(\cup\) {#FIXED fixed-value | fixed-value \(\in \Sigma^*\)} \(\cup\ \Sigma^*\). Let \(ALD\) be the set of all attribute-list declarations.
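The remark that a content particle is a regular expression over element types can be made concrete with a short Python sketch. It is not the clause-based formalism of this paper: it merely compiles a content particle, encoded here as nested tuples (a hypothetical representation), into a regular expression for Python's `re` module and matches it against a sequence of child element types. The content model used as sample data is illustrative.

```python
import re

def to_regex(cp):
    """Translate a content particle into a regex over space-terminated names."""
    kind = cp[0]
    if kind == "elem":                        # atomic form: elem-type
        return "(?:" + re.escape(cp[1]) + " )"
    if kind == "choice":                      # ( cp_1 | ... | cp_n )
        return "(?:" + "|".join(to_regex(c) for c in cp[1]) + ")"
    if kind == "seq":                         # ( cp_1, ..., cp_n )
        return "(?:" + "".join(to_regex(c) for c in cp[1]) + ")"
    if kind in ("?", "+", "*"):               # qualified forms cp?, cp+, cp*
        return to_regex(cp[1]) + kind
    raise ValueError(f"unknown content particle: {cp!r}")

def matches(cp, child_types):
    """Check a sequence of child element types against a content particle."""
    text = "".join(name + " " for name in child_types)
    return re.fullmatch(to_regex(cp), text) is not None

# Illustrative content model: (Name, (Organizer+ | Sponsor*))
conference = ("seq", [
    ("elem", "Name"),
    ("choice", [("+", ("elem", "Organizer")), ("*", ("elem", "Sponsor"))]),
])

print(matches(conference, ["Name", "Organizer", "Organizer"]))
print(matches(conference, ["Name", "Organizer", "Sponsor"]))
```

Each element name is followed by a space in the encoded string, which keeps names that are prefixes of one another from being confused.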

Definition 4 [Document type declarations] A document type declaration is a sequence \(d_1 d_2 \ldots d_n\), where \(d_i \in (ETD \cup ALD)\). Let \(DTD = (ETD \cup ALD)^*\), i.e., the set of all sequences on \((ETD \cup ALD)\), be the set of all document type declarations.

3.2 XML DTD Translation

In the proposed approach, an XML DTD is modeled as a description comprising a set of clauses. Such clauses, referred to as DTD clauses, are obtained directly by translating each of the element type and attribute-list declarations contained in the DTD into a corresponding set of clauses and then combining these sets. The number of clauses formulated for an element type declaration depends solely on the complexity of the element type's content model, and the number formulated for an attribute-list declaration on the number of declared attributes, their specified types and their default values. The more complex an element's structure, the greater the number of DTD clauses obtained.

There are two classes of DTD clauses: those that restrict element types' content models and those that constrain the associated lists of attributes. The tag name of the head expression of each DTD clause starts with the name of the translated DTD, concatenated with the name of the element type being restricted. Such a head expression only describes certain restrictions on the element type's content model and merely specifies a general pattern of the associated attribute list. Additional restrictions on the element's content model (e.g., descriptions of valid sequences of child elements) and on its associated attribute list (e.g., attribute type and default value constraints) are defined by appropriate specifications of XML expressions, constraints and references in the clause's body. An XML expression contained in a clause's body will be further restricted by the other DTD clauses whose heads can be matched with that XML expression.
Constraints and references in a clause's body are used to impose conditions on attribute types and default values. An XML element is valid with respect to a given DTD if it matches the head of some clause translated from the DTD and all the restrictions specified in the body of that clause are satisfied.

Let \(XClause\) denote the set of all clauses on \(\Gamma\). Given next is the formal definition of the mapping \(\tau_{CP}\), to be used in the definition of the element-type-declaration translator \(\tau_E\) (cf. Definition 6). Intuitively, \(\tau_{CP}\) recursively translates a given pair (cp, cp-specification) into a corresponding set of clauses, where cp is a content particle and cp-specification an underscore-separated name in \(N\) of the form dtd_elem-type_position, where dtd specifies the translated DTD, elem-type the declared element type and position the location at which cp occurs in the declaration of elem-type. In the sequel, assume that the DTD being translated is denoted by dtd.

Definition 5 [\(\tau_{CP}\), the content-particle translator] Let cp \(\in CP\) and cp-spec \(\in N\). The content-particle translator \(\tau_{CP} : (CP \times N) \to 2^{XClause}\) is defined by Table 1.

Table 1. \(\tau_{CP}\), the content-particle translator.

1. Unqualified content particles

1.1. Atomic form: cp = elem-type, where elem-type \(\in N\)
   \(\tau_{CP}\)(cp, cp-spec) = {C}, where
   C: <cp-spec>$E:e</cp-spec> ← <dtd_elem-type>$E:e</dtd_elem-type>.

1.2. Choice-list form: cp = ( cp_1 | … | cp_n ), where \(n \ge 1\), cp_i \(\in CP\)
   \(\tau_{CP}\)(cp, cp-spec) = \(\bigcup_{i=1}^{n} \tau_{CP}\)(cp_i, cp-spec_i) \(\cup\) {C_1, …, C_n}, where, for each \(i \in \{1, \ldots, n\}\),
   C_i: <cp-spec>$E:e</cp-spec> ← <cp-spec_i>$E:e</cp-spec_i>.

1.3. Sequence-list form: cp = ( cp_1, …, cp_n ), where \(n \ge 1\), cp_i \(\in CP\)
   \(\tau_{CP}\)(cp, cp-spec) = \(\bigcup_{i=1}^{n} \tau_{CP}\)(cp_i, cp-spec_i) \(\cup\) {C}, where
   C: <cp-spec>$E:e1 … $E:en</cp-spec> ← <cp-spec_1>$E:e1</cp-spec_1>, …, <cp-spec_n>$E:en</cp-spec_n>.

2. Qualified content particles

2.1. ?-form: cp = ( cp_1 ? ), where cp_1 \(\in CP\)
   \(\tau_{CP}\)(cp, cp-spec) = \(\tau_{CP}\)(cp_1, cp-spec_1) \(\cup\) {C_1, C_2}, where
   C_1: <cp-spec>$E:e</cp-spec> ← <cp-spec_1>$E:e</cp-spec_1>.
   C_2: <cp-spec></cp-spec> ← .

2.2. +-form: cp = ( cp_1 + ), where cp_1 \(\in CP\)
   \(\tau_{CP}\)(cp, cp-spec) = \(\tau_{CP}\)(cp_1, cp-spec_1) \(\cup\) {C_1, C_2}, where
   C_1: <cp-spec>$E:e</cp-spec> ← <cp-spec_1>$E:e</cp-spec_1>.
   C_2: <cp-spec>$E:e1 $E:e2</cp-spec> ← <cp-spec_1>$E:e1</cp-spec_1>, <cp-spec>$E:e2</cp-spec>.

2.3. *-form: cp = ( cp_1 * ), where cp_1 \(\in CP\)
   \(\tau_{CP}\)(cp, cp-spec) = \(\tau_{CP}\)(cp_1, cp-spec_1) \(\cup\) {C_1, C_2}, where
   C_1: <cp-spec>$E:e1 $E:e2</cp-spec> ← <cp-spec_1>$E:e1</cp-spec_1>, <cp-spec>$E:e2</cp-spec>.
   C_2: <cp-spec></cp-spec> ← .

Example 1 Given a content particle cp = (Organizer+ | Sponsor*) together with its specification myDTD_Conference_1_2, which indicates that cp occurs in the declaration of the Conference element type of myDTD, the translator \(\tau_{CP}\) translates the pair (cp, myDTD_Conference_1_2) into a corresponding set of clauses:

1. \(\tau_{CP}\)((Organizer+ | Sponsor*), myDTD_Conference_1_2) = \(\tau_{CP}\)(Organizer+, myDTD_Conference_1_2_1) \(\cup\) \(\tau_{CP}\)(Sponsor*, myDTD_Conference_1_2_2)

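   \(\cup\) {C_1, C_2}, where
   C_1: <myDTD_Conference_1_2>$E:e</myDTD_Conference_1_2> ← <myDTD_Conference_1_2_1>$E:e</myDTD_Conference_1_2_1>.
   C_2: <myDTD_Conference_1_2>$E:e</myDTD_Conference_1_2> ← <myDTD_Conference_1_2_2>$E:e</myDTD_Conference_1_2_2>.

2. \(\tau_{CP}\)(Organizer+, myDTD_Conference_1_2_1) = \(\tau_{CP}\)(Organizer, myDTD_Conference_1_2_1_1) \(\cup\) {C_3, C_4}, where
   C_3: <myDTD_Conference_1_2_1>$E:e</myDTD_Conference_1_2_1> ← <myDTD_Conference_1_2_1_1>$E:e</myDTD_Conference_1_2_1_1>.
   C_4: <myDTD_Conference_1_2_1>$E:e1 $E:e2</myDTD_Conference_1_2_1> ← <myDTD_Conference_1_2_1_1>$E:e1</myDTD_Conference_1_2_1_1>, <myDTD_Conference_1_2_1>$E:e2</myDTD_Conference_1_2_1>.

3. \(\tau_{CP}\)(Organizer, myDTD_Conference_1_2_1_1) = {C_5}, where
   C_5: <myDTD_Conference_1_2_1_1>$E:e</myDTD_Conference_1_2_1_1> ← <myDTD_Organizer>$E:e</myDTD_Organizer>.

4. \(\tau_{CP}\)(Sponsor*, myDTD_Conference_1_2_2) = \(\tau_{CP}\)(Sponsor, myDTD_Conference_1_2_2_1) \(\cup\) {C_6, C_7}, where
   C_6: <myDTD_Conference_1_2_2>$E:e1 $E:e2</myDTD_Conference_1_2_2> ← <myDTD_Conference_1_2_2_1>$E:e1</myDTD_Conference_1_2_2_1>, <myDTD_Conference_1_2_2>$E:e2</myDTD_Conference_1_2_2>.
   C_7: <myDTD_Conference_1_2_2></myDTD_Conference_1_2_2> ← .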

5. \(\tau_{CP}\)(Sponsor, myDTD_Conference_1_2_2_1) = {C_8}, where
   C_8: <myDTD_Conference_1_2_2_1>$E:e</myDTD_Conference_1_2_2_1> ← <myDTD_Sponsor>$E:e</myDTD_Sponsor>.

Finally, let P_1 = \(\tau_{CP}\)(cp, myDTD_Conference_1_2) = {C_1, …, C_8}.

Based on the definition of the content-particle translator \(\tau_{CP}\), the definition of the element-type-declaration translator \(\tau_E\) is now given.

Definition 6 [\(\tau_E\), the element-type-declaration translator] Let d \(\in ETD\) be an element type declaration. The element-type-declaration translator \(\tau_E : ETD \to 2^{XClause}\) is defined by Table 2.

Table 2. \(\tau_E\), the element-type-declaration translator.

1. Empty form: d = <!ELEMENT elem-type EMPTY>, where elem-type \(\in N\)
   \(\tau_E\)(d) = {C}, where
   C: <dtd_elem-type> <elem-type $P:attrList/> </dtd_elem-type> ← <dtd_elem-type_attrList_1 $P:attrList/>.

2. Simple form: d = <!ELEMENT elem-type (#PCDATA)>, where elem-type \(\in N\)
   \(\tau_E\)(d) = {C}, where
   C: <dtd_elem-type> <elem-type $P:attrList>$S:pcdata</elem-type> </dtd_elem-type> ← <dtd_elem-type_attrList_1 $P:attrList/>.

3. Complex form: d = <!ELEMENT elem-type cp>, where cp \(\in CP\)
   \(\tau_E\)(d) = \(\tau_{CP}\)(cp, dtd_elem-type_1) \(\cup\) {C}, where
   C: <dtd_elem-type> <elem-type $P:attrList>$E:e</elem-type> </dtd_elem-type> ← <dtd_elem-type_attrList_1 $P:attrList/>, <dtd_elem-type_1>$E:e</dtd_elem-type_1>.

Example 2 Denote the DTD of Fig. 1 by myDTD. This example demonstrates the translation of the Conference element type declaration d_1 into a corresponding set of clauses.

d_1: <!ELEMENT Conference (Name, (Organizer+ | Sponsor*))>
d_2: <!ATTLIST Conference url   ID                      #REQUIRED
                          type  (International | Local) #REQUIRED
                          chair IDREF                   #IMPLIED>
d_3: <!ELEMENT Name (#PCDATA)>
d_4: <!ELEMENT Organizer (#PCDATA)>
d_5: <!ELEMENT Sponsor (#PCDATA)>
d_6: <!ELEMENT Person (#PCDATA)>
d_7: <!ATTLIST Person ssn ID #REQUIRED>

Fig. 1. An XML DTD Example.
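The recursive structure of the content-particle translator of Definition 5 can be sketched in Python as follows. The sketch is only an illustration under stated assumptions: content particles are encoded as nested tuples, clauses are rendered as plain strings with `<-` standing for the clause arrow, and the E-variable names `$E:e`, `$E:e1`, … are placeholders chosen here. The function names `wrap` and `tau_cp` are hypothetical.

```python
def wrap(tag, content=""):
    """Render <tag>content</tag>."""
    return f"<{tag}>{content}</{tag}>"

def tau_cp(cp, spec, dtd="myDTD"):
    """Return the clauses (as strings) for content particle `cp` at position `spec`."""
    kind = cp[0]
    if kind == "elem":                                   # atomic form
        return [wrap(spec, "$E:e") + " <- " + wrap(dtd + "_" + cp[1], "$E:e") + "."]
    if kind in ("choice", "seq"):
        subs = [(f"{spec}_{i}", sub) for i, sub in enumerate(cp[1], start=1)]
        if kind == "choice":                             # one clause per alternative
            own = [wrap(spec, "$E:e") + " <- " + wrap(s, "$E:e") + "." for s, _ in subs]
        else:                                            # split the content into n parts
            names = [f"$E:e{i}" for i in range(1, len(subs) + 1)]
            body = ", ".join(wrap(s, v) for (s, _), v in zip(subs, names))
            own = [wrap(spec, " ".join(names)) + " <- " + body + "."]
        return own + [c for s, sub in subs for c in tau_cp(sub, s, dtd)]
    if kind in ("?", "+", "*"):                          # qualified forms
        sub_spec = spec + "_1"
        one = wrap(spec, "$E:e") + " <- " + wrap(sub_spec, "$E:e") + "."
        more = (wrap(spec, "$E:e1 $E:e2") + " <- " + wrap(sub_spec, "$E:e1")
                + ", " + wrap(spec, "$E:e2") + ".")
        empty = wrap(spec) + " <- ."
        own = {"?": [one, empty], "+": [one, more], "*": [more, empty]}[kind]
        return own + tau_cp(cp[1], sub_spec, dtd)
    raise ValueError(f"unknown content particle: {cp!r}")

# Content model of the Conference declaration d_1 in Fig. 1.
conference = ("seq", [
    ("elem", "Name"),
    ("choice", [("+", ("elem", "Organizer")), ("*", ("elem", "Sponsor"))]),
])
clauses = tau_cp(conference, "myDTD_Conference_1")
for clause in clauses:
    print(clause)
```

Applied to the content model of d_1, the sketch produces ten clauses, one for each clause that \(\tau_{CP}\) contributes in Examples 1 and 2; the clause for the Conference element itself comes from \(\tau_E\) and is not generated here.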

1. \(\tau_E\)(d_1) = \(\tau_{CP}\)((Name, (Organizer+ | Sponsor*)), myDTD_Conference_1) \(\cup\) {C_9}, where
   C_9: <myDTD_Conference> <Conference $P:attrList>$E:e</Conference> </myDTD_Conference> ← <myDTD_Conference_attrList_1 $P:attrList/>, <myDTD_Conference_1>$E:e</myDTD_Conference_1>.

2. \(\tau_{CP}\)((Name, (Organizer+ | Sponsor*)), myDTD_Conference_1) = \(\tau_{CP}\)(Name, myDTD_Conference_1_1) \(\cup\) \(\tau_{CP}\)((Organizer+ | Sponsor*), myDTD_Conference_1_2) \(\cup\) {C_10}, where
   C_10: <myDTD_Conference_1>$E:e1 $E:e2</myDTD_Conference_1> ← <myDTD_Conference_1_1>$E:e1</myDTD_Conference_1_1>, <myDTD_Conference_1_2>$E:e2</myDTD_Conference_1_2>.

3. \(\tau_{CP}\)(Name, myDTD_Conference_1_1) = {C_11}, where
   C_11: <myDTD_Conference_1_1>$E:e</myDTD_Conference_1_1> ← <myDTD_Name>$E:e</myDTD_Name>.

4. \(\tau_{CP}\)((Organizer+ | Sponsor*), myDTD_Conference_1_2) = P_1 = {C_1, …, C_8}.

These four steps yield P_2 = \(\tau_E\)(d_1) = {C_9, C_10, C_11} \(\cup\) P_1.

Clause C_9 imposes some restrictions on the Conference element. Its head specifies that every conforming Conference element must contain a list of associated attribute-value pairs as well as a sequence of subelements, represented by the P-variable $P:attrList and an E-variable, respectively. Its first and second body elements indicate that the validity of the attribute list and of the subelement sequence will be determined by the clauses whose heads are myDTD_Conference_attrList_1 and myDTD_Conference_1 elements, i.e., by the clauses obtained by translating the declaration of Conference's attributes (cf. Example 3) and by clause C_10, respectively. Clause C_10 divides the subelement sequence of a Conference element into two arbitrary subelement sequences and then specifies that restrictions on the first sequence are defined by means of the myDTD_Conference_1_1 expression (i.e., by clause C_11 obtained from \(\tau_{CP}\)(Name, myDTD_Conference_1_1)), while restrictions on the second sequence are defined by the myDTD_Conference_1_2 expression (i.e., by clauses C_1 and C_2 in description P_1 obtained from \(\tau_{CP}\)((Organizer+ | Sponsor*), myDTD_Conference_1_2)).
Clause C_11 simply requires that the first sequence contain exactly one element conforming to the grammar defined for the Name element type, i.e., it must satisfy the clauses the heads of which are myDTD_Name expressions. Clauses C_1 and C_2 demand that the second sequence conform to the restriction defined by clauses C_3 and C_4 or by clauses C_6 and C_7, respectively. Clauses C_3 and C_4 together specify that such a sequence may consist of one or more sub-sequences, each of which is restricted by clause C_5, i.e., it must contain a valid Organizer element. Alternatively, clauses C_6 and C_7 indicate that such a second sequence of a Conference element may

comprise zero or more sub-sequences each of which is constrained by clause C_8, i.e., each sub-sequence must contain a valid Sponsor element.

In the sequel, let $S:id be an S-variable in \(SVAR\).

Definition 7 [Mapping EID] The mapping \(ElementID : DTD \to 2^{\mathcal{A}_X}\), abbreviated \(EID\), is defined by \(EID(dtd) = Y \subseteq \mathcal{A}_X\) such that the XML expressions

<$I:anExpression> <elemtype_i attrname_i=$S:id $P:attrList>$S:content</elemtype_i> </$I:anExpression>

and

<$I:anExpression> <elemtype_i attrname_i=$S:id $P:attrList>$E:e</elemtype_i> </$I:anExpression>

where $I:anExpression \(\in IVAR\), $S:content \(\in SVAR\), $P:attrList \(\in PVAR\) and $E:e \(\in EVAR\), will be contained in \(Y\) iff
<!ATTLIST elem-type name_1 type_1 default_1 … name_i ID default_i … name_n type_n default_n>
is an attribute-list declaration in dtd.

Given \(dtd \in DTD\), \(EID(dtd)\) returns a set of non-ground XML expressions in \(\mathcal{A}_X\) which represent classes of XML elements having associated attributes of type ID, as defined by the given dtd.

Definition 8 [Mapping GetID] Based on the mapping \(EID\), let \(GetID : (2^{\mathcal{G}_X} \times DTD) \to 2^{\mathcal{G}_X}\) be defined as follows: given \(X \subseteq \mathcal{G}_X\) and \(dtd \in DTD\),

\[ GetID(X, dtd) = \{\, \texttt{<id value=\$S:id/>}\theta \mid a \in EID(dtd),\ \theta \in \mathcal{S}_X,\ a\theta \in X \,\} \tag{6} \]

Intuitively, given a subset \(X\) of \(\mathcal{G}_X\) and dtd in \(DTD\), \(GetID(X, dtd)\) is a set containing XML elements, each of the form <id value=elem-id/>, where elem-id \(\in \Sigma^*\) is an identity of an XML element in \(X\).

Definition 9 [id-, idref-, idrefs-reference functions] Given \(dtd \in DTD\), let \(id_{dtd} : 2^{\mathcal{G}_X} \to 2^{\mathcal{G}_X}\), \(idref_{dtd} : 2^{\mathcal{G}_X} \to 2^{\mathcal{G}_X}\) and \(idrefs_{dtd} : 2^{\mathcal{G}_X} \to 2^{\mathcal{G}}\) be reference functions in \(\mathcal{F}\) defined in terms of the mapping \(GetID\) as follows: for each \(X \subseteq \mathcal{G}_X\),

\[ id_{dtd}(X) = \mathcal{G}_X - GetID(X, dtd) \tag{7} \]
\[ idref_{dtd}(X) = GetID(X, dtd) \tag{8} \]
\[ idrefs_{dtd}(X) = 2^{GetID(X, dtd)} \tag{9} \]

\(id_{dtd}\), \(idref_{dtd}\) and \(idrefs_{dtd}\) will be referred to as id-, idref- and idrefs-reference functions. Note that references \(\langle a, id_{dtd}, R \rangle\), \(\langle a, idref_{dtd}, R \rangle\) and \(\langle S, idrefs_{dtd}, R \rangle\) will be called id, idref and idrefs references, respectively, iff \(a \in \mathcal{A}_X\) has the form <id value=elem-id/>, where elem-id \(\in (\Sigma^* \cup SVAR)\),

\(S \in V\) or \(S = \{a_1, \ldots, a_n\} \subseteq \mathcal{A}_X\), where \(a_i\) has the form <id value=elem-id_i/> and elem-id_i \(\in (\Sigma^* \cup SVAR)\), \(dtd \in DTD\), and \(R\) is a description on \(\Gamma\) specifying an XML document against which a given XML element will be validated.

Such concepts of id and idref(s) references are useful for specifying the uniqueness and referential constraints defined by attributes of types ID and IDREF(S), respectively. The definition of true references in Section 2.2 shows that the conditions specified in Table 3 must hold for id and idref(s) references to be true references.

Table 3. Satisfiability conditions for true id and idref(s) references

1. id reference \(\langle g, id_{dtd}, R \rangle\), where g = <id value=elem-id/> \(\in \mathcal{G}_X\)
   Satisfiability condition: the value specified by elem-id does not occur as an ID of any XML element in \(\mathcal{M}(R)\).
2. idref reference \(\langle g, idref_{dtd}, R \rangle\), where g = <id value=elem-id/> \(\in \mathcal{G}_X\)
   Satisfiability condition: there exists an XML element in \(\mathcal{M}(R)\) the ID of which is elem-id.
3. idrefs reference \(\langle X, idrefs_{dtd}, R \rangle\), where \(X = \{g_1, \ldots, g_n\}\) and g_i = <id value=elem-id_i/> \(\in \mathcal{G}_X\)
   Satisfiability condition: for each \(i \in \{1, \ldots, n\}\), there exists an XML element in \(\mathcal{M}(R)\) which is uniquely identified by elem-id_i.

In the sequel, let \(R\) be a description on \(\Gamma\) which specifies an XML document against which a given XML element will be validated.

Definition 10 [Equal, IsMemberOf and IdrefsSplitUp constraints] Let Equal, IsMemberOf and IdrefsSplitUp be constraint predicates in \(\mathcal{K}\). The constraints Equal, IsMemberOf and IdrefsSplitUp on \(\Gamma\) are:
1. Equal(a_1, a_2), where \(a_1, a_2 \in \mathcal{A}\),
2. IsMemberOf(a, X), where \(a \in \mathcal{A}_X\), \(X \in 2^{(\mathcal{A}_X \cup V)}\),
3. IdrefsSplitUp(<idrefs value=string/>, X), where string \(\in SVAR \cup \Sigma^*\) and \(X \in 2^{(\mathcal{A}_X \cup V)}\).

Such constraints are true constraints in \(\mathit{Tcon}\) iff they assume the forms:
1. Equal(g, g), where \(g \in \mathcal{G}\),
2. IsMemberOf(g, X), where \(g \in \mathcal{G}_X\), \(X \in 2^{\mathcal{G}_X}\), and \(g \in X\),
3. IdrefsSplitUp(<idrefs value="string"/>, X), where string \(\notin SVAR\) and X = {<id value="string_1"/>, …, <id value="string_n"/>} \(\in 2^{\mathcal{G}_X}\) such that string is the white-space-separated sequence of string_1, string_2, …, string_n.

Intuitively,
1. For \(a_1, a_2 \in \mathcal{A}\), a constraint Equal(a_1, a_2) ensures that the objects a_1 and a_2 are identical.
2. For \(a \in \mathcal{A}_X\) and \(X \in 2^{(\mathcal{A}_X \cup V)}\), a constraint IsMemberOf(a, X) ensures that the XML element represented by a is a member of the set X.
3. For string \(\in SVAR \cup \Sigma^*\) and \(X \in 2^{(\mathcal{A}_X \cup V)}\), a constraint IdrefsSplitUp(<idrefs value=string/>, X) ensures that X is specialized to a set {<id value="string_1"/>, …, <id value="string_n"/>} \(\in 2^{\mathcal{G}_X}\) such that string is the white-space-separated sequence of string_1, string_2, …, string_n.

Definition 11 [\(\tau_A\), the attribute-list-declaration translator] Let \(\tau_A : ALD \to 2^{XClause}\) denote the attribute-list-declaration translator. For an attribute-list declaration d \(\in ALD\) defined in the DTD dtd and having the form
<!ATTLIST elem-type name_1 type_1 default_1 … name_n type_n default_n>,
where \(n \ge 1\), \(\tau_A\)(d) is a set comprising m+1 clauses, where \(n \le m \le 2n\). An algorithm describing the formulation of such m+1 clauses follows:

Step 1: [Formulation of the first m clauses]

For (i = 1; i ≤ n; i = i + 1)
  Let j = i + 1.
  If (default_i is #REQUIRED)
    Let m = 1.
    Formulate clause C_i1, where
      C_i1: <dtd_elem-type_attrList_i name_i=$S:value $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>.
  Else-If (default_i is #IMPLIED)
    Let m = 2.
    Formulate clauses C_i1 and C_i2, where
      C_i1: <dtd_elem-type_attrList_i $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>.
      C_i2: <dtd_elem-type_attrList_i name_i=$S:value $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>.
  Else-If (default_i is #FIXED fixed-value)
    Let m = 1.
    Formulate clause C_i1, where
      C_i1: <dtd_elem-type_attrList_i name_i=$S:value $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>, Equal(<Value>$S:value</Value>, <Value>fixed-value</Value>).
  Else-If (default_i is fixed-value)
    Let m = 2.
    Formulate clauses C_i1 and C_i2, where
      C_i1: <dtd_elem-type_attrList_i $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>.
      C_i2: <dtd_elem-type_attrList_i name_i=$S:value $P:attrList/> ← <dtd_elem-type_attrList_j $P:attrList/>.
  End-If.
  If (type_i is ID)
    For (k = 1; k ≤ m; k = k + 1)
      Add the reference ⟨<id value=$S:value/>, f_{id,dtd}, R⟩ to the body of clause C_ik.
    End-For.
  Else-If (type_i is IDREF)
    For (k = 1; k ≤ m; k = k + 1)
      Add the reference ⟨<id value=$S:value/>, f_{idref,dtd}, R⟩ to the body of clause C_ik.
    End-For.
  Else-If (type_i is IDREFS)
    For (k = 1; k ≤ m; k = k + 1)
      Add the constraint IdrefsSplitUp(<idrefs value=$S:value/>, $V:SetOfIds) and the reference ⟨$V:SetOfIds, f_{idrefs,dtd}, R⟩ to the body of clause C_ik.
    End-For.
  Else-If (type_i is an enumeration (value_1 | … | value_k))
    For (k = 1; k ≤ m; k = k + 1)
      Add the constraint IsMemberOf(<Value>$S:value</Value>, {<Value>value_1</Value>, …, <Value>value_k</Value>}) to the body of clause C_ik.
    End-For.
  End-If.
End-For.

Step 2: [Formulation of the (m+1)-th clause]

Let j = n + 1.
Formulate clause C_{n+1}, where
  C_{n+1}: <dtd_elem-type_attrList_j/> ←.

In order to determine the validity of a list of attribute-value pairs associated with an element of elem-type, these m+1 clauses, n ≤ m ≤ 2n, work in n+1 steps. In the i-th step, 1 ≤ i ≤ n, the validity of the specification of the attribute name_i is verified by means of clause C_ij. If the specification is valid, the pair consisting of the attribute name_i and its value is removed from the list and the next step, i.e., the (i+1)-th step, is taken. Otherwise, the verification fails. In the last step, i.e., the (n+1)-th step, clause C_{n+1} verifies that no undeclared attribute appears in the list, i.e., the list of attribute-value pairs must now be empty.

Note that when there is no attribute-list declaration provided for elem-type, the following clause must be formulated instead:

  <dtd_elem-type_attrList_1/> ←.

Such a clause simply requires that elements of elem-type have no associated attribute-value pairs.

Example 3 As an example of the translation of an attribute-list declaration, let P_3 be a description obtained by translation of d_2, the declaration of the attributes associated with the Conference element. In other words, P_3 = τ_A(d_2) comprises the following five clauses, denoted by C_12 to C_16:

C_12: <myDTD_Conference_attrList_1 url=$S:value $P:attrList/> ← <myDTD_Conference_attrList_2 $P:attrList/>, ⟨<id value=$S:value/>, f_{id,myDTD}, R⟩.
C_13: <myDTD_Conference_attrList_2 type=$S:value $P:attrList/> ← <myDTD_Conference_attrList_3 $P:attrList/>, IsMemberOf(<Value>$S:value</Value>, {<Value>International</Value>, <Value>Local</Value>}).
C_14: <myDTD_Conference_attrList_3 chair=$S:value $P:attrList/> ← <myDTD_Conference_attrList_4 $P:attrList/>, ⟨<id value=$S:value/>, f_{idref,myDTD}, R⟩.
C_15: <myDTD_Conference_attrList_3 $P:attrList/> ← <myDTD_Conference_attrList_4 $P:attrList/>.
C_16: <myDTD_Conference_attrList_4/> ←.
Clause C_12 specifies constraints imposed on the list of attribute-value pairs associated with a Conference element. It ensures that the list contains a specification of the url attribute, while the other attributes, represented by $P:attrList, are further constrained by a clause whose head is a myDTD_Conference_attrList_2 expression, i.e., clause C_13. Moreover, the id reference in the body of C_12 specifies that the value of the url attribute, represented by $S:value, must be unique with respect to description R, i.e., it does not occur as an ID of any element defined in R. Clause C_13 requires that a Conference element also contain a type attribute whose value is either International or Local. Clauses C_14 and C_15 then allow the element to optionally contain a chair attribute. The idref reference in the body of C_14 specifies that the value of the chair attribute, represented by $S:value, refers to another element defined in description R which has that value as its ID. Clause C_16 specifies that a Conference element cannot contain attributes other than url, type and chair.

Definition 12 [τ_DTD, the document-type-declaration translator] The element-type-and-attribute-list-declaration translator τ_E&A : (ETD ∪ ALD) → 2^{XClause} is:

  τ_E&A(d) = τ_E(d), if d ∈ ETD,
  τ_E&A(d) = τ_A(d), if d ∈ ALD.

Let dtd = (d_1 … d_n) ∈ DTD. The document-type-declaration translator τ_DTD : DTD → 2^{XClause} is:

  τ_DTD(dtd) = ⋃_{i=1}^{n} τ_E&A(d_i) ∪ { <dtd_elem-type_attrList_1/> ← | <!ELEMENT elem-type content-model> ∈ dtd, <!ATTLIST elem-type name_1 type_1 default_1 … name_n type_n default_n> ∉ dtd }.

Example 4 This example demonstrates the translation of myDTD (Fig. 1) into a corresponding set of clauses. Let Q be the description obtained by translating myDTD, i.e., Q = τ_DTD(myDTD) = P_1 ∪ P_2 ∪ P_3 ∪ P_4, where P_4 comprises the nine clauses C_17 to C_25:

C_17: <myDTD_Name> <Name $P:attrList> $S:pcdata </Name> </myDTD_Name> ← <myDTD_Name_attrList_1 $P:attrList/>.
C_18: <myDTD_Name_attrList_1/> ←.
C_19: <myDTD_Organizer> <Organizer $P:attrList> $S:pcdata </Organizer> </myDTD_Organizer> ← <myDTD_Organizer_attrList_1 $P:attrList/>.
C_20: <myDTD_Organizer_attrList_1/> ←.
C_21: <myDTD_Sponsor> <Sponsor $P:attrList> $S:pcdata </Sponsor> </myDTD_Sponsor> ← <myDTD_Sponsor_attrList_1 $P:attrList/>.
C_22: <myDTD_Sponsor_attrList_1/> ←.
C_23: <myDTD_Person> <Person $P:attrList> $S:pcdata </Person> </myDTD_Person> ← <myDTD_Person_attrList_1 $P:attrList/>.
C_24: <myDTD_Person_attrList_1 ssn=$S:value $P:attrList/> ← <myDTD_Person_attrList_2 $P:attrList/>, ⟨<id value=$S:value/>, f_{id,myDTD}, R⟩.
C_25: <myDTD_Person_attrList_2/> ←.

3.3 DTD Translation Optimization

Since the translation scheme is intended to be general enough to cover all possible XML DTDs, the number of DTD clauses obtained from modeling a particular DTD can be rather large, which may make the approach inefficient. This limitation can be alleviated by applying the optimization algorithm (cf. Appendix), which rewrites and removes redundant DTD clauses. For example, clauses C_24 and C_25 of Example 4 can be replaced by the single clause:

  <myDTD_Person_attrList_1 ssn=$S:value/> ← ⟨<id value=$S:value/>, f_{id,myDTD}, R⟩.

The Appendix also gives the description Q′ obtained by applying the optimization algorithm to description Q of Example 4.
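The attribute-list clauses of Definition 11 can also be read operationally: each declared attribute is checked and removed in turn, and the list must be empty at the end. The following Python sketch illustrates that reading only; it is not the XDD/ET machinery of the paper. The names `AttrDecl`, `type_ok`, `validate_attrs`, the `"ENUM"` and `"DEFAULT"` labels, and the identifier `"p-001"` are hypothetical, and the set `known_ids` stands in for GetID applied to the meaning of R.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AttrDecl:
    name: str
    type: str                      # "CDATA", "ID", "IDREF", "IDREFS" or "ENUM" (labels are assumptions)
    default: str                   # "#REQUIRED", "#IMPLIED", "#FIXED" or "DEFAULT"
    value: Optional[str] = None    # fixed value or default value, if any
    choices: Tuple[str, ...] = ()  # allowed values when type == "ENUM"


def type_ok(decl, value, known_ids):
    """Type-specific condition that Step 1 adds to the clause body."""
    if decl.type == "ID":          # id reference: value is not yet an ID in R
        return value not in known_ids
    if decl.type == "IDREF":       # idref reference: some element in R has this ID
        return value in known_ids
    if decl.type == "IDREFS":      # IdrefsSplitUp: split on white space, check each part
        return all(tok in known_ids for tok in value.split())
    if decl.type == "ENUM":        # IsMemberOf
        return value in decl.choices
    return True


def validate_attrs(decls, attrs, known_ids):
    """Return True iff the attribute dict satisfies the declarations."""
    remaining = dict(attrs)        # plays the role of $P:attrList
    for decl in decls:             # steps 1..n, one per declared attribute
        if decl.name not in remaining:
            # Following the clauses above, only #IMPLIED and plain-default
            # attributes have a clause that lets the attribute be absent.
            if decl.default in ("#REQUIRED", "#FIXED"):
                return False
            continue
        value = remaining.pop(decl.name)
        if decl.default == "#FIXED" and value != decl.value:  # Equal constraint
            return False
        if not type_ok(decl, value, known_ids):
            return False
    return not remaining           # step n+1: no undeclared attribute is left


# Declarations mirroring the Conference attributes of Example 3.
CONFERENCE = [
    AttrDecl("url", "ID", "#REQUIRED"),
    AttrDecl("type", "ENUM", "#REQUIRED", choices=("International", "Local")),
    AttrDecl("chair", "IDREF", "#IMPLIED"),
]
# IDs assumed to occur in R; "p-001" is a made-up person identifier.
KNOWN_IDS = {"http://www.cs.ait.ac.th/ijwdl", "p-001"}

valid = validate_attrs(
    CONFERENCE,
    {"url": "http://www.cs.ait.ac.th/smartnet", "type": "International", "chair": "p-001"},
    KNOWN_IDS,
)
```

The loop removes one attribute per step, just as clauses C_12 to C_15 pass a shrinking $P:attrList to the next clause, and the final emptiness test corresponds to C_16.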

4 XML Element Validity Checking

Given an XML DTD represented by a description P, in order to determine the validity of an XML element x with respect to that DTD, a single clause D is formulated:

  D: a ← <dtd_elem-type> x </dtd_elem-type>.   (10)

The head of D, represented by a, is an XML expression in A_X which will be derived if the given element x is valid. The body of D contains a single XML expression with a tag name of the form dtd_elem-type, where dtd is the name of the DTD to be checked and elem-type is the type of the validated element. This body expression contains the validated element x as its only child element. If x is valid, the element represented by a will be derived from, i.e., contained in the meaning of, the description (P ∪ {D}). More precisely, x is valid iff the description (P ∪ {D}) can be successively transformed by equivalent transformations into the description (P ∪ {D′}), where D′ is a ground unit clause of the form

  D′: aθ ←.   (11)

Example 5 Referring to description Q representing myDTD, in order to determine whether the Conference element

  <Conference url="http://www.cs.ait.ac.th/smartnet" type="International" chair="…">
    <Name>SmartNet</Name>
    <Organizer>Asian Institute of Technology</Organizer>
    <Organizer>International Federation Information Processing</Organizer>
    <Organizer>Telecommunication of Thailand</Organizer>
  </Conference>

conforms to myDTD or not, the following clause is formulated:

  D: <Valid_XML url="http://www.cs.ait.ac.th/smartnet"/> ←
       <myDTD_Conference>
         <Conference url="http://www.cs.ait.ac.th/smartnet" type="International" chair="…">
           <Name>SmartNet</Name>
           <Organizer>Asian Institute of Technology</Organizer>
           <Organizer>International Federation Information Processing</Organizer>
           <Organizer>Telecommunication of Thailand</Organizer>
         </Conference>
       </myDTD_Conference>.

Suppose that the referred description R, which represents the XML document to be validated against, comprises the two clauses E_1 and E_2:

  E_1: <Conference url="http://www.cs.ait.ac.th/ijwdl" type="International">
         <Name>International Joint Workshop on Digital Libraries</Name>
         <Organizer>Asian Institute of Technology</Organizer>
       </Conference> ←.
  E_2: <Person ssn="…">Vilas Wuwongse</Person> ←.

Since the description (Q ∪ {D}) can be successively transformed into the description (Q ∪ {D′}), where

  D′: <Valid_XML url="http://www.cs.ait.ac.th/smartnet"/> ←,

the given Conference element is valid with respect to myDTD. Other Conference elements are validated in the same way.

5 Conclusions

An approach to determining the grammatical correctness of a given XML element or document with respect to a particular DTD has been developed. It combines the expressiveness of Declarative Description theory with the efficient computational mechanism of the Equivalent Transformation (ET) paradigm. It represents an XML DTD as a corresponding set of DTD clauses, which describe the content models of valid elements as well as restrictions on their associated lists of attributes, e.g., uniqueness, referential and type

constraints. Thus, the developed approach is complete with respect to XML DTD modeling and document validation. Work is continuing on extending the XML-ETC Engine, a Web-based XML processor developed under the Equivalent Transformation Compiler (ETC) environment, with support for DTD modeling and validation. Moreover, formalisms for DTD transformation and combination, e.g., union, concatenation, intersection and complement, are envisaged, in order to provide complete support for DTD and document processing.

Acknowledgement

This work was supported in part by the Thailand Research Fund.

References

1. Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers, CA (2000)
2. Akama, K.: Declarative Semantics of Logic Programs on Parameterized Representation Systems. Advances in Software Science and Technology, Vol. 5 (1993)
3. Akama, K.: Declarative Description with References and Equivalent Transformation of Negative References. Technical Report, Department of Information Engineering, Hokkaido University, Japan (1998)
4. Akama, K., Shimitsu, T., Miyamoto, E.: Solving Problems by Equivalent Transformation of Declarative Programs. Journal of the Japanese Society of Artificial Intelligence, Vol. 13, No. 6 (1998) (in Japanese)
5. Akama, K., Anutariya, C., Wuwongse, V., Nantajeewarawat, E.: A Foundation for XML Document Databases: Query Formulation and Evaluation. Technical Report, Computer Science and Information Management Program, Asian Institute of Technology, Thailand (1999)
6. Anutariya, C., Wuwongse, V., Nantajeewarawat, E., Akama, K.: Towards a Foundation for XML Document Databases. Proceedings of the 1st International Conference on Electronic Commerce and Web Technologies (EC-Web 2000), London, UK. Lecture Notes in Computer Science, Springer Verlag (2000) (to appear)
7. Apparao, V., et al.: Document Object Model (DOM) Level 1 Specification Version 1.0. W3C Recommendation (October 1998)
8. Beech, D., Malhotra, A., Rys, M.: A Formal Data Model and Algebra for XML. W3C XML Query Working Group Note (September 1999)
9. Bray, T., Paoli, J., Sperberg-McQueen, C.M.: Extensible Markup Language (XML) 1.0. W3C Recommendation (February 1998)
10. Buneman, P., Deutsch, A., Tan, W.C.: A Deterministic Model for Semi-Structured Data. Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats (1998)
11. Buneman, P., Fan, W., Weinstein, S.: Interaction between Path and Type Constraints. Technical Report, Department of Computer and Information Science, University of Pennsylvania (1998). Available at ftp://ftp.cis.upenn.edu/pub/papers/db-research/tr9816.ps.gz
12. Fernández, M., Siméon, J., Suciu, D., Wadler, P.: A Data Model and Algebra for XML Query. Draft Manuscript (1999)
13. Goldman, R., McHugh, J., Widom, J.: From Semistructured Data to XML: Migrating the Lore Data Model and Query Language. Proceedings of the 2nd International Workshop on the Web and Databases (WebDB '99), Philadelphia, Pennsylvania (1999)
14. Makoto, M.: Forest-regular Languages and Tree-regular Languages. Technical Report, Fuji Xerox Information Systems (1995)
15. Makoto, M.: Transformation of Documents and Schemas by Patterns and Contextual Conditions. Principles of Document Processing, Proceedings of the Third International Workshop (1997)
16. Makoto, M.: DTD Transformation by Patterns and Contextual Conditions. SGML/XML '97 Conference Proceedings (1997)
17. Wuwongse, V., Akama, K., Anutariya, C., Nantajeewarawat, E.: A Foundation for XML Document Databases: Data Model. Technical Report, Computer Science and Information Management Program, Asian Institute of Technology, Thailand (1999)

Appendix

An optimization algorithm for the developed DTD translation scheme is sketched. Let aDTD be an XML DTD and P = {C_1, …, C_m} a description obtained by translation of aDTD into a corresponding set of DTD clauses, i.e., P = τ_DTD(aDTD). An algorithm which reduces the complexity of such a description P by removing and rewriting redundant DTD clauses contained in P follows:

1: Let Q = {C_1θ_1, …, C_mθ_m}, where θ_1, …, θ_m are specializations in S which rename the variables in C_1, …, C_m, respectively, such that C_1θ_1, …, C_mθ_m do not have any variable name in common.
2: Repeat
3:   Find a clause C = (H ← B_1, B_2, …, B_n) ∈ Q that satisfies the following two conditions:
       - head(C)'s tag name has the form aDTD_elem-type_level or aDTD_elem-type_attrList_level, where elem-type is an element type declared in aDTD and level is a sequence of numbers separated by underscores.
       - There is no other clause D ∈ Q such that head(D)'s tag name is the same as head(C)'s tag name.
4:   If (such a clause C (in Step 3) is found)
5:     Let Q = Q − {C}, i.e., remove clause C from description Q.
6:     For each (clause D = (H′ ← B′_1, B′_2, …, B′_{i-1}, B′_i, B′_{i+1}, …, B′_u) ∈ Q, where u > 0)
7:       If (there exists θ ∈ S such that B′_iθ = H)
8:         Let D′ = (H′θ ← B′_1θ, B′_2θ, …, B′_{i-1}θ, B_1, B_2, …, B_n, B′_{i+1}θ, …, B′_uθ).
9:         Let Q = (Q − {D}) ∪ {D′}, i.e., replace clause D in Q by D′.
         End-If.
       End-For-each.
     End-If.
10: Until (such a clause C is not found in Q).

Based on the above algorithm, let Q′ be the description obtained by optimization of description Q of Example 4. It contains the following 13 DTD clauses, denoted by C′_1 to C′_13:

C′_1: <myDTD_Conference> <Conference url=$S:value type=$S:value $P:attrList> … </Conference> </myDTD_Conference> ← <myDTD_Conference_attrList_3 $P:attrList/>, ⟨<id value=$S:value/>, f_{id,myDTD}, R⟩, IsMemberOf(<Value>$S:value</Value>, {<Value>International</Value>, <Value>Local</Value>}), <myDTD_Name> … </myDTD_Name>, <myDTD_Conference_…_…> … </myDTD_Conference_…_…>.
C′_2: <myDTD_Conference_…_…> … </myDTD_Conference_…_…>

C′_3: <myDTD_Conference_…_…> … </myDTD_Conference_…_…>
C′_4: <myDTD_Organizer> … </myDTD_Organizer>
C′_5: <myDTD_Organizer> … </myDTD_Organizer>
C′_6: <myDTD_Sponsor> … </myDTD_Sponsor>
C′_7: …
C′_8: <myDTD_Conference_attrList_3 chair=$S:value/> ← ⟨<id value=$S:value/>, f_{idref,myDTD}, R⟩.
C′_9: <myDTD_Conference_attrList_3/> ←.
C′_10: <myDTD_Name> <Name> $S:pcdata </Name> </myDTD_Name> ←.
C′_11: <myDTD_Organizer> <Organizer> $S:pcdata </Organizer> </myDTD_Organizer> ←.
C′_12: <myDTD_Sponsor> <Sponsor> $S:pcdata </Sponsor> </myDTD_Sponsor> ←.
C′_13: <myDTD_Person> <Person ssn=$S:value> $S:pcdata </Person> </myDTD_Person> ← ⟨<id value=$S:value/>, f_{id,myDTD}, R⟩.
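The unfolding performed by the algorithm above can be illustrated with a deliberately simplified Python sketch. It assumes ground (variable-free) atoms represented as plain strings, so the specialization θ of Steps 7 and 8 reduces to string equality; it also skips clauses whose body mentions their own head, a safety guard that the algorithm above does not state. The names `is_auxiliary` and `optimize` and the atom `"id_ref"` are hypothetical.

```python
import re


def is_auxiliary(name):
    """True for heads of the form dtd_elem-type_level or dtd_elem-type_attrList_level."""
    return re.fullmatch(r".+?_(attrList_)?\d+(_\d+)*", name) is not None


def optimize(clauses):
    """clauses: list of (head, [body atoms]); returns the list after unfolding."""
    q = [(head, list(body)) for head, body in clauses]
    while True:
        target = None
        for head, body in q:
            defined_once = sum(1 for other, _ in q if other == head) == 1
            # Step 3: an auxiliary head defined by exactly one clause.
            # "head not in body" is an added guard against self-recursive clauses.
            if is_auxiliary(head) and defined_once and head not in body:
                target = (head, body)
                break
        if target is None:          # Step 10: nothing left to unfold
            return q
        head, body = target
        q.remove(target)            # Step 5: drop the clause itself
        # Steps 6-9: replace every occurrence of the head by the clause body.
        q = [
            (h, [x for atom in b for x in (body if atom == head else [atom])])
            for h, b in q
        ]


# Propositional stand-ins for clauses C_23, C_24 and C_25 of Example 4.
person = [
    ("myDTD_Person", ["myDTD_Person_attrList_1"]),
    ("myDTD_Person_attrList_1", ["myDTD_Person_attrList_2", "id_ref"]),
    ("myDTD_Person_attrList_2", []),
]
optimized = optimize(person)
```

On this input the two attrList clauses are folded into the Person clause, which parallels the way C_23 to C_25 collapse into C′_13. A head defined by two clauses, such as myDTD_Conference_attrList_3 in Example 3, is left untouched, which is why C′_8 and C′_9 both survive.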

XML Declarative Description with Negative Constraints

XML Declarative Description with Negative Constraints XML Declarative Description with Negative Constraints Chutiporn Anutariya 1, Vilas Wuwongse 2 and Kiyoshi Akama 3 1 Department of Telematics, Norwegian University of Science and Technology 7491 Trondheim,

More information

Table 1. Variable Types. Variable Type Variable Names Beginning with Instantiation to N-variables: Name-variables $N Element types or attribute names

Table 1. Variable Types. Variable Type Variable Names Beginning with Instantiation to N-variables: Name-variables $N Element types or attribute names RDF Declarative Description (RDD): A Language for Metadata Chutiporn Anutariya and Vilas Wuwongse Computer Science and Information Management Program Asian Institute of Technology, Pathumtani 12120, Thailand

More information

Semistructured Data Store Mapping with XML and Its Reconstruction

Semistructured Data Store Mapping with XML and Its Reconstruction Semistructured Data Store Mapping with XML and Its Reconstruction Enhong CHEN 1 Gongqing WU 1 Gabriela Lindemann 2 Mirjam Minor 2 1 Department of Computer Science University of Science and Technology of

More information

DATA MODELS FOR SEMISTRUCTURED DATA

DATA MODELS FOR SEMISTRUCTURED DATA Chapter 2 DATA MODELS FOR SEMISTRUCTURED DATA Traditionally, real world semantics are captured in a data model, and mapped to the database schema. The real world semantics are modeled as constraints and

More information

Specialization-based parallel Processing without Memo-trees

Specialization-based parallel Processing without Memo-trees Specialization-based parallel Processing without Memo-trees Hidemi Ogasawara, Kiyoshi Akama, and Hiroshi Mabuchi Abstract The purpose of this paper is to propose a framework for constructing correct parallel

More information

Conceptual Modeling of Dynamic Interactive Systems Using the Equivalent Transformation Framework

Conceptual Modeling of Dynamic Interactive Systems Using the Equivalent Transformation Framework 7th WSEAS International Conference on APPLIED COMPUTER SCIENCE, Venice, Italy, November 21-23, 2007 253 Conceptual Modeling of Dynamic Interactive Systems Using the Equivalent Transformation Framework

More information

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS

A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS A FRAMEWORK FOR EFFICIENT DATA SEARCH THROUGH XML TREE PATTERNS SRIVANI SARIKONDA 1 PG Scholar Department of CSE P.SANDEEP REDDY 2 Associate professor Department of CSE DR.M.V.SIVA PRASAD 3 Principal Abstract:

More information

Towards the Preservation of Referential Constraints in XML Data Transformation for Integration

Towards the Preservation of Referential Constraints in XML Data Transformation for Integration Towards the Preservation of Referential Constraints in XML Data Transformation for Integration Md. Sumon Shahriar and Jixue Liu Data and Web Engineering Lab School of Computer and Information Science University

More information

Slides for Faculty Oxford University Press All rights reserved.

Slides for Faculty Oxford University Press All rights reserved. Oxford University Press 2013 Slides for Faculty Assistance Preliminaries Author: Vivek Kulkarni vivek_kulkarni@yahoo.com Outline Following topics are covered in the slides: Basic concepts, namely, symbols,

More information

Solutions to Homework 10

Solutions to Homework 10 CS/Math 240: Intro to Discrete Math 5/3/20 Instructor: Dieter van Melkebeek Solutions to Homework 0 Problem There were five different languages in Problem 4 of Homework 9. The Language D 0 Recall that

More information

Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee. The Chinese University of Hong Kong.

Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee. The Chinese University of Hong Kong. Algebraic Properties of CSP Model Operators? Y.C. Law and J.H.M. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong SAR, China fyclaw,jleeg@cse.cuhk.edu.hk

More information

Extending E-R for Modelling XML Keys

Extending E-R for Modelling XML Keys Extending E-R for Modelling XML Keys Martin Necasky Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic martin.necasky@mff.cuni.cz Jaroslav Pokorny Faculty of Mathematics and

More information

DSD: A Schema Language for XML

DSD: A Schema Language for XML DSD: A Schema Language for XML Nils Klarlund, AT&T Labs Research Anders Møller, BRICS, Aarhus University Michael I. Schwartzbach, BRICS, Aarhus University Connections between XML and Formal Methods XML:

More information

Element Algebra. 1 Introduction. M. G. Manukyan

Element Algebra. 1 Introduction. M. G. Manukyan Element Algebra M. G. Manukyan Yerevan State University Yerevan, 0025 mgm@ysu.am Abstract. An element algebra supporting the element calculus is proposed. The input and output of our algebra are xdm-elements.

More information

A Logical Framework for XML Reference Specification

A Logical Framework for XML Reference Specification A Logical Framework for XML Reference Specification C. Combi, A. Masini, B. Oliboni, and M. Zorzi Department of Computer Science University of Verona Ca Vignal 2, Strada le Grazie 15, 37134 Verona, Italy

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 49 Plan of the course 1 Relational databases 2 Relational database design 3 Conceptual database design 4

More information

A Commit Scheduler for XML Databases

A Commit Scheduler for XML Databases A Commit Scheduler for XML Databases Stijn Dekeyser and Jan Hidders University of Antwerp Abstract. The hierarchical and semistructured nature of XML data may cause complicated update-behavior. Updates

More information

Approximate Functional Dependencies for XML Data

Approximate Functional Dependencies for XML Data Approximate Functional Dependencies for XML Data Fabio Fassetti and Bettina Fazzinga DEIS - Università della Calabria Via P. Bucci, 41C 87036 Rende (CS), Italy {ffassetti,bfazzinga}@deis.unical.it Abstract.

More information

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Part XII Mapping XML to Databases Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321 Outline of this part 1 Mapping XML to Databases Introduction 2 Relational Tree Encoding Dead Ends

More information

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9

XML databases. Jan Chomicki. University at Buffalo. Jan Chomicki (University at Buffalo) XML databases 1 / 9 XML databases Jan Chomicki University at Buffalo Jan Chomicki (University at Buffalo) XML databases 1 / 9 Outline 1 XML data model 2 XPath 3 XQuery Jan Chomicki (University at Buffalo) XML databases 2

More information

Relational Database: The Relational Data Model; Operations on Database Relations

Relational Database: The Relational Data Model; Operations on Database Relations Relational Database: The Relational Data Model; Operations on Database Relations Greg Plaxton Theory in Programming Practice, Spring 2005 Department of Computer Science University of Texas at Austin Overview

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington

Introduction to Semistructured Data and XML. Overview. How the Web is Today. Based on slides by Dan Suciu University of Washington Introduction to Semistructured Data and XML Based on slides by Dan Suciu University of Washington CS330 Lecture April 8, 2003 1 Overview From HTML to XML DTDs Querying XML: XPath Transforming XML: XSLT

More information

Data Analytics and Boolean Algebras

Data Analytics and Boolean Algebras Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com

More information

Indexing XML Data with ToXin

Indexing XML Data with ToXin Indexing XML Data with ToXin Flavio Rizzolo, Alberto Mendelzon University of Toronto Department of Computer Science {flavio,mendel}@cs.toronto.edu Abstract Indexing schemes for semistructured data have

More information

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE

STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou Computer Science Department Fudan University Shanghai 200433 E-mail: wnqian@fudan.edu.cn

More information

Indexing Keys in Hierarchical Data

Indexing Keys in Hierarchical Data University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science January 2001 Indexing Keys in Hierarchical Data Yi Chen University of Pennsylvania Susan

More information

A Universal Model for XML Information Retrieval

A Universal Model for XML Information Retrieval A Universal Model for XML Information Retrieval Maria Izabel M. Azevedo 1, Lucas Pantuza Amorim 2, and Nívio Ziviani 3 1 Department of Computer Science, State University of Montes Claros, Montes Claros,

More information

3. Relational Data Model 3.5 The Tuple Relational Calculus

3. Relational Data Model 3.5 The Tuple Relational Calculus 3. Relational Data Model 3.5 The Tuple Relational Calculus forall quantification Syntax: t R(P(t)) semantics: for all tuples t in relation R, P(t) has to be fulfilled example query: Determine all students

More information

Aspects of an XML-Based Phraseology Database Application

Aspects of an XML-Based Phraseology Database Application Aspects of an XML-Based Phraseology Database Application Denis Helic 1 and Peter Ďurčo2 1 University of Technology Graz Insitute for Information Systems and Computer Media dhelic@iicm.edu 2 University

More information

XML Information Set. Working Draft of May 17, 1999

XML Information Set. Working Draft of May 17, 1999 XML Information Set Working Draft of May 17, 1999 This version: http://www.w3.org/tr/1999/wd-xml-infoset-19990517 Latest version: http://www.w3.org/tr/xml-infoset Editors: John Cowan David Megginson Copyright

More information

An Analysis of Approaches to XML Schema Inference

An Analysis of Approaches to XML Schema Inference An Analysis of Approaches to XML Schema Inference Irena Mlynkova irena.mlynkova@mff.cuni.cz Charles University Faculty of Mathematics and Physics Department of Software Engineering Prague, Czech Republic

More information

Appendix 1. Description Logic Terminology

Appendix 1. Description Logic Terminology Appendix 1 Description Logic Terminology Franz Baader Abstract The purpose of this appendix is to introduce (in a compact manner) the syntax and semantics of the most prominent DLs occurring in this handbook.

More information

A Bottom-up Strategy for Query Decomposition

A Bottom-up Strategy for Query Decomposition A Bottom-up Strategy for Query Decomposition Le Thi Thu Thuy, Doan Dai Duong, Virendrakumar C. Bhavsar and Harold Boley Faculty of Computer Science, University of New Brunswick Fredericton, New Brunswick,

More information

Abstract. Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering (ISMSE 04) /04 $20.

Abstract. Proceedings of the IEEE Sixth International Symposium on Multimedia Software Engineering (ISMSE 04) /04 $20. Spatial and Temporal Reasoning in Multimedia Information Retrieval and Composition with XDD Napat Sukthong CS&MIS Program, Faculty of Informatics, Mahararakham University, Mahasarakham 44150 THAILAND napatsukthong@hotmail.com

More information
