XML Document Modeling from a Conceptual Schema


Rebeca Schroeder, Ronaldo dos Santos Mello
Depto. de Informática e Estatística, Universidade Federal de Santa Catarina (UFSC)
Caixa Postal 476, 88040-900 Florianópolis, SC, Brazil
{rebecks,ronaldo}@inf.ufsc.br

Abstract. The number of Web data repositories has been increasing in several application domains. Ontologies and conceptual schemata are important in this context because they provide a semantic understanding of this data, which is usually instantiated through XML documents. To contribute to the problem of XML document engineering for a data repository in a given domain, this paper proposes an approach for converting a conceptual schema to an XML schema. Specifically, it focuses on the conversion of an Extended Entity-Relationship (EER) schema to an XML logical schema that abstracts different XML implementation schemata. The conversion is based on a set of rules and on a process that controls rule execution to generate a schema in an XML logical model. Compared to related work, our approach considers all EER constructs to generate a suitable XML logical schema.

1. Introduction

Conceptual schemata and ontologies have been defined and associated with data sources in several application domains, especially on the Web. They act as a consensual vocabulary and structure for this data, providing a better understanding of its semantics and sometimes also acting as base schemata for manipulation operations over it [Stephens et al 2004]. At the same time, XML documents are the main format for representing data instances, especially in Web data sources. Examples of XML sources on the Web are the ones related to the academic bibliographic domain [Ley 2008, Merialdo and Silva 2002]. Due to the increasing availability and manipulation of XML data on the Web, methods are required for XML document engineering in Web data sources ruled by a domain ontology or a conceptual schema.
Many efforts have addressed XML data modeling at different abstraction levels, some of them proposing new models and formalisms for XML data design [Embley et al 2004, Mani 2004, Fong et al 2006]. However, there is no agreement about methodologies for designing XML documents. The literature on conventional database design distinguishes three modeling levels: conceptual, logical and implementation [Batini et al 1992]. This methodology can be applied to XML document design [Graves and Goldfarb 2001]. In this case, we may use a semantic schema in the first level, like a predefined ontology schema or a schema modeled in a traditional conceptual model, like the Extended Entity-Relationship (EER) model [Batini et al 1992]. The XML data model is considered in the logical level, where a conceptual schema is converted to a schema in an XML logical model. An XML logical model must consider specific constructs and constraints of the XML data model, and it must be able to abstract different XML implementation models, like Document Type Definition (DTD) [Bray et al 2000] and XML Schema [Thompson et al 2004].

We propose in this paper an approach focused on the conversion of conceptual schemata to XML logical schemata. This approach is useful in the context of XML document engineering or even the logical design of XML databases. We describe a process for converting an EER conceptual schema to a logical schema in an XML logical model. The focus here is on the EER model due to its widespread use for conceptual data modeling. Besides, EER is a simple and powerful model that provides a high-level abstraction for the information of an application domain, and it is considered a generic conceptual or semantic model. This means that our approach could be applied to other conceptual models, like UML, and could also be used to model a domain ontology structure [Xu et al 2004, Duan et al 2006, Myroshnichenko 2007]. The proposed process is based on conversion rules that consider all EER model constructs, which represent, as pointed out before, all traditional semantic data model constructs, like aggregation and generalization [Smith 1997]. The process output is an equivalent schema in an XML logical model.

This paper has two main contributions. First, we present a comparative analysis of existing approaches and algorithms for converting conceptual models to XML models. Second, we propose a conversion process for generating XML logical schemata from EER conceptual schemata. The strategy of our conversion approach is to generate hierarchical structures for representing each one of the EER association types. This strategy allows us to generate XML logical schemata that can be converted to any XML implementation model.
Our XML logical model is an abstract representation for different XML implementation models, like the W3C recommendations (DTD and XML Schema). This logical model is a hierarchical model that supports the representation of constructs and constraints that are peculiar to XML.

The remainder of this paper is organized as follows. Section 2 discusses and compares the related work. Section 3 gives an overview of the adopted XML logical model. Section 4 presents the conversion rules for the EER constructs as well as an algorithm that defines the application order of the rules. Section 5 presents a case study of our conversion process and Section 6 is dedicated to the conclusion.

2. Related Work

Several approaches have been proposed for XML document modeling at different abstraction levels. Some approaches deal with the translation of traditional conceptual models into XML models. Others propose new models and formalisms for XML data modeling. In order to analyze and compare these approaches, we classify them in three categories, according to the models used by their conversion approaches: (i) ER/EER to a schema definition language for XML; (ii) ER/EER to an XML logical model; and (iii) new conceptual models to an XML model. At the end of this section, we present a comparative analysis of the algorithms for translating ER/EER schemata to XML schemata.

Several approaches fall into the first category. The algorithm proposed in [Pigozzo et al 2005] generates an XML Schema from an ER schema. The authors of [Choi et al 2003] and [Fong et al 2006] present a conversion process to map EER schemata into a DTD schema. These approaches are strongly tied to a schema definition language for XML. Our approach, instead, converts to a logical level, providing an abstraction of such languages.

The conversion from the ER model to an XML logical model is also presented in the literature. The approaches of [Elmasri et al 2002] and [Lee et al 2001] convert ER schemata to a logical schema in a hierarchical model. Their logical models rely only on tree-based structures and do not represent the constructs and constraints of the XML data model, and their conversion methodologies do not support the mapping of some important conceptual constructs, like generalization and union types. The related work closest to our approach extends the ER model to represent the main features provided by XML and describes an algorithm to translate an EReX (ER extended for XML) schema to an XML logical schema defined by a grammar [Mani 2004]. This grammar, namely XGrammar, is based on the notations of XML schema languages (DTD, XML Schema and RELAX-NG). However, this approach relies on a new extension of the ER model. We argue that XML document design should be based on a consolidated conceptual model, such as the EER model, and that a new conceptual model for this purpose would be yet another, unnecessary model.

The last category comprises works that propose new models for XML document design. A new conceptual model for XML, namely C-XML (Conceptual XML), is proposed by [Embley et al 2004]. C-XML is model-equivalent to XML Schema, and the mappings are proposed to this specific XML schema definition language.
A method based on semantic networks for modeling XML data is proposed in [Chang et al 2002], and a new hierarchical model to represent semistructured data is introduced by [Dobbie et al 2001]. All of these approaches mix XML logical models with the semantic concepts of a conceptual model. Our approach relies on two independent models: the EER conceptual model and a suitable logical model for representing the XML data format. This independence among models is a fundamental requirement of methodologies for designing data schemata [Batini et al 1992].

Some of the presented approaches propose algorithms for converting ER or EER schemata to schemata in an XML model. We compare these algorithms in Table 1, showing the EER constructs that are considered in each one of them. We observe some gaps in the conversion of EER concepts, mainly in the conversion of generalization types and categories. Generalization/specialization types and category/union types [Elmasri et al 1985] are considered fundamental modeling constructs [Batini et al 1992]. They are increasingly used for modeling conceptual relationships of complex and semistructured systems [Mani 2004]. Hence, they cannot be disregarded by a process that converts conceptual schemata to XML schemata. Our conversion approach provides conversion treatments for all the EER constructs of Table 1, including generalization types and categories.

As shown in Table 1, the conversion approach proposed in [Fong et al 2006] considers most EER concepts, including categories. However, it is tied to DTD constructs and does not take into account some EER concepts and constraints regarding generalization and categories. Our approach considers such aspects, generating a suitable XML logical schema which is able to abstract the DTD and XML Schema recommendations.

Table 1. Comparison of related work algorithms
Columns (algorithms): [Elmasri 2002], [Pigozzo 2005], [Choi 2003], [Mani 2004], [Lee 2001], [Fong 2006]
Rows (EER model concepts): Entity; Monovalued attribute; Multivalued attribute; Composite attribute; Key attribute; Binary Relationship; N-ary Relationship with n > 2; Total Generalization; Partial Generalization; Disjoint Generalization; Overlap Generalization; Total Category; Partial Category

3. The XML Logical Model

An XML logical model is proposed to represent the XML data model. It is able to abstract the main recommendations for XML schema languages. Our logical model is not a new formalism, but an adaptation of existing hierarchical models for semistructured data modeling [Dobbie et al 2001, Chang et al 2002]. Some additional concepts and notations are introduced to represent XML data constructs and constraints.

Figure 1 presents an example of an XML logical schema. It is composed of attributes, elements, and hierarchical and reference relationships. An attribute models an element property which is defined by a single value, like the attribute size in the element LineItem. An attribute can act as an element identifier, like the identifier attribute code in the element LineItem. An attribute can also act as a reference attribute which refers to an element identifier, like the attribute lineitemref in Item referring to the LineItem identifier. We introduce a simple element notation, represented by a dashed shape.
This concept models information that is defined by a simple data type and does not have attributes, like the element emissioncontact. A complex element, represented by a solid shape, models information that is composed of elements and/or attributes. The component elements of a complex element are organized by one of two component constructs: order or exclusive. An order construct defines n ordered component elements, with n >= 1. An exclusive construct defines n alternatives for the component elements, with n > 1. The default component construct for complex elements is order, and the exclusive construct is represented by a line which crosses the component elements of a complex element. Invoice and BusinessPartner are examples of complex elements with order and exclusive constructs, respectively.

Figure 1. An example of XML logical schema.

Hierarchical relationships are represented by lines between the source concept and the target concept. The source concept of a hierarchical relationship must be a complex element, and the target concept may be an attribute, a simple element, or a complex element. A hierarchical relationship defines the minimum and maximum occurrence of a target concept in a source concept. The relationship between Invoice and its Item component is an example: Invoice may have zero or more Items, and an Item must be in an Invoice element. The default minimum and maximum occurrence for target concepts is 1. A reference relationship is represented by a dashed line between a reference attribute (or a set of reference attributes) and an element. It refers to the identifier of another element, like the relationship between the reference attribute invoiceitemref and the element Item.

As the XML logical model is an abstract representation for the XML implementation models, an XML logical schema can be converted to a DTD or an XML Schema specification in a straightforward way. Only a few decisions must be made in the conversion of identifier and reference attributes, because DTD and XML Schema provide different support for representing these constraints.

4. Conversion Process

A conversion process is proposed to generate a schema in the XML logical model which represents the information and constraints modeled by an EER schema. In this section, we define conversion rules for the EER constructs and a conversion algorithm which specifies the application order for these rules.

4.1. Conversion Rules

A set of conversion rules is proposed for converting the EER model constructs.
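Before the individual rules, the constructs of the logical model in Section 3 can be sketched as plain data types. The following Python sketch is only an illustration under stated assumptions: the class names, the `Child` wrapper for hierarchical relationships, and the helper `dtd_content_model` are hypothetical and are not part of the paper's model or tool. The mapping to DTD occurrence indicators is one possible reading of the remark that a logical schema converts to DTD in a straightforward way.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Attribute:
    name: str
    is_identifier: bool = False
    references: Optional[str] = None  # referenced element name, for reference attributes


@dataclass
class SimpleElement:
    name: str  # simple data type, no attributes


@dataclass
class Child:
    # One hierarchical relationship: a target concept plus its occurrence bounds.
    target: Union[Attribute, SimpleElement, "ComplexElement"]
    min_occurs: int = 1             # default occurrence is 1..1, as in Section 3
    max_occurs: Optional[int] = 1   # None stands for "unbounded"


@dataclass
class ComplexElement:
    name: str
    construct: str = "order"        # "order" (default) or "exclusive"
    children: List[Child] = field(default_factory=list)


def occurrence_suffix(child: Child) -> str:
    # Assumption: only the four bounds that DTD can express directly are supported.
    suffixes = {(1, 1): "", (0, 1): "?", (0, None): "*", (1, None): "+"}
    bounds = (child.min_occurs, child.max_occurs)
    if bounds not in suffixes:
        raise ValueError(f"DTD cannot express occurrence {bounds} directly")
    return suffixes[bounds]


def dtd_content_model(element: ComplexElement) -> str:
    # Attributes are skipped here; in a DTD they would go into an ATTLIST declaration.
    parts = [c.target.name + occurrence_suffix(c)
             for c in element.children
             if not isinstance(c.target, Attribute)]
    if not parts:
        return f"<!ELEMENT {element.name} EMPTY>"
    separator = " | " if element.construct == "exclusive" else ", "
    return f"<!ELEMENT {element.name} ({separator.join(parts)})>"


# Fragments of Figure 1 that are described in the text.
line_item = ComplexElement("LineItem", children=[
    Child(Attribute("code", is_identifier=True)),
    Child(Attribute("size")),
])
item = ComplexElement("Item", children=[
    Child(Attribute("lineitemref", references="LineItem")),
])
invoice = ComplexElement("Invoice", children=[Child(item, 0, None)])
business_partner = ComplexElement("BusinessPartner", construct="exclusive", children=[
    Child(ComplexElement("Supplier")),
    Child(ComplexElement("Customer")),
    Child(ComplexElement("Carrier")),
])

print(dtd_content_model(invoice))
print(dtd_content_model(business_partner))
```

Keeping the occurrence bounds on the relationship rather than on the target mirrors the statement that a hierarchical relationship defines the minimum and maximum occurrence of a target concept in a source concept.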
The strategy of the conversion rules is to generate hierarchical structures, and constraints over these structures, to represent the EER constructs. We take care to avoid data redundancy in the XML logical schema and, at the same time, to keep the connections among the concepts of an EER schema. Our conversion strategy is summarized in 6 conversion rules: 3 alternative rules for converting generalization and union types, and 3 rules for converting relationship types. These rules are based on the general strategies for converting EER schemata to conventional database models defined in [Batini et al 1992]. We adapt these rules to consider the conversion to an XML data model.

Generalization and union types are converted by Rule 1, Rule 2 or Rule 3. For each generalization or union type, a specific rule is chosen by the algorithm defined in Section 4.2. The main differences among these rules are the size of the XML schema fragment that each one generates and the constraints that each one can represent. Alternative rules are considered because the goal of our algorithm is to generate compact XML schemata. XML document size has been considered a requirement in design methodologies for generating XML schemata, given that large volumes of data are being produced in the format of XML documents [Mok et al 2006]. However, the rules that generate the smallest XML fragment cannot be applied to some constraint cases of generalization types. Thus, alternative rules must be provided to deal with such cases.

Figure 2. Alternative rules for converting generalization and union types.

In Figure 2 (a), a generalization type with total and disjoint (t,d) constraint is presented. The three alternative ways to represent it are shown in (b), (c) and (d). For the first alternative, namely Rule 1 (b), only one XML element is generated to represent the hierarchy. In this case, the attributes of the subclasses become optional attributes in the element that represents the superclass. Rule 2 generates only elements to represent the subclasses. In this alternative, the attributes of the superclass are replicated in each created element. In these first two rules, the generalization constraints are implicitly represented. In Rule 2, for example, the total constraint is considered because only elements to represent the subclasses are created.
It means that the superclass is always specialized by a subclass. Some restrictions for applying these rules are defined by the algorithm in the next section. For Rule 3, disjointness constraints are represented by the component construct (order or exclusive) of the element which represents the superclass. The elements which represent the subclasses of a generalization type are represented as child elements of the element which represents the superclass. Completeness constraints are represented by the minimum occurrence of the subclass elements in the superclass element.

Union types model a superclass/subclass relationship with more than one superclass. In this case, the subclass represents a collection that is the union of the superclass instances. However, a subclass instance must be a specialization of only one superclass. A union type can be total or partial. It is total if all superclass instances are specialized, and partial otherwise. Union types are also converted by the rules of Figure 2. However, some changes must be made to adapt the rules for this conceptual construct. For example, on applying Rule 1, elements created to represent the superclasses become sub-elements of the subclass element. In Rule 2, only elements for superclasses are created, and in Rule 3 an element to represent the subclass and all the attributes of the superclasses is created.

The last three rules are responsible for converting relationship types. A binary relationship type is shown in Figure 3 (a), and the three ways to represent it in the XML logical model are presented in (b), (c) and (d). Rule 4 generates only one element to represent the two entities and the relationship. As the entity B has required participation in R, its attributes are represented as attributes in the element which represents the entity A. Relationship attributes are also appended to element A. Given the optional participation of A in the relationship, the attributes of R and B are defined as optional attributes in element A.

Figure 3. Alternative rules for converting relationship types.

Rule 5 defines the entity which has required participation in R as a sub-element of the element A. In this case, the relationship attributes are defined as attributes of the element B. The occurrence of B in A is also defined as [0..1], given the optional participation of A in R. The last alternative is defined by Rule 6, which defines an element to represent the relationship as a sub-element of the element A. In this case, a reference relationship is established to connect the independent element B. In the next section, we present and discuss the restrictions for applying each rule through an algorithm defined to convert EER schemata to XML logical schemata.

4.2. Conversion Algorithm

The conversion rules are controlled by the GenerateXMLLogicalSchema algorithm, which establishes the application order for them. It takes an EER conceptual schema as input parameter and generates a suitable XML logical schema as output. The algorithm first converts generalization and union types.
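Looking back at the three generalization rules of Section 4.1, their effect on a small hierarchy can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function names, the dictionary layout, the example entities A, B and C with attributes a1, b1 and c1, and the choice to treat modeled attributes as mandatory are all assumptions. The mapping of partial generalizations to a minimum occurrence of 0 in `rule3` is likewise an inference from the statement that completeness is represented by the minimum occurrence.

```python
from typing import Dict, List


def rule1(superclass: str, super_attrs: List[str], subclasses: Dict[str, List[str]]) -> dict:
    """Rule 1: one element for the whole hierarchy; subclass attributes become optional."""
    attributes = [(a, False) for a in super_attrs]          # (name, optional?)
    for sub_attrs in subclasses.values():
        attributes += [(a, True) for a in sub_attrs]
    return {superclass: {"attributes": attributes}}


def rule2(superclass: str, super_attrs: List[str], subclasses: Dict[str, List[str]]) -> dict:
    """Rule 2: one element per subclass; superclass attributes are replicated in each."""
    return {
        name: {"attributes": [(a, False) for a in super_attrs + sub_attrs]}
        for name, sub_attrs in subclasses.items()
    }


def rule3(superclass: str, super_attrs: List[str], subclasses: Dict[str, List[str]],
          total: bool, disjoint: bool) -> dict:
    """Rule 3: subclass elements become children of the superclass element."""
    min_occurs = 1 if total else 0                           # completeness constraint
    schema = {
        superclass: {
            "attributes": [(a, False) for a in super_attrs],
            "construct": "exclusive" if disjoint else "order",  # disjointness constraint
            "children": [(name, min_occurs, 1) for name in subclasses],
        }
    }
    for name, sub_attrs in subclasses.items():
        schema[name] = {"attributes": [(a, False) for a in sub_attrs]}
    return schema


# Hypothetical total and disjoint generalization: A specialized into B and C.
subs = {"B": ["b1"], "C": ["c1"]}
print(rule1("A", ["a1"], subs))
print(rule2("A", ["a1"], subs))
print(rule3("A", ["a1"], subs, total=True, disjoint=True))
```

The three results make the size difference visible: Rule 1 yields one element, Rule 2 yields one element per subclass, and Rule 3 yields the superclass element plus one child element per subclass.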
Generalization and union types are converted before relationship types because they usually build a hierarchy of elements to represent the taxonomy of the entities of the EER generalizations and categories. For each generalization or union type, a conversion rule is selected by lines 3-8. The execution restrictions for the rules are checked from the rule which generates the smallest schema (Rule 1) to the rule which generates the biggest schema (Rule 3). Rule 1 cannot be applied when there are relationships related to the subclasses, because conceptual relationships related to specific subclasses indicate that the distinctions between subclasses and superclass must be preserved. This means that elements to represent each subclass must be generated. A similar situation occurs for Rule 2 when there are relationships connected to the superclass. In this case, the superclass must be represented by a specific element. Otherwise, if only elements to represent the subclasses were generated, the relationships with the superclass would have to be replicated in all the subclass elements, creating undesirable redundancy. Thus, Rule 2 can be applied only to generalizations that do not have relationships connected to the superclass.

Algorithm: GenerateXMLLogicalSchema
Input: An EER conceptual schema;
Output: An XML logical schema (LS);
BEGIN
 1  GENERAL-UNION ← get the set {T1, T2, ..., Tn} of generalization and union types;
 2  For each Ti in GENERAL-UNION do
 3    If (there is no relationship associated with the subclasses of Ti)
 4      Apply Rule1(Ti);
 5    Else-if (there is no relationship associated with the superclass of Ti)
 6      Apply Rule2(Ti);
 7    Else
 8      Apply Rule3(Ti);
 9  RELATIONS ← get the set {R1, R2, ..., Rn} of relationships;
10  For each Ri in RELATIONS do
11    If (Ri is 1:1 AND an entity of Ri has participation (1,1))
12      Apply Rule4(Ri);
13    Else-if (Ri is binary AND an entity of Ri has participation (1,1))
14      Apply Rule5(Ri);
15    Else
16      Apply Rule6(Ri);
17  Generate a root element for LS;
END;

The last alternative (Rule 3) generates the biggest XML fragment. However, this alternative is flexible, especially when the superclass or the subclasses have relationships with other entities. There is another alternative for converting generalization types involved in multiple-inheritance cases. In these cases, some reference relationships are needed to represent the conceptual relationships among the entities. For lack of space, we do not present this issue. A discussion about all constraint cases involving generalization and union types can be found in [Schroeder et al 2008].

The conversion of relationship types defines the final XML schema structure, connecting the elements which represent entities associated by relationship types.
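The rule selection performed by lines 1-16 of the algorithm can be sketched in Python as below. This is a hedged illustration rather than the prototype's code: the data classes and function names are hypothetical, and two modeling choices are assumptions. First, for union types the roles of superclasses and subclasses are swapped in the checks, following the Section 5 remark that Rule 1 is applied to LineItem because its superclasses have no relationships. Second, participations not stated in the paper (marked in the comments) are filled with placeholder values. The example data encodes the Invoice domain as Section 5 describes it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A participation is (entity name, min, max); max None means "many".
Participation = Tuple[str, int, Optional[int]]


@dataclass(frozen=True)
class Hierarchy:
    name: str
    superclasses: Tuple[str, ...]
    subclasses: Tuple[str, ...]
    is_union: bool = False


@dataclass(frozen=True)
class Relationship:
    name: str
    participants: Tuple[Participation, ...]


def select_hierarchy_rule(h: Hierarchy, relationships: List[Relationship]) -> int:
    involved = {entity for r in relationships for entity, _, _ in r.participants}
    # Assumption: for union types the two checks swap superclass and subclass roles.
    first, second = ((h.superclasses, h.subclasses) if h.is_union
                     else (h.subclasses, h.superclasses))
    if not involved.intersection(first):
        return 1                     # lines 3-4
    if not involved.intersection(second):
        return 2                     # lines 5-6
    return 3                         # lines 7-8


def select_relationship_rule(r: Relationship) -> int:
    binary = len(r.participants) == 2
    one_to_one = binary and all(mx == 1 for _, _, mx in r.participants)
    has_1_1 = any((mn, mx) == (1, 1) for _, mn, mx in r.participants)
    if one_to_one and has_1_1:
        return 4                     # lines 11-12
    if binary and has_1_1:
        return 5                     # lines 13-14
    return 6                         # lines 15-16


def select_rules(hierarchies: List[Hierarchy],
                 relationships: List[Relationship]) -> Dict[str, int]:
    # Hierarchies are handled first, then relationships, as in the algorithm.
    chosen = {h.name: select_hierarchy_rule(h, relationships) for h in hierarchies}
    chosen.update({r.name: select_relationship_rule(r) for r in relationships})
    return chosen


hierarchies = [
    Hierarchy("BusinessPartner", ("BusinessPartner",), ("Supplier", "Customer", "Carrier")),
    Hierarchy("LineItem", ("Product", "Service"), ("LineItem",), is_union=True),
]
relationships = [
    # Invoice (1,1) is stated in Section 5; Supplier's bounds are assumed.
    Relationship("emission", (("Supplier", 0, None), ("Invoice", 1, 1))),
    # N:M is stated; the minimum bounds are assumed.
    Relationship("Item", (("Invoice", 0, None), ("LineItem", 0, None))),
    # Item (0,1) is stated; Carrier's bounds are assumed.
    Relationship("delivery", (("Carrier", 0, None), ("Item", 0, 1))),
    # N:M is stated; the participants' minimum bounds are assumed.
    Relationship("acceptance", (("BusinessPartner", 0, None), ("Invoice", 0, None))),
]

print(select_rules(hierarchies, relationships))
```

Checking the rules in ascending order of generated fragment size keeps the sketch aligned with the stated goal of producing compact schemata, with reference-based Rule 6 as the last resort.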
When there is an entity which has (1,1) participation in a binary relationship type with maximum cardinality 1:1, Rule 4 is applied. For the other cases, Rules 5 and 6 are selected. Rule 5 is only applied to binary relationships where one of the entities has (1,1) participation. Rule 6 is applied where it is not possible to generate hierarchical relationships. This occurs when the entities have optional participation in the relationship, or when the cardinality of the relationship does not allow this kind of conversion, as with N:M relationships and n-ary relationships with n > 2.

Reference relationships are sometimes needed to represent the conceptual constructs. However, they produce disconnected and flat XML schemata, which often lead to poor query performance on conforming XML documents. For this reason, conversion rules that generate reference relationships are the last alternatives to be selected by our algorithm. The overhead generated by reference relationships on query performance is presented and treated in [Schroeder et al 2008a]. The last step of the algorithm generates an element to represent the root element of the XML logical schema.

5. Case Study

To illustrate the proposed conversion process, this section presents a case study. A conceptual schema for an Invoice domain is represented by the EER schema in Figure 4. The XML logical schema in Figure 1 is a representative XML logical schema generated by our conversion process from the EER schema of Figure 4.

Figure 4. An EER schema for the Invoice domain.

First of all, the rules for the generalization types and categories are applied. The total and disjoint generalization involving the superclass BusinessPartner is converted to a hierarchy where the elements that represent the subclasses (Supplier, Customer and Carrier) become child elements of the BusinessPartner element. The component construct of the BusinessPartner element is defined as exclusive to represent the disjoint constraint. The occurrence of the subclass elements in BusinessPartner is assumed to be the default ([1..1]) to represent the total constraint. In this case, Rule 3 is applied because there are relationships involving the superclass (acceptance) and the subclasses (emission and delivery).

The LineItem category/union type is converted in the next step. As shown in Figure 1, the elements Product and Service (superclasses) are represented by attributes in the element LineItem (subclass). Rule 1 is applied because the superclasses do not have any relationship related to them. In this case, the superclass attributes are defined as optional because, according to the union type constraints, only one superclass can be represented by an instance of LineItem.

The conversion of the 1:N emission relationship generates a hierarchical relationship where Invoice becomes a sub-element of the element Supplier.
This strategy, applied by Rule 5, is possible because the participation of Invoice in the relationship is (1,1). Next, the conversion process deals with the associative entity Item, and a new element Item is created as a child element of Invoice to represent this relationship. The attributes of the relationship Item are defined as attributes in the element Item, and a reference attribute is created to establish a reference relationship with the element LineItem. Observe that this conversion strategy was applied to avoid data redundancy, because Item is an N:M relationship type. After this conversion, the relationship delivery is processed by the same conversion approach, i.e., Rule 6. In this case, the optional participation (0,1) of Item in the relationship delivery does not allow Item to be defined as a child element of Carrier.

Note that the attributes emissioncontact and payment are represented in Invoice as a simple element and as a complex element, respectively. emissioncontact is represented as a simple element because it is a multivalued attribute, and payment is represented as a complex element because it is composed of other component attributes. The conversion rules for entities and attributes are not presented here because they are simple rules that generate equivalent constructs in the XML logical schema from the conceptual schema.

The relationship acceptance is also converted by Rule 6, given its maximum cardinality N:M. Notice that Rule 4 is not applied in this case study because only relationships with maximum cardinality 1:1 can be converted by this rule. At the end of the conversion process, an element is created to be the root element of the XML logical schema, and the elements LineItem and BusinessPartner become sub-elements of the root.

With the conversion approach proposed in Section 4, some user intervention can be required to make a few decisions on the resulting XML logical schema. For instance, the relationship Item could be represented as a child element either of the LineItem element or of the Invoice element, as done in the case study. We developed a prototype tool [Lima et al 2008] which implements the conversion approach.
The tool considers these alternative conversion rules, which are automatically selected for generating an XML logical schema. These alternative rules constitute different ways to convert a specific EER construct. In this paper, we presented a conversion process which provides a set of rules that considers all the EER constructs and represents all the conceptual information modeled in an EER schema, as shown in this case study. It generates a compact schema by selecting the conversion rule which produces the smallest possible XML fragment to represent a conceptual construct.

6. Conclusion

This paper presents a process for converting an EER conceptual schema to an XML logical schema. Such a process contributes to a methodology for the logical design of XML documents, which is relevant in a scenario where a domain application is ruled by an ontology or conceptual schema and the associated data source is composed of XML data. We believe that such a scenario will be very common for Web data sources in the near future. The conversion to an XML logical model is suitable for XML document engineering because it provides an abstraction of XML implementation models, i.e., XML schema language recommendations.

Compared to related work, our approach provides conversion rules that support all concepts of the EER model in order to generate a representative, and still abstract, logical schema in a model for XML data. The abstraction of the XML logical schema is guaranteed by the conversion strategy, which generates hierarchical constructs to represent the conceptual constructs of the EER model. Although the more recent W3C XML Schema recommendation provides constructs such as extension and abstract elements, our logical model does not consider them, in order to guarantee compatibility with DTDs and other implementation models. This compatibility is important because these models offer different levels of support for representing domain constraints. This means that an implementation model can be chosen for an application domain according to the constraints that a specific XML schema language can represent.

Future work includes the consideration of the implementation modeling level, in order to support a complete modeling methodology for XML document engineering, as well as support in the prototype tool for other conceptual formalisms as input, such as UML class diagrams or OWL ontologies.

7. References

Batini, C., Ceri, S., Navathe, S.: Conceptual Database Design: An Entity-Relationship Approach. The Benjamin/Cummings Publishing Company (1992)
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E.: Extensible Markup Language (XML) 1.0. W3C Recommendation, World Wide Web Consortium. http://www.w3.org/tr/rec-xml (2000)
Chang, E., Dillon, T., Feng, L.: A Semantic Network-Based Design Methodology for XML Documents. In: ACM Transactions on Information Systems. pp. 390-421. ACM Press (2002)
Choi, M., Lim, J., Joo, K.: Developing a Unified Design Methodology Based on Extended Entity-Relationship Model for XML. In: International Conference on Computational Science. pp. 920-929. Springer, Heidelberg (2003)
Dobbie, G., Xiaoying, W., Ling, T., Lee, M.: ORA-SS: An Object-Relationship-Attribute Model for Semi-Structured Data. Technical Report TR21/00. National University of Singapore (2001)
Duan, Y., Fu, X., Cheung, S. C., Gu, Y.: An Entity-Relationship Model based Conceptual Framework for Model Driven Development. In: IASTED International Conference on Software Engineering (ICSE 2006). pp. 200-205 (2006)
Elmasri, R., Weeldreyer, J. A., Hevner, A. R.: The Category Concept: An Extension to the Entity-Relationship Model. Data & Knowledge Engineering. 1. pp. 75-116 (1985)
Elmasri, R., Wu, Y., Hojabri, B., Li, C., Fu, J.: Conceptual Modeling for Customized XML Schemas. In: International Conference on Conceptual Modeling. pp. 429-443. Springer-Verlag (2002)
Embley, D., Liddle, S., Kamba, S.: Enterprise Modeling with Conceptual XML. In: International Conference on Conceptual Modeling. pp. 150-165. Springer-Verlag (2004)
Fong, J., Fong, A., Wong, H. K., Yu, P.: Translating Relational Schema with Constraints into XML Schema. International Journal of Software Engineering and Knowledge Engineering. 16, pp. 201-244 (2006)
Graves, M., Goldfarb, C.: Designing XML Databases. Prentice Hall PTR, New Jersey, U.S.A. (2001)
Lee, M., Lee, S., Ling, T., Dobbie, G., Kalinichenko, L.: Designing Semistructured Databases: A Conceptual Approach. In: International Conference on Database and Expert Systems Applications. pp. 12-21. Springer-Verlag (2001)
Ley, M.: DBLP Bibliography. http://www.informatik.uni-trier.de/~ley/db/ (2008)
Lima, C., Schroeder, R., Mello, R. S.: Uma Ferramenta para Conversão de Esquemas Conceituais EER para Esquemas Lógicos XML. In: Escola Regional de Banco de Dados. Sociedade Brasileira de Computação (2008)
Mani, M.: EReX: A Conceptual Model for XML. In: International XML Database Symposium. pp. 128-142. Springer-Verlag (2004)
Merialdo, P., Silva, A.: ACM Sigmod Online. http://www.sigmod.org/record/xml (2002)
Mok, W. Y., Embley, D. W.: Generating Compact Redundancy-free XML Documents from Conceptual-model Hypergraphs. In: IEEE Transactions on Knowledge and Data Engineering. 18. pp. 1082-1096 (2006)
Myroshnichenko, I.: Mapping ER Schemas to OWL Ontologies in the SFSU ER Design Tools. Master Thesis. CS Department - San Francisco State University (2007)
Pigozzo, P., Quintarelli, E.: An algorithm for generating XML Schemas from ER Schemas. In: 13th Italian Symposium on Advanced Database Systems. pp. 192-199 (2005)
Schroeder, R., Mello, R. S.: Conversion of Generalization Hierarchies and Union Types from Extended Entity-Relationship Model to an XML Logical Model. In: ACM Symposium on Applied Computing. pp. 1036-1037 (2008)
Schroeder, R., Mello, R. S.: Improving Query Performance on XML Documents: A Workload-driven Design Approach (to appear). In: ACM Symposium on Document Engineering (2008a)
Smith, J. M., Smith, D. C. P.: Database Abstractions: Aggregation and Generalization. In: ACM Transactions on Database Systems. 2(2), pp. 105-133 (1977)
Stephens, L. M., Gangam, A. K., Huhns, M. N.: Constructing Consensus Ontologies for the Semantic Web: A Conceptual Approach. In: World Wide Web Journal, 7(4), Kluwer Academic Publishers (2004)
Thompson, H., Beech, D., Maloney, M., Mendelsohn, N.: XML Schema Part 1: Structures. W3C Recommendation, World Wide Web Consortium. http://www.w3.org/tr/2004/rec-xmlschema-1-20041028 (2004)
Xu, Z., Cao, X., Dong, Y., Su, W.: Formal Approach and Automated Tool for Translating ER Schemata into OWL Ontologies. In: VIII Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2004). LNCS v. 3056, pp. 464-475 (2004)