XViews: XML views of relational schemas


SDSC TR

Chaitanya Baru
San Diego Supercomputer Center, University of California San Diego
La Jolla, CA 92093, USA
October 7, 1999

San Diego Supercomputer Center Technical Report

Abstract

In this paper, we introduce the concept of XML document views over relational schemas. We refer to such views as XViews. In many application domains, XML is emerging as a Web standard for exchanging information. Defining XML views over relational schemas is of interest because, in many instances, relational database systems are being used to manage a wide range of content that is served on the Web, and it is therefore worthwhile to study techniques for exporting this data in XML form. In practice, relational schemas can be quite large and difficult to navigate. The paper therefore describes several approaches for enumerating candidate XViews, which can then form the basis for further manual refinement to derive the final XView. Before discussing the data export issue, however, we present our work on the "inverse" problem, viz. the mapping of SGML/XML DTDs to relational schemas. This work was done as part of the Distributed Object Computation Testbed (DOCT) project at SDSC [7]. The XViews work is one of the activities being undertaken as part of the Mediation of Information using XML (MIX) project, which is a joint effort between the San Diego Supercomputer Center (SDSC) and the Database Lab at the University of California San Diego. The MIX project is investigating the use of XML as the medium for information modeling and information interchange among heterogeneous information sources.

1 Introduction

The Extensible Markup Language (XML) [22] facilitates a move towards viewing the Web as a large, semistructured database consisting of many autonomous sites, which are modeled using XML and related standards for structure and ontology definitions. These include existing and proposed standards for schema specification (XML-Data), metadata and content specification (RDF, DCD, namespaces), access APIs (DOM), as well as source query capability specifications and other standards that may describe the transactional capabilities of sites. General information on many of these standards is available at the W3C website [20].

In the MIX Project at SDSC and the UCSD Database Lab [16], we are focusing on wrapper-mediator systems, which use XML not only for information exchange but also for information modeling in an environment consisting of heterogeneous information sources [2, 1]. In this model, the wrapper associated with each source exports an XML view of the information at that source. The mediator is then responsible for selecting, restructuring, and merging information from autonomous sources and for providing an integrated XML view of the information. As part of MIX, we are developing wrappers for a variety of information sources, including relational databases, GIS systems [10], and Web sites [12]. The mediator processes queries in an XML-based query language called the XML Matching and Structuring (XMAS) language [3]. We are also developing a DTD-guided interactive query interface called the Blended Browsing and Querying (BBQ) interface [17].

In this paper, we discuss our approach for wrapping relational databases. We are interested in defining XML document-like views, or XViews, over such sources. Since the MIX mediator employs an XML-based data model, XML document views are the most "natural" means by which to model and exchange information content in this system.
However, even outside of MIX, we expect that many RDBMS-based information sources will be interested in a general facility for exporting XViews. This paper reports preliminary results in this area. Section 2 describes results from the DOCT project, where we investigated the problem of finding mappings from SGML/XML DTDs to relational schemas. Section 3 presents approaches for identifying candidate XViews for a given relational schema and also provides techniques for generating such XViews. Section 4 presents related work, and Section 5 mentions some ideas for future work.

2 Mapping XML DTDs to relational schemas

Document Type Definitions, or DTDs, express the hierarchical structure of document elements, where the hierarchy indicates a "container" relationship among elements. For example, a section in a document contains one or more subsections which, in turn, contain sub-subsections, and so on. The following simple steps map this hierarchical, container-oriented structure into a relational schema:

1. Assign unique ids to every container object and every contained object (e.g., every section and every subsection).
2. Normalize the one-to-many relationship between the container object and the contained objects using two tables, one for the container and the other for the contained objects, with a primary-foreign key relationship on the "container id." Since a given element may be included in more than one parent element, a decision must be made whether a separate table needs to be defined for each inclusion in a parent element. This decision must typically be guided by user choice, since it depends on whether the definition of the contained object is "local" and within the context of the parent, or is in a larger, "global" context.
3. Apply Steps 1 and 2 recursively down the document hierarchy.

2.1 A DTD Case Study

As part of the DOCT project at SDSC, we have been working on the conversion of all U.S. patent documents from a proprietary mark-up form (referred to as Messenger text or Greenbook) into XML [7]. The XML DTD is derived from an international DTD standard specified by the World Intellectual Property Organization (WIPO). The document type is called PATDOC. In the following, we describe some of the transformations that were performed to map the PATDOC DTD to a relational schema.

Handling attributes

The top level of the PATDOC DTD is defined as follows (in pseudo-code). Note that the symbols *, ?, and + have the usual meaning as in DTDs:

    ELEMENT PATDOC (Bibliographic_Identification, Abstract, Detailed_Description,
                    Claims_Section, Drawing_Description?, OCR_Information?)
    ATTLIST PATDOC (Country, Patent_ID, Date_of_Publication, File_ID,
                    Kind_of_patent, Status, DTD_Version_Number)

In the above, PATDOC is the container object. A table is created for this object, and the Patent ID attribute provides the primary key for the table. Attributes specified in the ATTLIST of PATDOC are defined as columns in the table. Each of the contained objects, e.g., Bibliographic Identification, Abstract, and Claims Section, is stored in an independent table, with Patent ID as the foreign key referring back to the original PATDOC container object.

Extent of normalization

In determining a mapping of this particular DTD to a relational schema, we were guided by application requirements as well as by available technology. For example, the US Patent and Trademark Office (USPTO) provided us with 17 different application scenarios, where each scenario involved a set of one or more SQL queries and, in some cases, application programs with embedded SQL. Further, our relational implementation is based on database technology then in production use at SDSC. The database systems used were Oracle 7.3 and DB2 Version 2. As part of our ongoing work, we are experimenting with managing this data with more recently released versions of DBMS software, such as Oracle 8i [18] and Excelon [8]. The application scenarios were used to provide guidance both on the level of normalization during database design and on the level of indexing when implementing the database.

Normalizing the DTD into a set of relational tables allows efficient access to the various components (elements) of the original document. For our application, it was necessary to normalize the DTD only to the paragraph level, i.e., to the point where each paragraph in the input document is represented in a row of a table. Beyond that level, it was more efficient to view the data as text (with embedded tags) and to employ text indexing techniques for searching at the paragraph level.

As one traverses down its nested structure, the information expressed by the DTD evolves from specifying container relationships among document components to details of the textual content within a component. In general, of course, the level at which this transition occurs can be different for different components. For example, in the patent documents the Abstract component, which contains (tagged) text but does not contain a nested structure, is defined as follows:

    ELEMENT Abstract (Text)
    ELEMENT Text (Paragraph)+
    ELEMENT Paragraph (#PCDATA)
    ATTLIST Paragraph (ID)

The Claims section, on the other hand, contains a heading paragraph and an ordered list of one or more claims. Each claim contains one or more paragraphs of tagged text.

    ELEMENT Claims_Section (Head, Claims)
    ELEMENT Head (Text)
    ELEMENT Text (Paragraph)
    ELEMENT Claims (Claim)+
    ELEMENT Claim (Paragraph)+
    ATTLIST Claim (ID)
    ELEMENT Paragraph (#PCDATA)
    ATTLIST Paragraph (ID)

The Abstract and Claims information are stored in the following three tables (Patent ID is the primary key of the patent):

    TBL_Abstract (Patent_ID, Para_ID, Paragraph)
    TBL_Claims_Section (Patent_ID, Head)
    TBL_Claims (Patent_ID, Claim_ID, Para_ID, Paragraph)

The Abstract table has one row for each paragraph in the abstract of a given patent. The Claims Section table has only one row per patent, since there is only one occurrence of the Claims section (and associated heading information) in each patent. The Claims table, however, has one row for each paragraph of each claim of a given patent. The Claim ID attribute is unique for each claim in a patent. Para ID is unique within the abstract section and also unique within each claim in a patent. The Head and Paragraph attributes are implemented as text fields (e.g., character large objects).

2.2 Storing and indexing text

The PATDOC DTD actually specifies several possible sub-elements below the Paragraph element, even though only one (#PCDATA) is shown in this example. These options arise because different types of objects are classified as paragraphs, including formulas, tables, and images. Depending upon the type of paragraph, the level of nesting in this particular application can be up to five levels deep. Since the paragraph data is text, it can be implemented as text columns (say, in the form of character large objects) in database tables. Text indexes can be employed to search efficiently for information within these fields. Ideally, this type of text indexing would recognize the DTD structure and support DTD-guided search.

At the paragraph level in the DTD, element attributes are used for enforcing referential integrity among document elements. For example, a mathematical formula may be assigned a unique ID via an associated attribute. References to this formula must then use the IDREF mechanism in XML. Modeling information at this level in the database, and requiring the system to perform referential integrity checking at this level, requires storing and processing a large amount of hypertext information in relational tables and can cause inefficiencies in reconstructing documents and/or document components.
Thus, we choose not to normalize database tables beyond the paragraph level.
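The three-step mapping and the Abstract/Claims tables above can be illustrated with a small shredding routine. The sketch below is a minimal, hypothetical Python example, not the DOCT loader: it uses the standard-library `xml.etree.ElementTree` and `sqlite3` modules in place of Oracle or DB2, assumes underscore-separated element and column names (`Claims_Section`, `Patent_ID`, and so on) because XML names cannot contain spaces, and loads a made-up patent fragment. Following the paragraph-level cutoff, each paragraph's content, including any embedded tags, is stored as one opaque text value.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical PATDOC fragment (names use underscores; content is made up).
SAMPLE = """<PATDOC Patent_ID="US0000001">
  <Abstract><Text>
    <Paragraph ID="1">First abstract paragraph.</Paragraph>
    <Paragraph ID="2">Second abstract paragraph.</Paragraph>
  </Text></Abstract>
  <Claims_Section>
    <Head><Text><Paragraph ID="1">What is claimed is:</Paragraph></Text></Head>
    <Claims>
      <Claim ID="1"><Paragraph ID="1">A widget comprising a part.</Paragraph></Claim>
      <Claim ID="2">
        <Paragraph ID="1">The widget of claim 1,</Paragraph>
        <Paragraph ID="2">wherein the part is <B>red</B>.</Paragraph>
      </Claim>
    </Claims>
  </Claims_Section>
</PATDOC>"""


def create_tables(conn):
    """Create the three tables; Patent_ID links each row back to its PATDOC."""
    conn.executescript("""
        CREATE TABLE TBL_Abstract (
            Patent_ID TEXT, Para_ID INTEGER, Paragraph TEXT,
            PRIMARY KEY (Patent_ID, Para_ID));
        CREATE TABLE TBL_Claims_Section (
            Patent_ID TEXT PRIMARY KEY, Head TEXT);
        CREATE TABLE TBL_Claims (
            Patent_ID TEXT, Claim_ID INTEGER, Para_ID INTEGER, Paragraph TEXT,
            PRIMARY KEY (Patent_ID, Claim_ID, Para_ID));
    """)


def inner_xml(elem):
    """Serialize an element's content, keeping embedded tags as text."""
    # ET.tostring() also emits each child's tail text, so nothing is lost.
    return escape(elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem)


def shred(conn, xml_text):
    """Normalize one patent down to the paragraph level, and no further."""
    root = ET.fromstring(xml_text)
    pid = root.get("Patent_ID")
    # One row per abstract paragraph.
    for para in root.findall("./Abstract/Text/Paragraph"):
        conn.execute("INSERT INTO TBL_Abstract VALUES (?, ?, ?)",
                     (pid, int(para.get("ID")), inner_xml(para)))
    # One row per patent for the Claims section heading.
    head = " ".join(inner_xml(p) for p in
                    root.findall("./Claims_Section/Head/Text/Paragraph"))
    conn.execute("INSERT INTO TBL_Claims_Section VALUES (?, ?)", (pid, head))
    # One row per paragraph of each claim.
    for claim in root.findall("./Claims_Section/Claims/Claim"):
        for para in claim.findall("Paragraph"):
            conn.execute("INSERT INTO TBL_Claims VALUES (?, ?, ?, ?)",
                         (pid, int(claim.get("ID")), int(para.get("ID")),
                          inner_xml(para)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    shred(conn, SAMPLE)
    for row in conn.execute("SELECT * FROM TBL_Claims ORDER BY Claim_ID, Para_ID"):
        print(row)
```

The `<B>red</B>` markup inside the last claim paragraph ends up verbatim in the `Paragraph` column; searching it is left to a text index rather than to further tables, which is the trade-off described in Section 2.2.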

3 Techniques for deriving XViews

Converse to the problem of mapping XML DTDs to relational schemas is the problem of providing XML views of relational data. As mentioned before, an XML DTD provides an inherent container model wherein elements contain other sub-elements. Given a relational database schema, we are interested in finding candidate XViews, where an XView is informally defined as an aggregation of some or all of the relations in the schema into a container-oriented hierarchical structure.

Some important cases where the container relationship among relations is easy to derive include the star schema and the snowflake schema, both commonly encountered in data warehousing environments. The simple star schema shown in Figure 1(a) can be expressed using the containment relationship: Lineitem contains Store, Product, and Time Period, as depicted in Figure 1(b). In effect, this containment relationship reflects the foreign key to primary key relationship that the Lineitem entity has with each of the Store, Product, and Time Period entities. The XView in Figure 1(b) can be enumerated by performing a join in the relational database between the Lineitem table and each of the other three tables.

Figure 1: (a) Example star schema. (b) Containment relationship.

Candidate XViews can be enumerated by exploiting the existing, known structure of relational database schemas. Since it is possible to define multiple XViews for a given schema, we believe that identifying and ranking "useful" XViews will require user assistance.

At the physical database level, it is possible to define a simplistic "universal" XView, which is the 4-level hierarchy shown in Figure 2. All relational databases can be trivially mapped to this view. In the "universal" view, the root node is the database. Each base table (and "base" view) in the database is a node at the next level; each row in the table/view is a node at the following level; and, finally, each column in each row is a node below that. Such a view has also been proposed in [21]. The corresponding containment relationships can be expressed as:

    ELEMENT Database (Table*)
    ELEMENT Table (Row*)
    ELEMENT Row (Column*)

Figure 2: "Universal" view of any relational schema.

This is a canonical "structural" view of any relational database. The star schema example in Figure 1 is an improvement over this canonical view, since it employs the foreign-primary key relationships among tables to derive an XView. In the following, we describe methods that further exploit these FK-PK relationships. Our general approach is, first, to represent a relational schema as a directed graph, where each base table is a node and each FK-PK relationship is a directed edge from the FK node to the PK node (i.e., in the many-to-one direction). Simple graph processing techniques are then employed to help identify candidate XViews. We illustrate these via an example based on the schema of a specific database that is in use at SDSC.

3.1 Case Study: The MCAT database

MCAT is a relational database that supports the Storage Resource Broker (SRB) at SDSC [4]. The SRB uses MCAT to provide attribute-based access to distributed, replicated data sets, which may be stored in heterogeneous storage resources [15]. MCAT currently contains about 50 base tables. A graph representation of MCAT, using the approach mentioned above, reveals that the in- and out-degrees of nodes range from 0 to 5, except for one particular node, corresponding to the User entity/table, which has an in-degree of 20. The large in-degree indicates that the User entity participates in a large number of relationships in this database. A candidate XView can, thus, be derived by viewing the User entity as central to the schema, i.e., as the root of the XView.

Figure 3(a) shows a subgraph of the MCAT graph, depicting relationships among the entities User, Data, and storage Resource. Figure 3(b) shows the same information using an entity-relationship diagram. A data set is associated with one or more replicas, since the SRB supports replication of individual data sets. Information that is common to all replicas, such as name and FileType, is stored in the Data table. Information specific to a replica, such as TimeOfCreation, is stored in the Data Replica table. Each data replica is also directly associated with the actual physical storage resource (PSR) in which it is stored. The SRB implements replication using the notion of logical storage resources (LSRs). An LSR contains a set of one or more PSRs. Information about LSRs is stored in the Logical Resource table, while the Resource table stores information about PSRs. Each LSR, PSR, and data replica is associated with a user who owns that particular resource. Information about the access privileges that users have with respect to LSRs and data replicas is maintained in the Resource Access and Data Access tables, respectively (users normally cannot directly access PSRs).

The figure also shows a relationship between the User table and a User Type table, which is a token or code table that keeps all allowable values for the column TypeOfUser in the User table. Similarly, the Resource table is also associated with a Resource Type table (not shown in the figure), which indicates the type of a given storage resource (e.g., an Oracle database, a UNIX filesystem, or an HPSS archive [11]). Though we have shown only one such token table in this example, the MCAT database actually contains a number of such tables.

3.2 Candidate XViews: Node with maximum in-degree

As mentioned above, the XView with User at the root is a candidate view. Thus, User would be the top-level element in the DTD, and the attributes of the User table could be represented as attributes of the User element in the DTD. The procedure for deriving the entire XView is based on a depth-first traversal of a modified schema graph. While the procedure can be specified recursively, we describe an iterative version below:


1. Set the User node as the current node. Output this node as an element of the DTD.
2. Invert all "original" edges (i.e., edges that have not already been inverted) that are incident to the current node, so that they are now outgoing edges.
3. For each outgoing edge, output the node at the other end of the edge, n, as a sub-element of the current node. Traverse the subtree rooted at node n by setting node n to be the current node, and repeating from Step 2 above.

Figure 3: (a) Subgraph of the MCAT schema. (b) Subgraph represented as an E-R diagram.

The resulting "unfolded" graph is shown in Figure 4. Nodes reached via multiple paths are repeated to emphasize that they potentially represent distinct node instances. For example, there are five paths from User to Resource, and in each case the resource identified at the leaf level may be different. The DTD corresponding to this XView is as follows (in the DTD examples in this paper, we do not further expand elements that do not contain sub-elements, e.g., Resource):

    ELEMENT Database (User*)
    ELEMENT User (Data_Access*, Resource_Access*, Data_Replica*, Logical_Resource*, User_Type, Resource*)
    ELEMENT Data_Access (Data_Replica)
    ELEMENT Resource_Access (Logical_Resource)
    ELEMENT Data_Replica (Data, Resource)
    ELEMENT Logical_Resource (Resource)
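The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MCAT or MIX code: the edge list is our hypothetical encoding of the FK-to-PK edges of the Figure 3(a) subgraph (with underscores in table names), the root is chosen by maximum in-degree as in Section 3.1, and we read step 2 as inverting only the edges incident to the root. Below the root, the sketch follows the original FK-to-PK edges and never returns to the root; applying the inversion at inner nodes as well would, for example, turn the edge from Data_Access into Data_Replica around and loop back to Data_Access. Under this reading the sketch reproduces the element nesting of the DTD above.

```python
from collections import defaultdict

# Hypothetical FK -> PK edges for the MCAT subgraph of Figure 3(a).
MCAT_EDGES = [
    ("Data_Access", "User"), ("Data_Access", "Data_Replica"),
    ("Resource_Access", "User"), ("Resource_Access", "Logical_Resource"),
    ("Data_Replica", "User"), ("Data_Replica", "Data"),
    ("Data_Replica", "Resource"),
    ("Logical_Resource", "User"), ("Logical_Resource", "Resource"),
    ("Resource", "User"),
    ("User", "User_Type"),
]


def max_in_degree(edges):
    """Return the node referenced by the most foreign keys (Section 3.1)."""
    deg = defaultdict(int)
    for _, pk in edges:
        deg[pk] += 1
    return max(deg, key=deg.get)


def unfold(edges, root):
    """Unfold the schema graph into a tree of (name, repeated, children).

    `repeated` is True when the child hangs off an inverted (one-to-many)
    edge, i.e. when it may occur many times under its parent.
    """
    out = defaultdict(list)
    for fk, pk in edges:
        out[fk].append(pk)

    def expand(node, repeated, path):
        if node in path:  # cycles need user guidance (Section 3.4)
            raise ValueError(f"cycle through {node!r}")
        kids = [expand(pk, False, path | {node})
                for pk in out[node] if pk != root]
        return (node, repeated, kids)

    start = frozenset({root})
    # Step 2 at the root: edges pointing into the root become outgoing.
    kids = [expand(fk, True, start) for fk, pk in edges if pk == root]
    # Edges that already leave the root (e.g. to a code table) stay as-is.
    kids += [expand(pk, False, start) for pk in out[root]]
    return (root, False, kids)


def to_dtd(tree):
    """Emit one pseudo-DTD line per non-leaf element, in the text's notation."""
    lines, seen = [], set()

    def walk(node):
        name, _, kids = node
        if kids and name not in seen:
            seen.add(name)
            model = ", ".join(k[0] + ("*" if k[1] else "") for k in kids)
            lines.append(f"ELEMENT {name} ({model})")
        for kid in kids:
            walk(kid)

    walk(tree)
    return lines


if __name__ == "__main__":
    root = max_in_degree(MCAT_EDGES)  # "User"
    tree = unfold(MCAT_EDGES, root)
    print("\n".join([f"ELEMENT Database ({root}*)"] + to_dtd(tree)))
```

Repeated leaves stay in the tree, so Resource appears once per path from User; on the full MCAT schema the edge list would have to be read from the database catalog, which this sketch does not attempt.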

The interpretation of the XView in this case is that the database contains a set of documents describing users. Each document describes a particular user's ownership and access information with respect to a set of resources.

Figure 4: XView with additional grouping level.

3.3 Candidate XViews: Nodes with in-degree = 0

An alternate approach for generating candidate views is to begin with the set of nodes in a given schema graph that have in-degree = 0. These are typically nodes that capture many-to-many relationships among two or more base entities (for example, the Lineitem node in Figure 1 and the Data Access and Resource Access nodes in Figure 3). The procedure is as follows:

1. For each node with in-degree = 0 that has not already been visited:
   (a) Output this node as an element in the DTD.
   (b) Starting with this node, perform a breadth-first traversal of all nodes that have not already been visited. Output each node traversed as a sub-element of its parent node, and mark the node as visited.

Thus, we begin with nodes with in-degree = 0 and traverse the graph along all outgoing links until we reach nodes that have no outgoing links (see Section 3.4 for comments on how to deal with cycles in the graph). This approach results in the graph with two connected components shown in Figure 5. The DTD for the corresponding XView can be represented as:

    ELEMENT Database (Subgraph1, Subgraph2)
    ELEMENT Subgraph1 (Data_Access*)
    ELEMENT Subgraph2 (Resource_Access*)
    ELEMENT Data_Access (User, Data_Replica)
    ELEMENT Data_Replica (User, Resource)          /* User is owner of replica */
    ELEMENT Resource_Access (User, Logical_Resource)
    ELEMENT Logical_Resource (User, Resource)      /* User is owner of resource */
    ELEMENT User (User_Type)

Figure 5: An alternative XView (in-degree = 0).

In this case, we interpret the XView as consisting of two disjoint document sets, one related to data access information and the other related to resource access information. Since there is no common root, there is no natural way to relate information across the two types of documents.

3.4 User guidance

The simple graph techniques described above provide only a starting point for the task of enumerating candidate XViews. In general, user guidance will be required in several places to complete this task. For example, if the schema graph has cycles, then each cycle needs to be broken to be able to generate candidate views. A cycle of length n has n candidate XViews, each rooted at a different node in the cycle. User input is required in this case to identify interesting root nodes.

One approach for identifying root nodes is based on the cardinality of relations. For example, token or code tables, which are characterized by low cardinalities, are generally not good candidates for root nodes. Conversely, tables with high cardinalities (e.g., Lineitem in Figure 1) could be viewed as being "more important" and may, thus, be desirable as root nodes. As part of our ongoing research, we are studying approaches to enumerating candidate XViews based on table cardinalities.

Another issue is when to represent information as sub-elements versus attributes of an element. So far, we have been assuming that the attributes of a table are always represented as attributes of the corresponding element in the DTD. However, these attributes could also be represented as sub-elements. Conversely, attributes of a code table could be represented as attributes of the associated base table. For example, the graph in Figure 3(a) shows that the User table is associated with a User Type token table. The attributes of the latter could be represented as attributes of the DTD element User in the DTD examples discussed above.

In some cases it may be useful to introduce groupings of elements in the DTD for better understanding and interpretation of information. For example, given the graph of Figure 4, a user may provide the annotation that the Data Access and Resource Access nodes can be grouped together, indicating the access rights of a user, and that Resource, Data Replica, and Logical Resource can be grouped together, indicating the ownership status of the user. Such an annotation leads to the DTD shown below:

    ELEMENT Database (User*)
    ELEMENT User (AccessibleEntities, OwnedEntities, User_Type)
    ELEMENT AccessibleEntities (Data_Access*, Resource_Access*)
    ELEMENT OwnedEntities (Resource*, Data_Replica*, Logical_Resource*)
    ELEMENT Data_Access (Data_Replica)
    ELEMENT Resource_Access (Logical_Resource)
    ELEMENT Data_Replica (Data, Resource)
    ELEMENT Logical_Resource (Resource*)

The above DTD is a faithful representation of the relational schema in the following sense.
In the relational database, if a user $U_1$ has, say, read access to data set replicas $r_1$ and $r_2$, then the Data Access table would contain the two corresponding rows $(U_1, r_1, \mathit{read})$ and $(U_1, r_2, \mathit{read})$. Each Data Access row represents an instance of the Data Access node in the tree of Figure 4, and each of these nodes has a single child node corresponding to the associated replica.

An alternate interpretation of the information in the relational database is to view the Data Access node as a node that groups together, or "aggregates," all data replicas that have the same access permission with respect to the given user. Thus, in this example, each instance of the Data Access node would represent a particular type of access, say, read access, for the user $U_1$. The children of this node would be the set of nodes corresponding to all replicas to which $U_1$ has read access ($r_1$ and $r_2$, in this example). For this latter case, the definitions of the Data Access and Resource Access elements would be modified as follows:

    ELEMENT Data_Access (Data_Replica*)
    ELEMENT Resource_Access (Logical_Resource*)

The distinction between the above two DTDs is important, since it results in very different queries on the underlying relational database to retrieve requested portions of the XView. For example, using an XML query language such as XMAS [3], one can query the XView to obtain all instances of the Data Access element (without expanding the subtree under each such node). The SQL query generated for the relational database differs depending upon which DTD is in use. In the first case, the SQL query would fetch all the rows of the Data Access table. In the second case, the query would group the rows in the Data Access table by type of access (e.g., read-only, read-write, etc.) and return only the signature information of each group (i.e., the Group-By attributes). Similarly, given an instance of a Data Access node, if one were interested in fetching all the related Data Replica information (i.e., the subtree rooted at that Data Access node), the former DTD would result in a single-row fetch (i.e., the row corresponding to the single Data Access record), whereas the latter DTD would require fetching all the rows that fall under the data access group implied by the Data Access node. Thus, the types of queries that need to be supported clearly influence the choice of an XView. Conversely, different XViews can result in very different processing requirements on the underlying relational database.

4 Related Work

Recent related work includes the work by Deutsch, Fernandez, and Suciu [6] on storing semistructured data in relational databases. They provide techniques for mapping between the semistructured data model and the relational data model, using a query language called STORED. Their focus is on semistructured data that is not associated with an a priori DTD. Thus, they provide tools to impose some structure on the documents so that a mapping to relational tables can be found. They also allow for documents that have a "true" semistructured aspect, where a mapping to relational tables is not feasible or efficient; in this case, that part of the semistructured data is stored in so-called overflow tables. The work we have discussed in Section 2 thus addresses a more structured problem than that addressed by STORED: in our case, the DTD of the XML document is known in advance.

The recent work on integrating XML and SQL reported in [5] also discusses some issues in mapping between the complex hierarchical structures of XML and the tabular structures of SQL. In particular, it argues that the semantics of a hierarchical XML structure ought to be captured using the outer-join operation between the tables corresponding to the root element and its sub-elements.

Other work that is generally related to the issues discussed here is the prior work on wrapping non-relational sources and the work on conversion between the relational data model and other data models. For example, issues in representing extended entity-relationship structures in relational databases have been addressed extensively in [14]. Approaches for providing database interfaces to semistructured information, especially data on Web sites and Web information sources, have been discussed in [13, 9, 19]. While the work described here is similar in approach, our focus is specifically on the conversion between XML DTDs and relational schemas.
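The outer-join argument attributed to [5] above can be seen in a small example. The sketch below is our own illustration, not code from [5]; it assumes hypothetical User and Data_Access tables in an in-memory SQLite database and builds one User element per user, with that user's Data_Access rows as sub-elements. With an inner join, a user who has no Data_Access rows produces no result row and therefore no User element; the left outer join keeps that user, and its NULL columns simply mean the element has no children.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_users(conn, outer):
    """Nest Data_Access rows under User elements using one join query."""
    join = "LEFT OUTER JOIN" if outer else "JOIN"
    sql = (f"SELECT u.User_ID, d.Replica_ID FROM User u {join} Data_Access d "
           "ON d.User_ID = u.User_ID ORDER BY u.User_ID, d.Replica_ID")
    root = ET.Element("Database")
    users = {}
    for user_id, replica in conn.execute(sql):
        if user_id not in users:
            users[user_id] = ET.SubElement(root, "User", User_ID=user_id)
        if replica is not None:  # NULL: the outer join found no sub-elements
            ET.SubElement(users[user_id], "Data_Access", Replica_ID=replica)
    return root


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE User (User_ID TEXT PRIMARY KEY);
        CREATE TABLE Data_Access (User_ID TEXT, Replica_ID TEXT);
        INSERT INTO User VALUES ('U1'), ('U2'), ('U3');
        INSERT INTO Data_Access VALUES ('U1', 'r1'), ('U1', 'r2'), ('U2', 'r1');
    """)
    for outer in (False, True):
        print(ET.tostring(build_users(conn, outer), encoding="unicode"))
```

With the sample rows, the inner join yields User elements only for U1 and U2, while the outer join also yields an empty element for U3.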
5 Future work

In this paper, we have reported our preliminary work in the area of XViews. We are interested in studying a variety of schemas to understand which heuristics are most useful in enumerating candidate XViews. We are also interested in understanding how much of the work can be automated using tools that generate candidate XViews before requiring user intervention. Finally, we are interested in investigating the use of relation cardinalities and query workloads to determine useful XViews and to rank candidate XViews.

Acknowledgements

The DOCT project was funded by DARPA and USPTO. We thank Larry Cogut and Pam Rinehart of the USPTO for extensive discussions regarding the SGML DTD for patents and USPTO application requirements. This work was also partly funded by grants from the National Archives and Records Administration (NARA) and the Department of Energy (DOE) under its ASCI program. Initial discussions on XViews were carried out with Bertram Ludaescher, a member of the MIX project at SDSC.

References

[1] C. Baru, A. Gupta, V. Chu, B. Ludaescher, R. Marciano, Y. Papakonstantinou, and P. Velikhov. XML-Based Information Mediation for Digital Libraries. In Demo Session, ACM Digital Libraries '99, Berkeley, CA.
[2] C. Baru, A. Gupta, B. Ludaescher, R. Marciano, Y. Papakonstantinou, and P. Velikhov. XML-Based Information Mediation with MIX. In Demo Session, ACM SIGMOD '99, Philadelphia, PA.
[3] C. Baru, B. Ludaescher, Y. Papakonstantinou, P. Velikhov, and V. Vianu. Features and Requirements for an XML View Definition Language: Lessons from XML Information Mediation. Position paper, W3C Workshop on Query Languages, Boston, MA.
[4] C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC Storage Resource Broker. In Proc. of CASCON '98, Toronto, Canada.
[5] M. David. SQL-Based XML Structured Data Access. WebTechniques.com.
[6] A. Deutsch, M. Fernandez, and D. Suciu. Storing Semistructured Data with STORED. In Proc. of ACM SIGMOD '99, Philadelphia, PA.
[7] DOCT. The Distributed Object Computation Testbed Project. San Diego Supercomputer Center, La Jolla, CA.
[8] Excelon. Object Design Inc.
[9] M. Fernandez, D. Florescu, A. Levy, and D. Suciu. Web-Site Management: The Strudel Approach. IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[10] A. Gupta, R. Marciano, I. Zaslavsky, and C. Baru. Integrating GIS and Imagery through XML-Based Information Mediation. In Proc. of NSF International Workshop on Integrated Spatial Databases: Digital Images and GIS. To appear in Lecture Notes in Computer Science, Springer-Verlag, Portland, Maine.
[11] HPSS. High Performance Storage System. San Diego Supercomputer Center, La Jolla, CA.
[12] B. Ludaescher and A. Gupta. Modeling Interactive Web Sources for Information Mediation. In Proc. of International Workshop on World-Wide Web and Conceptual Modeling, Lecture Notes in Computer Science, Springer, Paris, France.
[13] S. Malaika. Resistance is Futile: The Web will assimilate your Database. IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[14] V. M. Markowitz and A. Shoshani. Representing extended entity-relationship structures in relational databases: A modular approach. TODS, 17(3):423-464.
[15] MCAT. MCAT - A Meta-Information Catalog. San Diego Supercomputer Center, La Jolla, CA.
[16] MIX. Mediation of Information using XML. San Diego Supercomputer Center, La Jolla, CA.
[17] K. Munroe and Y. Papakonstantinou. BBQ: A Visual Interface for Integrated Browsing and Querying of XML. Paper submitted to the 5th IFIP 2.6 Working Conference on Visual Database Systems, Fukuoka, Japan.
[18] Oracle8i. Oracle and XML.
[19] S. Prasad and A. Rajaram. Virtual Database Technology, XML, and the Evolution of the Web. IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[20] W3C. The World Wide Web Consortium.
[21] W3C. XML Representation of a Relational Database. World Wide Web Consortium.
[22] XML. The Extensible Markup Language, Version 1.0.
