XViews: XML views of relational schemas

SDSC TR. XViews: XML views of relational schemas. Chaitanya Baru, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA. October 7, 1999. San Diego Supercomputer Center Technical Report.

Abstract

In this paper, we introduce the concept of XML document views over relational schemas. We refer to such views as XViews. In many application domains, XML is emerging as a Web standard for exchanging information. Defining XML views over relational schemas is of interest because relational database systems are often used to manage a wide range of content that is served on the Web, and it is therefore worth studying techniques for exporting this data in XML form. In practice, relational schemas can be quite large and difficult to navigate. Thus, the paper describes various approaches for enumerating candidate XViews, which can then form the basis for further manual refinement to derive the final XView. Before discussing the data export issue, however, we present our work on the "inverse" problem, viz. the mapping of SGML/XML DTDs to relational schemas. This work was done as part of the Distributed Object Computation Testbed (DOCT) project at SDSC [7].

The XViews work is one of the activities being undertaken as part of the Mediation of Information using XML (MIX) project, which is a joint effort between the San Diego Supercomputer Center (SDSC) and the Database Lab at the University of California San Diego. The MIX project is investigating the use of XML as the medium for information modeling and information interchange among heterogeneous information sources.

1 Introduction

The Extensible Markup Language (XML) [22] facilitates a move towards viewing the Web as a large, semistructured database consisting of many autonomous sites which are modeled using XML and related standards for structure and ontology definitions. This includes existing and proposed standards for schema specification (XML-Data), metadata and content specification (RDF, DCD, namespaces), access APIs (DOM), as well as source query capability specifications, and other standards that may describe the transactional capabilities of sites. General information on many of these standards is available at the W3C website [20].

In the MIX Project at SDSC and the UCSD Database Lab [16], we are focusing on wrapper-mediator systems, which use XML not only for information exchange but also for information modeling in an environment consisting of heterogeneous information sources [2, 1]. In this model, the wrapper associated with each source exports an XML view of the information at that source. The mediator is then responsible for selecting, restructuring, and merging information from autonomous sources and for providing an integrated XML view of the information. As part of MIX, we are developing wrappers for a variety of information sources, including relational databases, GIS systems [10], and Web sites [12]. The mediator processes queries in an XML-based query language called the XML Matching and Structuring (XMAS) language [3]. We are also developing a DTD-guided interactive query interface called the Blended Browsing and Querying (BBQ) interface [17].

In this paper, we discuss our approach for wrapping relational databases. We are interested in defining XML document-like views, or XViews, over such sources. Since the MIX mediator employs an XML-based data model, XML document views are the most "natural" means by which to model and exchange information content in this system. However, even outside of MIX, we expect that many RDBMS-based information sources will be interested in a general facility that can export XViews.

This paper reports on preliminary results in this area. Section 2 describes results from the DOCT project, where we investigated the problem of finding mappings from SGML/XML DTDs to relational schemas. Section 3 presents approaches for identifying candidate XViews for a given relational schema and also provides techniques for generating such XViews. Section 4 presents related work, and some ideas for future work are mentioned in Section 5.

2 Mapping XML DTDs to relational schemas

Document Type Definitions, or DTDs, express the hierarchical structure of document elements, where the hierarchy indicates a "container" relationship among elements. For example, a section in a document contains one or more subsections which, in turn, contain sub-subsections, and so on. The following simple steps provide a mapping of this hierarchical, container-oriented structure into a relational schema:

1. Assign unique ids to every container object and contained object (e.g., every section and every subsection).
2. Normalize the 1-many relationship between the container object and the contained objects using two tables, one for the container and the other for the contained objects, with a primary-foreign key relationship on the "container id." Since a given element may be included in more than one parent element, a decision must be made whether a separate table needs to be defined for each inclusion in a parent element. This decision must typically be guided by user choice, since it depends on whether the definition of the contained object is "local" and within the context of the parent, or is in a larger, "global" context.
3. Apply Steps 1 and 2 recursively down the document hierarchy.

2.1 A DTD Case Study

As part of the DOCT project at SDSC, we have been working on the conversion of all U.S. patent documents from a proprietary mark-up form (referred to as Messenger text or Greenbook) into XML [7]. The XML DTD is derived from an international DTD standard specified by the World Intellectual Property Organization (WIPO). The document type is called PATDOC. In the following, we describe some of the transformations that were performed to map the PATDOC DTD to a relational schema.

2.1.1 Handling attributes

The top level of the PATDOC DTD is defined as follows (in pseudo-code). Note that the symbols *, ?, and + have the usual meaning as in DTDs:

    ELEMENT PATDOC (Bibliographic Identification, Abstract, Detailed Description, Claims Section, Drawing Description?, OCR Information?)
    ATTLIST PATDOC (Country, Patent ID, Date of Publication, File ID, Kind of patent, Status, DTD Version Number)

In the above, PATDOC is the container object. A table is created for this object and the Patent ID attribute provides the primary key for the table. Attributes specified in the ATTLIST of PATDOC are defined as columns in the table. Each of the contained objects, e.g. Bibliographic Information, Abstract, Claims, is stored in an independent table, with Patent ID as the foreign key referring back to the original PATDOC container object.

2.1.2 Extent of normalization

In determining a mapping of this particular DTD to a relational schema, we were guided by application requirements as well as available technology. For example, the US Patent and Trademark Office (USPTO) provided us with 17 different application scenarios, where each scenario involved a set of one or more SQL queries and, in some cases, application programs with embedded SQL. Further, our relational implementation is based on database technology in production use at SDSC. The database systems used were Oracle 7.3 and DB2 Version 2. As part of our on-going work, we are experimenting with managing this data with more recently released versions of DBMS software such as Oracle 8i [18] and Excelon [8].

The application scenarios were used to provide guidance both on the level of normalization during database design and on the level of indexing when implementing the database. Normalizing the DTD into a set of relational tables allows for efficient access to the various components (elements) of the original document. For our application, it was necessary to normalize the DTD only to the paragraph level, i.e. to the point where each paragraph in the input document is represented in a row of a table. Beyond that level, it was more efficient to view the data as text (with embedded tags) and employ text indexing techniques for searching at the paragraph level.

As one traverses down its nested structure, the information expressed by the DTD evolves from specifying container relationships among document components to details of the textual content within a component. In general, of course, the level at which this transformation occurs can be different for different components. For example, in the patent documents the Abstract component, which contains (tagged) text but does not contain a nested structure, is defined as follows:

    ELEMENT Abstract (Text)
    ELEMENT Text (Paragraph)+
    ELEMENT Paragraph (#PCDATA)
    ATTLIST Paragraph (ID)

The Claims section, on the other hand, contains a heading paragraph and an ordered list of one or more claims. Each claim contains one or more paragraphs of tagged text.

    ELEMENT Claims Section (Head, Claims)
    ELEMENT Head (Text)
    ELEMENT Text (Paragraph)
    ELEMENT Claims (Claim)+
    ELEMENT Claim (Paragraph)+
    ATTLIST Claim (ID)
    ELEMENT Paragraph (#PCDATA)
    ATTLIST Paragraph (ID)

The Abstract and Claims information is stored in the following three tables (Patent ID is the primary key of the patent):

    TBL Abstract (Patent ID, Para ID, Paragraph)
    TBL Claims Section (Patent ID, Head)
    TBL Claims (Patent ID, Claim ID, Para ID, Paragraph)

The Abstract table has one row for each paragraph in the abstract of a given patent. The Claims Section table has only one row per patent, since there is only one occurrence of the Claims section (and associated heading information) in each patent. The Claims table, however, has one row for each paragraph of each claim of a given patent. The Claim ID attribute is unique for each claim in a patent. Para ID is unique within the abstract section and also unique within each claim in a patent. The Head and Paragraph attributes are implemented as text fields (e.g. character large objects).

2.2 Storing and indexing text

The PATDOC DTD actually specifies several possible sub-elements below the Paragraph element, even though only a single one (#PCDATA) is shown in this example. These options arise because different types of objects are classified as paragraphs, including formulas, tables, and images. Depending upon the type of paragraph, the level of nesting in this particular application can be up to five levels deep. Since the paragraph data is text, it can be implemented as text columns (say, in the form of character large objects) in database tables. Text indexes can be employed to efficiently search for information within these fields. Ideally, this type of text indexing would recognize the DTD structure and support DTD-guided search.

At the paragraph level in the DTD, element attributes are used for enforcing referential integrity among document elements. For example, a mathematical formula may be assigned a unique ID via an associated attribute. References to this formula must then use the IDREF mechanism in XML. Modeling information at this level in the database, and requiring the system to perform referential integrity checking at this level, requires storing and processing a large amount of hypertext information in relational tables and can cause inefficiencies in reconstructing documents and/or document components. Thus, we choose not to normalize database tables beyond the paragraph level.
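The mapping of Section 2.1 can be sketched in a few lines. The following is only an illustration under stated assumptions, not the DOCT implementation: it uses Python's `sqlite3` and `xml.etree.ElementTree` in place of Oracle or DB2, writes element and column names with underscores (for example `Claims_Section`, `Patent_ID`) because XML names and SQL identifiers cannot contain spaces, assumes composite primary keys that the text does not state, and loads an invented sample document. Paragraph content is stored as text with its embedded tags intact, in line with the decision not to normalize below the paragraph level.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document: tag names follow the pseudo-DTD with
# underscores instead of spaces; the content is invented for illustration.
SAMPLE = """
<PATDOC Patent_ID="P1">
  <Abstract><Text>
    <Paragraph ID="1">A widget.</Paragraph>
    <Paragraph ID="2">It has <b>two</b> parts.</Paragraph>
  </Text></Abstract>
  <Claims_Section>
    <Head><Text><Paragraph ID="1">What is claimed is:</Paragraph></Text></Head>
    <Claims>
      <Claim ID="1"><Paragraph ID="1">A widget.</Paragraph></Claim>
      <Claim ID="2">
        <Paragraph ID="1">The widget of claim 1.</Paragraph>
        <Paragraph ID="2">With a handle.</Paragraph>
      </Claim>
    </Claims>
  </Claims_Section>
</PATDOC>
"""

# The three tables of Section 2.1; the composite keys are an assumption.
SCHEMA = """
CREATE TABLE Abstract (Patent_ID TEXT, Para_ID INTEGER, Paragraph TEXT,
                       PRIMARY KEY (Patent_ID, Para_ID));
CREATE TABLE Claims_Section (Patent_ID TEXT PRIMARY KEY, Head TEXT);
CREATE TABLE Claims (Patent_ID TEXT, Claim_ID INTEGER, Para_ID INTEGER,
                     Paragraph TEXT,
                     PRIMARY KEY (Patent_ID, Claim_ID, Para_ID));
"""


def inner_xml(elem):
    """Return the element's content as text, keeping any embedded tags."""
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


def shred(conn, xml_text):
    """Load one PATDOC document into the Abstract and Claims tables."""
    doc = ET.fromstring(xml_text)
    pid = doc.get("Patent_ID")
    # One Abstract row per paragraph of the abstract.
    for para in doc.findall("./Abstract/Text/Paragraph"):
        conn.execute("INSERT INTO Abstract VALUES (?, ?, ?)",
                     (pid, int(para.get("ID")), inner_xml(para)))
    # Exactly one Claims_Section row per patent (the heading paragraph).
    head = doc.find("./Claims_Section/Head/Text/Paragraph")
    conn.execute("INSERT INTO Claims_Section VALUES (?, ?)",
                 (pid, inner_xml(head)))
    # One Claims row per paragraph of each claim.
    for claim in doc.findall("./Claims_Section/Claims/Claim"):
        for para in claim.findall("Paragraph"):
            conn.execute("INSERT INTO Claims VALUES (?, ?, ?, ?)",
                         (pid, int(claim.get("ID")), int(para.get("ID")),
                          inner_xml(para)))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
shred(conn, SAMPLE)
```

A text index over the `Paragraph` columns would then take over below the paragraph level; that part is left out of the sketch.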

Figure 1: (a) Example star schema. (b) Containment relationship.

3 Techniques for deriving XViews

Converse to the problem of mapping XML DTDs to relational schemas is the problem of providing XML views of relational data. As mentioned before, an XML DTD provides an inherent container model wherein elements contain other sub-elements. Given a relational database schema, we are interested in finding candidate XViews, where an XView is informally defined as an aggregation of some or all of the relations in the schema in a container-oriented hierarchical structure. Some important cases where the container relationship is easy to derive among relations include the star schema and the snowflake schema, both commonly encountered in data warehousing environments. The simple star schema shown in Figure 1(a) can be expressed using the containment relationship: Lineitem contains Store, Product, and Time Period, as depicted in Figure 1(b). In effect, this containment relationship reflects the foreign key to primary key relationship that the Lineitem entity has with each of the Store, Product, and Time Period entities. The XView in Figure 1(b) can be enumerated by performing a join in the relational database between the Lineitem table and each of the other three tables.

Candidate XViews can be enumerated by exploiting the existing, known structure of relational database schemas. Since it is possible to define multiple XViews for a given schema, we believe that identifying and ranking "useful" XViews will require user assistance. At the physical database level, it is possible to define a simplistic "universal" XView, which is the 4-level hierarchy shown in Figure 2. All relational databases can be trivially mapped to this view. In the "universal" view, the root node is the database. Each base table (and "base" view) in the database is a node at the next level; each row in the table/view is a node at the following level; and, finally, each column in each row is a node below that. Such a view has also been proposed in [21]. The corresponding containment relationships can be expressed as:

    ELEMENT Database (Table*)
    ELEMENT Table (Row*)
    ELEMENT Row (Column*)

Figure 2: "Universal" view of any relational schema.

This is a canonical "structural" view of any relational database. The star schema example in Figure 1 is an improvement over this canonical view, since it employs the foreign-primary key relationships among tables to derive an XView. In the following, we describe methods that further exploit these FK-PK relationships. Our general approach is, first, to represent a relational schema as a directed graph, where each base table is a node and each FK-PK relationship is a directed edge from the FK node to the PK node (i.e. in the many-to-one direction). Simple graph processing techniques are then employed to help identify candidate XViews. We illustrate these via an example based on the schema of a specific database that is in use at SDSC.

3.1 Case Study: The MCAT database

MCAT is a relational database that supports the Storage Resource Broker (SRB) at SDSC [4]. The SRB uses MCAT to provide attribute-based access to distributed, replicated data sets which may be stored in heterogeneous storage resources [15]. MCAT currently contains about 50 base tables. A graph representation of MCAT, using the approach mentioned above, reveals that the in- and out-degrees of nodes range from 0 to 5, except for one particular node, corresponding to the User table, which has an in-degree of 20. The large in-degree indicates that the User entity participates in a large number of relationships in this database. A candidate XView can thus be derived by viewing the User entity as central to the schema, i.e. as the root of the XView.

Figure 3(a) shows a subgraph of the MCAT graph with the relationships among the entities User, Data, and storage Resource. Figure 3(b) shows the same information using an entity-relationship diagram. A data set is associated with one or more replicas, since the SRB supports replication of individual data sets. Information that is common to all replicas, such as name and FileType, is stored in the Data table. Information specific to a replica, such as TimeOfCreation, is stored in the Data Replica table. Each data replica is also directly associated with the actual physical storage resource (PSR) in which it is stored. The SRB implements replication using the notion of logical storage resources (LSRs). An LSR contains a set of one or more PSRs. Information about LSRs is stored in the Logical Resource table, while the Resource table stores information about PSRs. Each LSR, PSR, and data replica is associated with a user who owns that particular resource. Information about the access privileges that users have with respect to LSRs and data replicas is maintained in the Resource Access and Data Access tables, respectively (users normally cannot directly access PSRs).

The figure also shows a relationship between the User table and a User Type table, which is a token or code table that keeps all allowable values for the column TypeOfUser in the User table. Similarly, the Resource table is associated with a Resource Type table (not shown in the figure) which indicates the type of a given storage resource (e.g. an Oracle database, a UNIX filesystem, or an HPSS archive [11]). Though we have shown only one such token table in this example, the MCAT database actually contains a number of such tables.

3.2 Candidate XViews: Node with maximum in-degree

As mentioned above, the XView with User at the root is a candidate view. Thus, User would be the top-level element in the DTD, and attributes of the User table could be represented as attributes of the User element in the DTD. The procedure for deriving the entire XView is based on depth-first traversal of a modified schema graph. While the procedure can be specified recursively, we describe an iterative version below:

1. Set the User node as the current node. Output this node as an element of the DTD.
2. Invert all "original" edges (i.e. edges that have not already been inverted) that are incident to the current node, so that they are now outgoing edges.
3. For each outgoing edge, output the node at the other end of the edge, n, as a sub-element of the current node. Traverse the subtree rooted at node n by setting node n to be the current node and repeating from Step 2 above.

Figure 3: (a) Subgraph of the MCAT schema. (b) Subgraph represented as an E-R diagram.

The resulting "unfolded" graph is shown in Figure 4. Nodes reached via multiple paths are repeated to emphasize that they potentially represent distinct node instances. For example, there are five paths from User to Resource, and in each case the resource identified at the leaf level may be different. The DTD corresponding to this XView is as follows (in the DTD examples in this paper, we do not further expand elements that do not contain sub-elements, e.g. Resource):

    ELEMENT Database (User*)
    ELEMENT User (Data Access*, Resource Access*, Data Replica*, Logical Resource*, User Type, Resource)
    ELEMENT Data Access (Data Replica)
    ELEMENT Resource Access (Logical Resource)
    ELEMENT Data Replica (Data, Resource)
    ELEMENT Logical Resource (Resource)
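The procedure can be sketched as follows. This is a simplified reading rather than a definitive implementation: the edge list is our interpretation of the MCAT subgraph described in Section 3.1 (table names written with underscores), and the sketch inverts only the edges that point at the chosen root, which is enough to obtain the same element nesting as the DTD above. Applying Step 2 at deeper nodes would need an additional rule to avoid walking back up the path just taken, which the text does not spell out. The sketch also marks every inverted edge with `*`, so `Resource` appears as `Resource*` under `User`.

```python
from collections import defaultdict

# FK -> PK edges (many-to-one direction), read off the description of the
# MCAT subgraph; an assumption, not an export of the real catalog.
FK_EDGES = [
    ("Data_Access", "User"), ("Data_Access", "Data_Replica"),
    ("Resource_Access", "User"), ("Resource_Access", "Logical_Resource"),
    ("Data_Replica", "User"), ("Data_Replica", "Data"),
    ("Data_Replica", "Resource"),
    ("Logical_Resource", "User"), ("Logical_Resource", "Resource"),
    ("Resource", "User"), ("User", "User_Type"),
]


def in_degrees(edges):
    """Count incoming FK edges per table; tables with none get 0."""
    deg = defaultdict(int)
    for fk, pk in edges:
        deg[fk] += 0
        deg[pk] += 1
    return dict(deg)


def derive_dtd(edges, root):
    """Unfold the schema graph below `root` into ELEMENT declarations."""
    children = defaultdict(list)
    for fk, pk in edges:
        if pk == root:
            # Inverted edge: one root row is referenced by many FK rows.
            children[root].append(fk + "*")
        else:
            # Original edge: each FK row refers to a single PK row.
            children[fk].append(pk)
    decls, seen, stack = [], set(), [root]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        if node in seen or not children[node]:
            continue  # already declared, or a leaf that is not expanded
        seen.add(node)
        decls.append(f"ELEMENT {node} ({', '.join(children[node])})")
        stack.extend(c.rstrip("*") for c in reversed(children[node]))
    return decls


degrees = in_degrees(FK_EDGES)
root = max(degrees, key=degrees.get)  # node with maximum in-degree
decls = ["ELEMENT Database (" + root + "*)"] + derive_dtd(FK_EDGES, root)
print("\n".join(decls))
```

Each element is declared once even though the unfolded graph repeats nodes reached via several paths; the `seen` set also keeps the loop from running forever if the schema graph contains a cycle.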

Figure 4: XView with additional grouping level.

The interpretation of the XView in this case is that the database contains a set of documents describing users. Each document describes a particular user's ownership and access information with respect to a set of resources.

3.3 Candidate XViews: Nodes with in-degree = 0

An alternate approach for generating candidate views is to begin with the set of nodes in a given schema graph that have in-degree = 0. These are typically nodes that capture many-many relationships among two or more base entities (for example, the Lineitem node in Figure 1 and the Data Access and Resource Access nodes in Figure 3). The procedure is as follows:

1. For each node with in-degree = 0 which has not already been visited:
   (a) Output this node as an element in the DTD.
   (b) Starting with this node, perform a breadth-first traversal of all nodes that have not already been visited. Output each node traversed as a sub-element of its parent node and mark the node as visited (see Section 3.4 for comments on how to deal with cycles in the graph).

Thus, we begin with nodes with in-degree = 0, and we traverse the graph along all outgoing links until we reach nodes that have no outgoing links. This approach results in the graph with two connected components shown in Figure 5. The DTD for the corresponding XView can be represented as:

Figure 5: An alternative XView (in-degree = 0).

    ELEMENT Database (Subgraph1, Subgraph2)
    ELEMENT Subgraph1 (Data Access*)
    ELEMENT Subgraph2 (Resource Access*)
    ELEMENT Data Access (User, Data Replica)
    ELEMENT Data Replica (User, Resource) /* User is owner of replica */
    ELEMENT Resource Access (User, Logical Resource)
    ELEMENT Logical Resource (User, Resource) /* User is owner of resource */
    ELEMENT User (User Type)

In this case, we interpret the XView as consisting of two disjoint document sets, one related to data access information and the other related to resource access information. Since there is no common root, there is no natural way to relate information across the two types of documents.

3.4 User guidance

The simple graph techniques described above provide only a beginning to the task of enumerating candidate XViews. In general, user guidance will be required in several places to complete this task. For example, if the schema graph has cycles, then each cycle needs to be broken to be able to generate candidate views. A cycle of length n has n candidate XViews, each rooted at a different node in the cycle. User input is required in this case to identify interesting root nodes. One approach for identifying root nodes is based on the cardinality of relations. For example, token or code tables, which are characterized by low cardinalities, are generally not good candidates for root nodes. Conversely, tables with high cardinalities (e.g. Lineitem in Figure 1) could be viewed as being "more important" and may, thus, be desirable as root nodes. As part of our on-going research, we are studying approaches to enumerating candidate XViews based on table cardinalities.

Another issue is when to represent information as sub-elements versus attributes of an element. So far, we have been assuming that the attributes of a table are always represented as attributes of the corresponding element in the DTD. However, these attributes could also be represented as sub-elements. Conversely, attributes of a code table could be represented as attributes of the associated base table. For example, the graph in Figure 3 shows that the User table is associated with a User Type token table. The attributes of the latter could be represented as attributes of the DTD element User in the DTD examples discussed above.

In some cases it may be useful to introduce groupings of elements in the DTD for better understanding and interpretation of information. For example, given the graph of Figure 4, a user may provide the annotation that the Data Access and Resource Access nodes can be grouped together, indicating the access rights of a user, and that Resource, Data Replica, and Logical Resource can be grouped together, indicating the ownership status of the user. Such an annotation leads to the DTD shown below:

    ELEMENT Database (User*)
    ELEMENT User (AccessibleEntities, OwnedEntities, User Type)
    ELEMENT AccessibleEntities (Data Access*, Resource Access*)
    ELEMENT OwnedEntities (Resource*, Data Replica*, Logical Resource*)
    ELEMENT Data Access (Data Replica)
    ELEMENT Resource Access (Logical Resource)
    ELEMENT Data Replica (Data, Resource)
    ELEMENT Logical Resource (Resource*)

The above DTD is a faithful representation of the relational schema in the following sense. In the relational database, if a user U1 has, say, read access to data set replicas r1 and r2, then the Data Access table would contain the two corresponding rows: (U1, r1, read) and (U1, r2, read).
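As a small illustration of this row-per-element reading, the sketch below renders the two example rows as XML, producing one Data Access element per row with a single Data Replica child. It uses Python's `xml.etree.ElementTree`; the underscore element names and the `id` and `type` attribute names are hypothetical choices made for the example and do not come from the MCAT schema.

```python
import xml.etree.ElementTree as ET

# Rows of the Data Access table from the example: (user, replica, access).
DATA_ACCESS_ROWS = [("U1", "r1", "read"), ("U1", "r2", "read")]


def user_document(user_id, rows):
    """Build the AccessibleEntities part of one User element.

    Every Data Access row of the user becomes its own Data_Access element
    holding exactly one Data_Replica child.
    """
    user = ET.Element("User", {"id": user_id})
    accessible = ET.SubElement(user, "AccessibleEntities")
    for uid, replica_id, access in rows:
        if uid != user_id:
            continue  # row belongs to another user's document
        node = ET.SubElement(accessible, "Data_Access", {"type": access})
        ET.SubElement(node, "Data_Replica", {"id": replica_id})
    return user


doc = user_document("U1", DATA_ACCESS_ROWS)
print(ET.tostring(doc, encoding="unicode"))
```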

Each Data Access row represents an instance of the Data Access node in the tree of Figure 4, and each of these nodes has a single child node corresponding to the associated replica. An alternate interpretation of the information in the relational database is to view the Data Access node as a node that groups together, or "aggregates", all data replicas that have the same access permission with respect to the given user. Thus, in this example, each instance of the Data Access node would represent a particular type of access, say, read access, for the user U1. The children of this node would be the set of nodes corresponding to all replicas to which U1 has read access (r1 and r2, in this example). For this latter case, the definition of the Data Access and Resource Access elements would be modified as follows:

    ELEMENT Data Access (Data Replica*)
    ELEMENT Resource Access (Logical Resource*)

The distinction between the above two DTDs is important since it results in very different queries on the underlying relational database to retrieve requested portions of the XView. For example, using an XML query language such as XMAS [3], one can query the XView to obtain all instances of the Data Access element (without expanding the subtree under each such node). The SQL query generated for the relational database is different depending upon which DTD is in use. In the first case, the SQL query would fetch all the rows of the Data Access table. In the second case, the query would group the rows in the Data Access table by type of access (e.g. read-only, read-write, etc.) and return only the signature information of each group (i.e. the Group-By attributes). Similarly, given an instance of a Data Access node, if one were interested in fetching all the related Data Replica information (i.e. the subtree rooted at that Data Access node), the former DTD would result in a single row fetch (i.e. the row corresponding to the single Data Access record), whereas the latter DTD would require fetching all the rows that fall under the data access group implied by the Data Access node. Thus, the types of queries that need to be supported clearly influence the choice of an XView. Conversely, different XViews can result in very different processing requirements on the underlying relational database.
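The difference between the two DTDs can be made concrete with a minimal sketch of the SQL each one might lead to. The table layout, the column names, and the third row (a write permission on a replica r3) are assumptions added for illustration, and `sqlite3` stands in for the production DBMS; the actual queries generated by a wrapper would depend on the real MCAT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout of the Data Access table.
conn.execute(
    "CREATE TABLE Data_Access (User_ID TEXT, Replica_ID TEXT, Access TEXT)")
conn.executemany(
    "INSERT INTO Data_Access VALUES (?, ?, ?)",
    [("U1", "r1", "read"), ("U1", "r2", "read"), ("U1", "r3", "write")],
)

# First DTD, Data Access (Data Replica): every row is one element instance,
# so listing the Data Access elements fetches all rows of the table.
per_row = conn.execute(
    "SELECT User_ID, Replica_ID, Access FROM Data_Access "
    "ORDER BY User_ID, Replica_ID"
).fetchall()

# Second DTD, Data Access (Data Replica*): one element instance per
# (user, access type) group; only the Group-By attributes are returned.
per_group = conn.execute(
    "SELECT User_ID, Access FROM Data_Access "
    "GROUP BY User_ID, Access ORDER BY User_ID, Access"
).fetchall()

# Expanding the subtree of U1's "read" node under the second DTD fetches
# every row that falls in that group.
read_subtree = conn.execute(
    "SELECT Replica_ID FROM Data_Access "
    "WHERE User_ID = ? AND Access = ? ORDER BY Replica_ID",
    ("U1", "read"),
).fetchall()
```

Under the first DTD the same subtree expansion touches a single row, since each Data Access element already corresponds to exactly one (user, replica) pair.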

4 Related Work

Recent related work includes the work by Deutsch, Fernandez, and Suciu [6] on storing semistructured data in relational databases. They provide techniques for mapping between the semistructured data model and the relational data model, using a query language called STORED. Their focus is on semistructured data that is not associated with an a priori DTD. Thus, they provide tools to impose some structure on the documents so that a mapping can be found to relational tables. They allow for the fact that the documents may have a "true" semistructured aspect to them, where a mapping to relational tables is not feasible or efficient. In this case, they allow storage of that part of the semistructured data in so-called overflow tables. Thus, the work we have discussed here in Section 2 addressed a more structured problem than that addressed by STORED. In our case, the DTD of the XML document is known in advance.

The recent work on integrating XML and SQL reported in [5] also discusses some issues in mapping between the complex hierarchical structures of XML and the tabular structures of SQL. In particular, it argues that the semantics of a hierarchical XML structure ought to be captured using the outer-join operation between tables corresponding to the root element and its sub-elements.

Other work that is generally related to the issues discussed here is the prior work on wrapping non-relational sources and the work on conversion between the relational data model and other data models. For example, issues in representing extended entity-relationship structures in relational databases have been addressed extensively in [14]. Approaches for providing database interfaces to semistructured information, especially data on Web sites and Web information sources, have been discussed in [13, 9, 19]. While the work described here is similar in approach, our focus is specifically on the conversion between XML DTDs and relational schemas.

5 Future work

In this paper, we have reported our preliminary work in the area of XViews. We are interested in studying a variety of schemas to understand which heuristics are most useful in enumerating candidate XViews. We are also interested in understanding how much of the work can be automated, using tools to generate candidate XViews before requiring user intervention. In addition, we are interested in investigating the use of relation cardinalities and query workloads to determine useful XViews and to rank candidate XViews.

Acknowledgements

The DOCT project was funded by DARPA and USPTO. We thank Larry Cogut and Pam Rinehart of the USPTO for extensive discussions regarding the SGML DTD for patents and USPTO application requirements. This work was also partly funded by grants from the National Archives and Records Administration (NARA) and the Department of Energy (DOE) under its ASCI program. Initial discussions on XViews were carried out with Bertram Ludaescher, a member of the MIX project at SDSC.

References

[1] C. Baru, A. Gupta, V. Chu, B. Ludaescher, R. Marciano, Y. Papakonstantinou, and P. Velikhov. XML-Based Information Mediation for Digital Libraries. In Demo Session, ACM Digital Libraries'99, Berkeley, CA.
[2] C. Baru, A. Gupta, B. Ludaescher, R. Marciano, Y. Papakonstantinou, and P. Velikhov. XML-Based Information Mediation with MIX. In Demo Session, ACM-SIGMOD'99, Philadelphia, PA.
[3] C. Baru, B. Ludaescher, Y. Papakonstantinou, P. Velikhov, and V. Vianu. Features and Requirements for an XML View Definition Language: Lessons from XML Information Mediation. Position paper, W3C Workshop on Query Languages, Boston, MA.
[4] C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC Storage Resource Broker. In Procs. of CASCON'98, Toronto, Canada.
[5] M. David. SQL-Based XML Structured Data Access. In WebTechniques.com.
[6] A. Deutsch, M. Fernandez, and D. Suciu. Storing Semistructured Data with STORED. In Procs. of ACM SIGMOD'99, Philadelphia, PA.
[7] DOCT. The Distributed Object Computation Testbed Project. San Diego Supercomputer Center, La Jolla, CA.
[8] Excelon. Object Design Inc.
[9] M. Fernandez, D. Florescu, A. Levy, and D. Suciu. Web-Site Management: The Strudel Approach. In IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[10] A. Gupta, R. Marciano, I. Zaslavsky, and C. Baru. Integrating GIS and Imagery through XML-Based Information Mediation. In Procs. of NSF International Workshop on Integrated Spatial Databases: Digital Images and GIS. To appear in Lecture Notes in Computer Science, Springer-Verlag, Portland, Maine.
[11] HPSS. High Performance Storage System. San Diego Supercomputer Center, La Jolla, CA.
[12] B. Ludaescher and A. Gupta. Modeling Interactive Web Sources for Information Mediation. In Procs. of International Workshop on World-Wide Web and Conceptual Modeling, Lecture Notes in Computer Science, Springer, Paris, France.
[13] S. Malaika. Resistance is Futile: The Web will assimilate your Database. In IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[14] V. M. Markowitz and A. Shoshani. Representing extended entity-relationship structures in relational databases: A modular approach. TODS, 17(3):423-464.
[15] MCAT. MCAT - A Meta-Information Catalog. San Diego Supercomputer Center, La Jolla, CA.
[16] MIX. Mediation of Information using XML. San Diego Supercomputer Center, La Jolla, CA.
[17] K. Munroe and Y. Papakonstantinou. BBQ: A Visual Interface for Integrated Browsing and Querying of XML. Paper submitted to 5th IFIP 2.6 Working Conference on Visual Database Systems, Fukuoka, Japan.
[18] Oracle8i. Oracle and XML.
[19] S. Prasad and A. Rajaram. Virtual Database Technology, XML, and the Evolution of the Web. In IEEE Bulletin of the Technical Committee on Data Engineering, IEEE CS, Washington, DC.
[20] W3C. The World Wide Web Consortium.
[21] W3C. XML Representation of a Relational Database. World Wide Web Consortium.
[22] XML. The Extensible Markup Language, Version 1.0.


More information

Using UML To Define XML Document Types

Using UML To Define XML Document Types Using UML To Define XML Document Types W. Eliot Kimber ISOGEN International, A DataChannel Company Created On: 10 Dec 1999 Last Revised: 14 Jan 2000 Defines a convention for the use of UML to define XML

More information

0. Database Systems 1.1 Introduction to DBMS Information is one of the most valuable resources in this information age! How do we effectively and efficiently manage this information? - How does Wal-Mart

More information

A new generation of tools for SGML

A new generation of tools for SGML Article A new generation of tools for SGML R. W. Matzen Oklahoma State University Department of Computer Science EMAIL rmatzen@acm.org Exceptions are used in many standard DTDs, including HTML, because

More information

On Computing the Minimal Labels in Time. Point Algebra Networks. IRST { Istituto per la Ricerca Scientica e Tecnologica. I Povo, Trento Italy

On Computing the Minimal Labels in Time. Point Algebra Networks. IRST { Istituto per la Ricerca Scientica e Tecnologica. I Povo, Trento Italy To appear in Computational Intelligence Journal On Computing the Minimal Labels in Time Point Algebra Networks Alfonso Gerevini 1;2 and Lenhart Schubert 2 1 IRST { Istituto per la Ricerca Scientica e Tecnologica

More information

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure

Introduction to Relational Databases. Introduction to Relational Databases cont: Introduction to Relational Databases cont: Relational Data structure Databases databases Terminology of relational model Properties of database relations. Relational Keys. Meaning of entity integrity and referential integrity. Purpose and advantages of views. The relational

More information

XML in Databases. Albrecht Schmidt. al. Albrecht Schmidt, Aalborg University 1

XML in Databases. Albrecht Schmidt.   al. Albrecht Schmidt, Aalborg University 1 XML in Databases Albrecht Schmidt al@cs.auc.dk http://www.cs.auc.dk/ al Albrecht Schmidt, Aalborg University 1 What is XML? (1) Where is the Life we have lost in living? Where is the wisdom we have lost

More information

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems Practical Database Design Methodology and Use of UML Diagrams 406.426 Design & Analysis of Database Systems Jonghun Park jonghun@snu.ac.kr Dept. of Industrial Engineering Seoul National University chapter

More information

An ODBC CORBA-Based Data Mediation Service

An ODBC CORBA-Based Data Mediation Service An ODBC CORBA-Based Data Mediation Service Paul L. Bergstein Dept. of Computer and Information Science University of Massachusetts Dartmouth, Dartmouth MA pbergstein@umassd.edu Keywords: Data mediation,

More information

A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar

A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar A Framework for Processing Complex Document-centric XML with Overlapping Structures Ionut E. Iacob and Alex Dekhtyar ABSTRACT Management of multihierarchical XML encodings has attracted attention of a

More information

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion JSON Schema-less into RDBMS Most of the material was taken from the Internet and the paper JSON data management: sup- porting schema-less development in RDBMS, Liu, Z.H., B. Hammerschmidt, and D. McMahon,

More information