Features and Requirements for an XML View Definition Language: Lessons from XML Information Mediation

Similar documents
XML: Extensible Markup Language

XML: Introduction. !important Declaration... 9:11 #FIXED... 7:5 #IMPLIED... 7:5 #REQUIRED... Directive... 9:11

Event Stores (I) [Source: DB-Engines.com, accessed on August 28, 2016]

Chapter 1: Getting Started. You will learn:

Something to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:

You may print, preview, or create a file of the report. File options are: PDF, XML, HTML, RTF, Excel, or CSV.

Describe The Differences In Meaning Between The Terms Relation And Relation Schema

The Extensible Markup Language (XML) and Java technology are natural partners in helping developers exchange data and programs across the Internet.

Introduction to XML. XML: basic elements

Web Services Interfaces

Generalized Document Data Model for Integrating Autonomous Applications

A Bottom-up Strategy for Query Decomposition

Design issues in XML formats

Introduction to Databases CS348

Cost-Benefit Analysis of Retrospective vs. Prospective Data Standardization

Beginning To Define ebxml Initial Draft

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry

Briefing Paper: developing the DOI Namespace

The Myx Architectural Style

Databricks Delta: Bringing Unprecedented Reliability and Performance to Cloud Data Lakes

Web Technologies Present and Future of XML

Query Processing and Optimization on the Web

Part III. Issues in Search Computing

A SEMANTIC MATCHMAKER SERVICE ON THE GRID

Lesson 5 Web Service Interface Definition (Part II)

1.1 Jadex - Engineering Goal-Oriented Agents

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1

An ODBC CORBA-Based Data Mediation Service

The XML Metalanguage

New Approach to Graph Databases

Publishing the Norwegian Petroleum Directorate s FactPages as Semantic Web Data

Essay Question: Explain 4 different means by which constrains are represented in the Conceptual Data Model (CDM).

Wide Area Query Systems The Hydra of Databases

EMC Documentum xdb. High-performance native XML database optimized for storing and querying large volumes of XML content

Model-Solver Integration in Decision Support Systems: A Web Services Approach

Database Heterogeneity

R2RML by Assertion: A Semi-Automatic Tool for Generating Customised R2RML Mappings

Vocabulary Harvesting Using MatchIT. By Andrew W Krause, Chief Technology Officer

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata

What s a database system? Review of Basic Database Concepts. Entity-relationship (E/R) diagram. Two important questions. Physical data independence

Schema-Agnostic Indexing with Azure Document DB

ORDBMS - Introduction

Part XII. Mapping XML to Databases. Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 321

E-Agricultural Services and Business

An ECA Engine for Deploying Heterogeneous Component Languages in the Semantic Web

Lecture 8. Database Management and Queries

A Framework for Generic Integration of XML Sources. Wolfgang May Institut für Informatik Universität Freiburg Germany

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

Semantic search and reporting implementation on platform. Victor Agroskin

Ontology Servers and Metadata Vocabulary Repositories

Chapter 6 Object Persistence, Relationships and Queries

An Evolution of Mathematical Tools

XViews: XML views of relational schemas

Introduction to Information Systems

Distributed Databases: SQL vs NoSQL

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

Delivery Options: Attend face-to-face in the classroom or via remote-live attendance.

(All chapters begin with an Introduction end with a Summary, Exercises, and Reference and Bibliography) Preliminaries An Overview of Database

1. Introduction to the Common Language Infrastructure

Inventions on using LDAP for different purposes- Part-3

CSE 544 Data Models. Lecture #3. CSE544 - Spring,

Conceptual schema matching with the Ontology Mapping Language: requirements and evaluation

Module 4. Implementation of XQuery. Part 0: Background on relational query processing

Approach for Mapping Ontologies to Relational Databases

Using the Semantic Web in Ubiquitous and Mobile Computing

11. EXTENSIBLE MARKUP LANGUAGE (XML)

Managing and Securing Computer Networks. Guy Leduc. Chapter 2: Software-Defined Networks (SDN) Chapter 2. Chapter goals:

Database Management

Problem Set 2 Solutions

Automated Classification. Lars Marius Garshol Topic Maps

Introduction p. 1 An XML Primer p. 5 History of XML p. 6 Benefits of XML p. 11 Components of XML p. 12 BNF Grammar p. 14 Prolog p. 15 Elements p.

Application of the Peer-to-Peer Paradigm in Digital Libraries

Practical Database Design Methodology and Use of UML Diagrams Design & Analysis of Database Systems

Querying Microsoft SQL Server (461)

A Web Service-Based System for Sharing Distributed XML Data Using Customizable Schema

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

> Semantic Web Use Cases and Case Studies

Applying Code Generation Approach in Fabrique Kirill Kalishev, JetBrains

6232B: Implementing a Microsoft SQL Server 2008 R2 Database

foreword to the first edition preface xxi acknowledgments xxiii about this book xxv about the cover illustration

Delivery Options: Attend face-to-face in the classroom or remote-live attendance.

Efficient, Scalable, and Provenance-Aware Management of Linked Data

By Chung Yeung Pang. The Cases to Tackle:

FINANCIAL REGULATORY REPORTING ACROSS AN EVOLVING SCHEMA

Supports 1-1, 1-many, and many to many relationships between objects

The OWL API: An Introduction

UNIT V *********************************************************************************************

The notion delegation of tasks in Linked Data through agents

XML ALONE IS NOT SUFFICIENT FOR EFFECTIVE WEBEDI

Table of Contents. iii

516. XSLT. Prerequisites. Version 1.2

Overview of the Integration Wizard Project for Querying and Managing Semistructured Data in Heterogeneous Sources

JSON & MongoDB. Introduction to Databases CompSci 316 Fall 2018

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

Data Warehouse and Data Mining

XML Metadata Standards and Topic Maps

ASGRT Automated Report Generation System

DISCUSSION 5min 2/24/2009. DTD to relational schema. Inlining. Basic inlining

Variable Scope The Main() Function Struct Functions Overloading Functions Using Delegates Chapter 7: Debugging and Error Handling Debugging in Visual

Relational Databases

Transcription:

Page 1 of 5 Features and Requirements for an XML View Definition Language: Lessons from XML Information Mediation 1. Introduction C. Baru, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, V. Vianu XML indicates a move towards viewing the Web as a big semistructured database, consisting of multiple autonomous sites which will be modeled around (guess what?) XML and sibling standards for structure and ontology definitions (XML-Data, RDF, DCD, namespaces), APIs (most probably some extension of DOM), source query capability specifications (see below), other standards that may describe the transactional abilities of the sites, and so on. In such an environment a query language will serve for more than "a souped-up version of X-Pointer". Our position focuses on the use of an XML query language as a view definition language that drives XML mediator systems. A mediator selects, restructures and merges information from multiple autonomous sources/sites. It exports an integrated XML view document. Possible applications of such mediator systems are numerous (e.g., virtual shopping malls, virtual agencies,...). Section 2 gives a brief presentation of the under development MIX (Mediation of Information using XML) mediator system in order to illustrate the architecture and the functionality of a typical mediator system. Section 3 presents a list of issues and challenges that we have found in the course of working on the MIX project and the XMAS (XML Matching And Structuring) view definition language. 2. The Architecture and Functionality of the MIX Mediator System The figure below illustrates the architecture of the MIX mediator system [1]. The system accesses a set of XML sources, which are currently information systems wrapped to provide (1) an XML view of their data and (2) a set of XMAS queries that they can answer. (It is likely that in the future there will be XML databases that require no wrapping.) Conceptually, MIX exports an integrated view of the source data. The view definition, provided by the view developer, specifies how the source data will be integrated. Notice that the MIX does not materialize the integrated view in advance. During runtime, an application (which may be the BBQ GUI [1]) issues a XMAS query against the integrated view. The mediator decomposes the client query into queries that are routed to the XML sources. The sources generate and send to the mediator the query results, which are XML documents. The mediator integrates them into the client query result document.

Page 2 of 5 Note that the client accesses the query result using our home-grown version of DOM, called DOM-VXD (DOM for Virtual XML Documents). DOM-VXD does not require a complete copy of the query result to be materialized at the client site. In this way we can pipeline the computation and delivery of the query result document and even navigate into it.

Page 3 of 5 3. What we have for XMAS and What we'd like a Santa "QL Committee" Claus to bring us There are numerous desiderata for a general purpose XML QL; see for an excellent overview. Here, we focus on first lessons learned from designing and implementing the XMAS view definition language for information mediation. A recurring theme of the following discussion is that expressive power comes at a price which may be: inefficient query processing, complex query processors, inability to perform type inference, etc. XMAS currently leans towards a conservative database approach thereby delivering the benefits described in the sections below. The cost is some limitations in the expressive power. Undoubtedly, more powerful features (e.g., see Section 3.6); will also have to be encompassed in XMAS (and, in general, any XML view definition languages). The language should allow for; "graceful" scaling of the expressive power -- a task that will not always be easy. 3.1 XML Faithfulness An XML view definition should be faithful to XML, i.e., it should always produce XML output. The immediate benefit for mediation is that the view is no different from any other XML document. An ideal solution would be to have a language that can never produce non-xml output. However, this approach may compromise the expressive power of the language. A more viable solution is to develop static analysis tools that can notify the view designer of potential errors, i.e., cases where non-xml output may be produced. Note that by controlling; the complexity of the language we will be able to develop effective static analysis tools. XMAS takes a middle-of-the-road approach: Its ability to move data from a source's content to element names or attribute names creates the possibility of non-xml output. This instance of potential unfaithfulness can be caught using a trivial static safety check. On the other hand, XMAS avoids many sources of potential unfaithfulness. For example XMAS has avoided (insofar) the use of Skolem functions, which can be a major source of unfaithfulness. For example, they make it difficult to check whether attributes of elements are unique. Nevertheless, Skolem functions can deliver expressive power that cannot always be substituted by the SQL-like group-by mechanism of XMAS. 3.2 Type Inference Given descriptions of the source documents (such as DTDs, XML-Data specifications, or DCDs), the mediator should be able to compute a correct and precise description of the view. We have developed algorithms [1,2] that compute precise view DTDs from the source DTDs and the view definition. Again, the expressive power versus complexity tradeoff is encountered: DTD inference is not possible for views that move data from a source's content to element names or attribute names; such views are allowed in XMAS. On the other hand, the explicit group-by operator allows us to derive precise types. A closely related property is type conformance of a view definition to a predefined DTD. Similar challenges and tradeoffs with type inference appear in this case as well. 3.3 Closure Under Composition A typical task for query processors (and not only XML ones) is the following: Given (1) a client query q

Page 4 of 5 that refers to a view V and (2) the query which defines V in terms of some source database(s) S, derive a query q' that is equivalent to q and refers directly to S. It is important for the simplicity and efficiency of query processing that q' is expressed in the same language as q and V. That is, the language should be closed under query composition. It is important to note that closure under composition requires careful design of the query language. For example, one can show that a query language that allows Skolem object-ids and regular path expressions but does not allow any other form of recursion, is not closed under composition. In XMAS, closure under composition is achieved by employing a group-by construct instead of Skolem functions. Once again, a compromise in expressiveness leads to a desired property. 3.4 Order: Manipulating, Preserving, and Never-Minding it XML has order. Elements are organized in lists, as opposed to sets. However applications, in general, have different requirements on the order issue. An XML query language should address the following issues under a single semantics: Order Preservation: By default the element order of the input is preserved. Order Manipulation The view designer is able to define a new order of elements. A typical task is the ordering of elements using a ranking function (e.g., in the style of SQL's order-by). Once again notice that extremely powerful order manipulation mechanisms will compromise type inference and closure under composition. Order Nondeterminism: Finally the view designer may want to specify that order in the view is unimportant. In this case he can specify that the order is nondeterministic thereby leaving room for optimization. The above policies deal with specifying an order for the view. Similar considerations apply to checking the order of the input elements. In this case XMAS' default is "don't care" about the order of elements in the input. Conversely, if the view/query has to put a condition on the order of input elements an horizontal navigation regular expression can be used [2]. 3.5 Query Processing: Put an Algebra in Your System In order to apply similar optimization techniques as those developed for relational databases, an XML query language should have an equivalent algebra. An algebra will also facilitate the implementation of pipelining/browsing as described in Section 2. 3.6 All Processing in One Language: The Challenge of Structural Recursion It is desirable to have a language that tackles all tasks required in the mediation process. In this way we can avoid the impedance mismatch caused by using multiple languages and we create extra opportunities for optimization. It seems that two language paradigms have emerged in the XML querying context and they will have to be reconciled: The database paradigm has an underlying logic semantics and is convenient for selecting and integrating objects. It has recently been extended to allow navigational recursion in a clean way. However it does not support (without the use of hard-to-process general recursion) restructuring of "deep" structures.

Page 5 of 5 The functional paradigm, exemplified by XSL and now XQL, is particularly suitable for "deep" transformations of XML documents. However it is cumbersome or less powerful when it comes to conventional search and join. [UnQL] shows a promising direction in bridging the two paradigms. 3.7 Describing the Query Capabilities of Sources XML sources will only be able to support a subset of the queries over the "schema" they export. For example, consider a site that exports a flight information schema (say, DTD) but it only accepts queries that specify a destination, an arrival, and a optional price range. It is clear that a mediator system or, in general, any agent that queries this source must conform to the set of supported queries exported by this source. So it is important that mechanisms are developed for describing the set of queries that are supported by a site [3]. 3.8 Metadata In communities such as, decision support, OLAP, and scientific databases, the term "metadata" refers to auxiliary information associated with defined data sets. For example, this may include attributes that provide further information about the data, annotations/explications associated with the data, and/or computed/derived information related to the data. In general, whether a piece of information is data or metadata depends on one's point of view. In defining a virtual site, the view specification mechanism must provide the capability to specify the data as well as the metadata for that site. Once this distinction between data and metadata has been made at the site level, this information can be used in a variety of ways in the system. For example, it can be used at the interface level to support different query and presentation modes for data vs. metadata, and it can be used at the data transport level to encode data differently than metadata. 4. Conclusions Information mediation poses a challenging set of requirements to an XML query and view definition language. We discuss above several important issues: XML faithfulness, type inference, order, closure under composition, etc. A recurring theme is the search for the right balance between expressive power, on the one side, and efficiency and viability on the other.