Some aspects of references behaviour when querying XML with XQuery


B. Khvostichenko (boris.khv@pobox.spbu.ru)
B. Novikov (borisnov@acm.org)

Abstract

During XQuery query evaluation, the query output is built as a completely new structure containing copies of all elements that satisfy the query. At the same time, references contained in these data point to the original elements, hence the cross-references between extracted data are not represented in the query output. In this paper, we propose a method for preserving cross-references, thus enriching the tree produced by query evaluation with a graph structure.

1 Introduction

XML is considered the most relevant standard for data representation and exchange among various applications on the Internet. The popularity of XML is in large part due to its flexibility in representing many kinds of information. As the importance of XML has increased, a series of standards defined by the World Wide Web Consortium (W3C) has grown up around it: XML Schema, XPath, XSLT, etc.

XML data are often quite heterogeneous and distribute their metadata throughout the document. XML documents may contain many levels of nested elements and can represent missing information simply by the absence of an element. For these reasons several special languages were designed for querying XML data. One of these languages is XQuery [10], which is being developed by the XML Query Working Group.

One can think of an XML document as a tree, but the XML standard allows references between elements to be defined using ID-IDREF pairs. Thus, an XML document turns into a directed graph whose edges represent either a parent-child relationship or an IDREF-ID reference.

This paper describes a problem that arises with XML references when querying XML data with XQuery. It is organized as follows. Section 2 describes the problem in detail. Section 3 reviews other approaches to dealing with references.
Section 4 presents the proposed solution, and Section 5 summarizes the topic.

This work was supported by the Russian Foundation for Basic Research under grant 04-01-00173.
Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems SYRCoDIS, St.-Petersburg, Russia, 2004

2 Problem Statement

During query evaluation, XQuery builds a new structure containing copies of all elements that satisfy the query. Since XQuery uses the XPath data model described in [9], it copies IDs and IDREFs as attributes from the original into the newly created structure. It does not treat references in any special way and considers them ordinary element attributes. This can lead to some tricky situations, described below (we assume that if the original document contains an element that references another one with an IDREF, then it also contains an element with the corresponding ID):

1. The ID attribute is copied into the new structure, the IDREF is not. The problem is that the same ID already exists in the original document.

2. The IDREF attribute is copied into the new structure, the ID is not (hanging IDREF). Since XML is a semistructured data model, we do not treat the absence of some elements or attributes as an error.

3. Both the ID and the corresponding IDREF are copied into the new structure. This combines the first case with the need to distinguish the original reference edge from the new one.

Problem: Define and implement a modified query evaluation technique for XQuery that is able to handle cross-references between extracted XML elements.

2.1 Example

As an example we use a fragment of a musical encyclopaedia, shown in Figure 1. It stores bands and musicians in one catalog, containing the casts of each band and the instruments that each person can play. Each cast contains references to the musicians who played in that cast. Thus, different casts may reference the same musician (e.g. Robert Fripp is in every King Crimson cast). Moreover, different bands can reference the same musician (e.g. Greg Lake appeared in both Emerson, Lake and Palmer and King Crimson). Queries to this encyclopaedia, corresponding to the problems described above, are as follows:

A. Select all guitar players: Figure 2, results are in Figure 5.

B. Choose the first cast of King Crimson: Figure 3, results are in Figure 6.

C. Find performers that contain "Lake" in their name: Figure 4, results are in Figure 7.

    <catalog>
      <band name="Emerson, Lake and Palmer" style="rock" ID="elp">
        <artist IDREF="emerson">K. Emerson</artist>
      </band>
      <band name="King Crimson" style="rock" ID="kingcrimson">
        <cast number="1" years="1969-1970">
          <artist IDREF="giles">M. Giles</artist>
        </cast>
        <cast number="2" years="1970-1972">
          <artist IDREF="collins">M. Collins</artist>
          <artist IDREF="haskell">G. Haskell</artist>
        </cast>
      </band>
      <musician name="Keith Emerson" ID="emerson">
        <instrument>piano</instrument>
        <instrument>organ</instrument>
      </musician>
      <musician name="Greg Lake" ID="lake"></musician>
      <musician name="Carl Palmer" ID="palmer">
        <instrument>percussion</instrument>
      </musician>
      <musician name="Robert Fripp" ID="fripp"></musician>
    </catalog>

Figure 1: Music encyclopaedia (fragment)

    for $i in document("catalog.xml")/catalog/musician
    where $i/instrument = "Guitar"
    return
      <guitar-player>
        { $i/@ID }
        <name>{ data($i/@name) }</name>
        { for $j in $i/instrument return $j }
      </guitar-player>

Figure 2: Query A. Select all guitar players

    for $i in document("catalog.xml")/catalog/band[@name="King Crimson"]
    return
      for $j in $i/cast[@number="1"]
      return $j

Figure 3: Query B. Choose the first cast of King Crimson

3 Related Work

3.1 XQuery

As mentioned above, this problem arises with the XQuery query language [10] and the XPath data model [9]. XQuery does not offer any special tool for treating IDREFs. Moreover, in the XPath data model every element has an intrinsic identifier, but this internal identifier exists independently of the ID assigned by the user.
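The gap between a node's intrinsic identity and its user-assigned ID can be illustrated outside XQuery. The following is a minimal Python sketch, not the paper's method: it uses the standard `xml.etree.ElementTree` module to mimic the copying step, and the helper names `copy_into_result` and `duplicated_ids`, as well as the `result` wrapper element, are assumptions made for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <musician name="Greg Lake" ID="lake"/>
  <musician name="Robert Fripp" ID="fripp"/>
</catalog>"""


def copy_into_result(original_root, wanted_ids):
    """Mimic query output construction: deep-copy matching elements
    into a new tree (the 'result' wrapper is a hypothetical container)."""
    result = ET.Element("result")
    for elem in original_root.iter("musician"):
        if elem.get("ID") in wanted_ids:
            result.append(copy.deepcopy(elem))
    return result


def duplicated_ids(original_root, result_root):
    """Return the ID values that occur in both trees."""
    def ids(root):
        return {e.get("ID") for e in root.iter() if e.get("ID") is not None}
    return ids(original_root) & ids(result_root)


original = ET.fromstring(CATALOG)
result = copy_into_result(original, {"lake"})

source = original.find("musician[@ID='lake']")
copied = result.find("musician")

# The copy is a different node, yet it carries the same ID value.
print("same node:", copied is source)
print("same ID value:", copied.get("ID") == source.get("ID"))
print("IDs present in both trees:", duplicated_ids(original, result))
```

Under these assumptions the copied element is a distinct node that still carries the ID of its source, which is the situation of case 1 in Section 2.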

    for $i in document("catalog.xml")//*
    where contains(data($i/@name), "Lake")
    return $i

Figure 4: Query C. Find performers that contain "Lake" in their name

    <guitar-player ID="lake">
      <name>Greg Lake</name>
    </guitar-player>
    <guitar-player ID="fripp">
      <name>Robert Fripp</name>
    </guitar-player>

Figure 5: Query A evaluation result

3.2 Lorel

Lorel [1] is the query language of the Lore [6] database system for semistructured data. Originally, Lore was built on its own data model, called OEM. A substantial difference between OEM and XML is that OEM represents a graph with directed edges, while XML represents a tree. Every object in OEM has a unique identifier and can be referenced by this identifier. Thus, the problem does not exist for the OEM data model. When migrating Lore from OEM to XML [5], the developers paid attention to this difference and introduced a new XML-based Lore data model in which the user can choose how to interpret IDREFs (as graph edges or as text attributes).

3.3 ODMG

ODMG proposed an Object Data Model and an Object Query Language in its standard ODMG 3.0 [3]. They allow objects to reference each other, but the ODMG data model is strictly typed and has integrity constraints. Internal references are treated as graph edges, so the problems described in Section 2 do not arise. It was suggested to use XML as an exchange format between applications that comply with the ODMG standard [2]. However, this suggestion does not use the IDREF-ID referencing facility within an XML document; it uses a special structure to encode relationships. XML serves as an intermediate language for transferring objects from one system to another, so there is no need to query this data.

4 Proposed Solution

The proposed solution to the problems described in Section 2 consists of treating IDREF-ID references as graph edges while respecting the XML data model and its validity constraints.

    <cast number="1" years="1969-1970">
      <artist IDREF="giles">M. Giles</artist>
    </cast>

Figure 6: Query B evaluation result

    <band name="Emerson, Lake and Palmer" style="rock" ID="elp">
      <artist IDREF="emerson">K. Emerson</artist>
    </band>
    <musician name="Greg Lake" ID="lake"></musician>

Figure 7: Query C evaluation result

4.1 Copied ID (without corresponding IDREF)

The problem here relates to the ID validity constraint of XML [8], which requires ID values to be unique within a document. If the original document is used together with the created one later in query processing, the ID becomes ambiguous. The proposed solution is to copy the ID attribute from the original element and change its value (to comply with the ID validity constraint). To preserve access to the original element, we suggest creating an IDREF attribute in the newly created element whose value equals the ID of the original element. Thus, whenever one needs to reach the original element from the new document, two graph edges have to be traversed (first to the created element and then to the original one).

4.2 Copied IDREF (without corresponding ID)

This is not a problem, because XML is a semistructured data model and can be incomplete by its nature. One can interpret such a hanging IDREF in two ways:

1. As a reference to the element in the original document (if that document is used afterwards).

2. As a true hanging reference, when the original document is not used further in query processing.

4.3 Copied both ID and corresponding IDREF

This problem also relates to the ID validity constraint, but it is somewhat more involved. When an entire edge is copied from one graph to another, the two elements may no longer be the same as they were in the original document. Moreover, the type of both elements may change (and XQuery is sensitive to element types). We therefore suggest proceeding as follows. For the referenced element, its ID is changed and an IDREF reference to the original element is created. For the referencing element, the IDREF value is changed according to the new ID value of the referenced element. This solves the ID validity problem and allows the originally referenced element to be accessed if needed.

4.4 Implementation

The proposed implementation is straightforward. First, the XQuery processor evaluates the query result as a temporary document. Second, it searches this document for IDs that were extracted from the original document and creates a dictionary of changes, where the original ID is the key and the new ID is the value. This dictionary must guarantee the uniqueness of all IDs. Third, the query processor traverses the temporary document once again and does the following:

1. If an element has an ID that is to be replaced, the ID is replaced with the one taken from the dictionary, and an IDREF attribute is added with the value of the original ID (a reference to the original element).

2. If an element has an IDREF referencing an ID that is subject to change, the IDREF is changed to the appropriate value (taken from the dictionary).

    <guitar-player ID="lake-01" IDREF="lake">
      <name>Greg Lake</name>
    </guitar-player>
    <guitar-player ID="fripp-01" IDREF="fripp">
      <name>Robert Fripp</name>
    </guitar-player>

Figure 8: Query A updated evaluation result

    <cast number="1" years="1969-1970">
      <artist IDREF="giles">M. Giles</artist>
    </cast>

Figure 9: Query B updated evaluation result (no changes)

    <band name="Emerson, Lake and Palmer" style="rock" ID="elp-01" IDREF="elp">
      <artist IDREF="emerson">K. Emerson</artist>
      <artist IDREF="lake-01">G. Lake</artist>
    </band>
    <musician name="Greg Lake" ID="lake-01" IDREF="lake"></musician>

Figure 10: Query C updated evaluation result

4.5 Example

Let us now show how our proposal is reflected in the examples given in Section 2.
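The two-pass procedure of Section 4.4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it operates on an `xml.etree.ElementTree` tree rather than inside an XQuery processor, the function name `rewrite_ids` and the `result` wrapper element are hypothetical, the set of original IDs is assumed to be known in advance, and an element that carries an ID is assumed not to carry an IDREF of its own.

```python
import xml.etree.ElementTree as ET


def rewrite_ids(result_root, original_ids):
    """Rewrite copied IDs in a temporary result document (sketch of Section 4.4).

    original_ids is the set of ID values used in the original document
    (assumed to be available to the caller).
    Returns the dictionary of changes {original ID: new ID}.
    """
    taken = set(original_ids)
    changes = {}

    # Pass 1: build the dictionary of changes, keeping all IDs unique.
    for elem in result_root.iter():
        old = elem.get("ID")
        if old is not None and old in original_ids and old not in changes:
            n = 1
            new = f"{old}-{n:02d}"
            while new in taken:
                n += 1
                new = f"{old}-{n:02d}"
            taken.add(new)
            changes[old] = new

    # Pass 2: apply the changes.
    for elem in result_root.iter():
        old_id = elem.get("ID")
        if old_id in changes:
            # Rule 1: replace the ID and point back to the original element.
            elem.set("ID", changes[old_id])
            elem.set("IDREF", old_id)
        elif elem.get("IDREF") in changes:
            # Rule 2: redirect the reference to the renamed copy.
            elem.set("IDREF", changes[elem.get("IDREF")])
    return changes


# Query C result, wrapped in a hypothetical <result> element.
QUERY_C_RESULT = """<result>
  <band name="Emerson, Lake and Palmer" style="rock" ID="elp">
    <artist IDREF="emerson">K. Emerson</artist>
    <artist IDREF="lake">G. Lake</artist>
  </band>
  <musician name="Greg Lake" ID="lake"/>
</result>"""

root = ET.fromstring(QUERY_C_RESULT)
ids_in_original = {"elp", "kingcrimson", "emerson", "lake", "palmer", "fripp"}
rewrite_ids(root, ids_in_original)
print(ET.tostring(root, encoding="unicode"))
```

The reference to `emerson` is left untouched because the corresponding musician was not copied into the result, so it remains a hanging IDREF in the sense of Section 4.2.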
For simplicity, we use the postfix -N for newly created IDs, where N is a number.

A. The first example shows the problem with duplicate IDs. We need to change the IDs of the lake and fripp elements and create references from them to the original elements (Figure 8).

B. The second case is not really a problem, so nothing changes there (Figure 9).

C. The third example combines two problems. We need to solve the problem with the duplicated ID elp (as in the first case) and then the problem with the cross-reference between the artist in elp and lake (Figure 10).

5 Conclusion

In this paper we investigated the behaviour of IDREF-ID references in an XML document during XQuery query evaluation. We found a problem with extracting ID and IDREF attributes from the original document into a new one and proposed a solution that can be applied to the existing XQuery data model.

While our straightforward approach can serve as a solution for the moment, there is another aspect that this article has not covered. The current version of XML allows the use of XML Namespaces [7], but there are no details on IDREF-ID references between namespaces. Moreover, it is not clear whether an XML element can have different IDs in different namespaces. If it can, namespaces might be used to propose another solution to the problem described in this article. So the question is open and might be a good subject for further investigation. Furthermore, one can think of optimizing queries using type information (such as a DTD schema) [4]. This raises another problem, as reference types might change, and is also a good subject for investigation.

References

[1] Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, 1997. Also http://www-db.stanford.edu/pub/papers/lorel96.ps from Stanford DB group on-line publications http://www-db.stanford.edu/pub/.

[2] G. M. Bierman. Using XML as an object interchange format, 2000.

[3] R. G. G. Cattell, Douglas K. Barry, Mark Berler, Jeff Eastman, David Jordan, Craig Russell, Olaf Schadow, Torsten Stanienda, and Fernando Velez. The Object Data Standard: ODMG 3.0. Elsevier Science and Technology Books, 1999.

[4] Chang-Won Park, Jun-Ki Min, and Chin-Wan Chung. Structural function inlining technique for structurally recursive XML queries. In Proc. VLDB 2002, 2002.

[5] R. Goldman, J. McHugh, and J. Widom. From semistructured data to XML: Migrating the Lore data model and query language. In Workshop on the Web and Databases (WebDB '99), pages 25-30, 1999.

[6] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: A database management system for semistructured data. SIGMOD Record (ACM Special Interest Group on Management of Data), 26(3):54-66, September 1997.

[7] Namespaces in XML, 1999. http://www.w3.org/tr/rec-xml-names/.

[8] Extensible markup language (XML) 1.0 (third edition), 2004. http://www.w3.org/tr/2004/rec-xml-20040204/.

[9] XQuery 1.0 and XPath 2.0 Data Model, 2003. http://www.w3.org/tr/xpath-datamodel/.

[10] XQuery 1.0: An XML Query Language, 2003. http://www.w3.org/tr/xquery/.