Use of XML Schema and XML Query for ENVISAT product data handling

Stéphane Mbaye
stephane.mbaye@gael.fr
GAEL Consultant, Cité Descartes, 8 rue Albert Einstein, 77420 Champs-sur-Marne, France

Abstract *

This paper proposes a solution that uses XML and related technologies such as XML Schema, XML Query and XSL-FO to access, transform and render ENVISAT product data. The solution extends the XML technologies so that they are directly exploitable for non-XML documents such as ENVISAT products. It defines an extension of XML Schema to describe the physical and logical structure down to the bit-level representation. A dedicated API named DRB (similar to the DOM [10] API specified by the W3C) gives access to the products. DRB supports the latest XML Query language on top of XPath to locate, select and transform any data from one or more ENVISAT products. In addition, the output can be stored in one of the handled formats (including ENVISAT products themselves). As an application of DRB, we introduce the Derby software, a datamining tool for managing and analyzing data. Derby is an integrated environment that takes advantage of the DRB functionality through a graphical user interface. We also present some EO applications illustrating the benefits of this concept during the design and operational phases.

1. Introduction

In addition to the definition of the Extensible Markup Language (XML) [1], the World Wide Web Consortium (W3C) is developing a large number of related technology specifications. The objective is to provide the Web developer community with a complete and unified set of recommendations covering all emerging and consolidated concepts in the domain.
Although most of the W3C tasks remain under development, an increasing number of applications are already available. The Space Engineering community was quick to adopt XML technologies, developing standards such as the Data Entity Dictionary Specification Language ¹, the Baseband Data Archive Interchange Format ², DIMAP ³ or the latest International Metadata Standard ⁴. These applications focus on the management of metadata, for which the XML language is suitable.

Conversely, XML is poorly suited to the management of the EO data themselves (i.e. signal data or measurement data). Translating such data to XML is ineffective in most cases. For instance, there is little point in translating binary encoded data or large datasets to XML character encoding: such an operation would result in both a dramatic increase in size and a decrease in access performance.

Despite these drawbacks, the use of XML-related technologies such as XML Schema [2] and the XQuery language [4] remains advantageous for the management of EO data. It was therefore necessary to find a way to combine the use of XML-related technologies with the handling of EO data regardless of their encoding format. Building on GAEL Consultant's experience in the handling of EO formats, we have studied and are currently developing a system for this purpose. Its first application is dedicated to ENVISAT product data analysis during the maintenance and operational phases of the instrument processors.

* This study has been funded in the framework of contract N 5247/0/I-LG, Development of ENVISAT LVL0 Analysis and Product Comparison Tool, agreed between the European Space Agency ESA/ESRIN ENVISAT Ground Segment Engineering Division and GAEL Consultant.
¹ The Data Entity Dictionary Specification Language (DEDSL) is a standard developed by the Consultative Committee for Space Data Systems (CCSDS).
² The Baseband Data Archive Interchange Format (CEOS ICF) is a standard developed by the Committee on Earth Observation Satellites (CEOS).
³ DIMAP is the format used for the dissemination of SPOT 5 products.
⁴ The International Metadata Standard (ISO/DIS 19115) is a Draft International Standard developed by the ISO Technical Committee 211.

2. Use of XML Schema

The W3C has recently issued the XML Schema recommendation [2][3], dedicated to the validation of XML documents. In summary, XML Schema is an XML language for describing and constraining the content of XML documents.

2.1. Need for extension

The XML Schema language focuses on the logical structure of XML documents and does not provide any functionality for describing their physical structure. In order to handle ENVISAT products, it was therefore necessary to extend XML Schema with specific markups. The extension is performed through a new namespace [9], currently identified by http://www.gael.fr/drb/. The markups defined within this namespace are inserted in the XML Schema document as defined in the W3C recommendation [2] (see example below):

    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:drb="http://www.gael.fr/drb/">
      <complexType name="MPH">
        <sequence>
          <element name="FileName" type="String">
            <drb:byteoffset>0</drb:byteoffset>
          </element>
        </sequence>
      </complexType>
    </schema>

Figure 1. Example of physical description markup

Extending XML Schema for the ENVISAT product description has been preferred to developing a new language. Thanks to this choice, the same files define both the logical and the physical description of the products. This facilitates the maintenance of the file descriptors and guarantees their integrity.

2.2. New markups

The following list provides the most important markups that have been added to describe the physical structure of the products.

    <drb:byteoffset>   The byte offset of an element.
    <drb:bitoffset>    The bit offset of an element.
    <drb:length>       The bit length of an element.
    <drb:occurrence>   The occurrence count of an element.

Figure 2. Example of new markups

3. DRB, an API for accessing products

3.1. DOM interfaces

The W3C provides a standard programming interface for manipulating XML documents: the Document Object Model (DOM) [10].
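To make both the DOM style of access and the physical markups of Section 2.2 more concrete, the following Python sketch reads a schema modelled on Figure 1 with the standard `xml.dom.minidom` module and uses the offsets it finds to cut a field out of a byte string. It is only an illustration under stated assumptions, not the DRB or SDF implementation: the `drb:length` value of 80 bits, the sample bytes, the helper names (`drb_int`, `physical_layout`, `read_bits`) and the most-significant-bit-first bit numbering are all hypothetical.

```python
from xml.dom import minidom

XSD_NS = "http://www.w3.org/2001/XMLSchema"
DRB_NS = "http://www.gael.fr/drb/"

# Schema modelled on Figure 1. The drb:length entry (80 bits) is a
# hypothetical addition so that the field has a size.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:drb="http://www.gael.fr/drb/">
  <complexType name="MPH">
    <sequence>
      <element name="FileName" type="String">
        <drb:byteoffset>0</drb:byteoffset>
        <drb:length>80</drb:length>
      </element>
    </sequence>
  </complexType>
</schema>"""


def drb_int(element, local_name, default=0):
    """Return the integer content of a drb:* child, or a default if absent."""
    nodes = element.getElementsByTagNameNS(DRB_NS, local_name)
    return int(nodes[0].firstChild.data) if nodes else default


def physical_layout(schema_text):
    """Map each schema element name to its physical description."""
    doc = minidom.parseString(schema_text)
    layout = {}
    for element in doc.getElementsByTagNameNS(XSD_NS, "element"):
        layout[element.getAttribute("name")] = {
            "byteoffset": drb_int(element, "byteoffset"),
            "bitoffset": drb_int(element, "bitoffset"),
            "length": drb_int(element, "length"),
        }
    return layout


def read_bits(data, byte_offset, bit_offset, bit_length):
    """Extract bit_length bits as an unsigned integer.

    Assumption: bits are numbered from the most significant bit of each byte.
    """
    start = byte_offset * 8 + bit_offset
    shift = len(data) * 8 - start - bit_length
    return (int.from_bytes(data, "big") >> shift) & ((1 << bit_length) - 1)


layout = physical_layout(SCHEMA)
field = layout["FileName"]

# Hypothetical product content: a 10-character name followed by other bytes.
product = b"PRODUCT001" + b"\x00\xff"
raw = read_bits(product, field["byteoffset"], field["bitoffset"], field["length"])
file_name = raw.to_bytes(field["length"] // 8, "big").decode("ascii")
print(file_name)
```

Keeping the offsets in the schema rather than in the reading code is the point of the sketch: the same `read_bits` routine can serve any field once its physical description has been looked up.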
The DOM is designed essentially for the management of HTML pages in Web browsers or servers. Even though the DOM definition has been refined several times, it does not yet support XML Schema (in particular Schema types), XPath 2.0 or the XQuery language.

3.2. DRB interfaces

Because the DOM interfaces support neither XML Schema nor XQuery yet, it was necessary to design and develop a new API called Data Request Broker (DRB). DRB proposes several interfaces that address the limitations listed above. We nevertheless intend to minimize the departure from the DOM specifications.

3.3. DRB implementations

We are currently developing the implementations of the DRB interfaces necessary to support ENVISAT products. At the end of the project, the software will be able to browse and query XML documents and binary data files such as ENVISAT products. These implementations are presented in the next sections.

3.4. XML implementation

The DRB API is able to handle XML documents. The XML implementation has not been redeveloped: it consists of a wrapper around the Xerces API provided by Apache. The XML implementation is necessary, for instance, to read the XML Schema documents.

3.5. SDF implementation

The implementation dedicated to the management of ENVISAT products is called Structured Data File (SDF). It uses the XML Schema definitions and the DRB extensions presented previously to locate and extract the ENVISAT product fields.

4. Queries for transformations

4.1. XPath to locate information

XPath [7] is used to locate data within the documents. For instance, the next example locates all the top of atmosphere radiance values of the first band in an ENVISAT MERIS Level 1 parent product. Here mds stands for Measurement Data Set, mdsr for Measurement Data Set Record and toa_rad for Top of Atmosphere Radiance.

    document("MER_RR__1P")/mds[1]//mdsr/toa_rad

Figure 3. Top of Atmosphere Radiance locator

The node names are not embedded in the ENVISAT products but are derived from the XML Schema. Their types and their documentation are also retrieved from the same XML Schema definition.

4.2. XQuery

On top of XPath, the XQuery language [4] provides a powerful solution for the selection and transformation of data extracted from the products. Although XQuery is not yet a W3C Recommendation, its basic syntax is stable enough to be implemented. Using XQuery, it is possible to access any information from one or more ENVISAT products. The following examples provide an overview of what can be done with the XQuery language applied to ENVISAT products.

    /mds[range 2 TO 5]//mdsr[val_qi="0"]

Figure 4. Simple selection

The query presented in Figure 4 extracts the qualified scan lines (i.e. those whose measurement quality indicator equals 0) from bands 2 to 5 of an ENVISAT MERIS product.

    FOR $rec IN /mds[2]//mdsr[range 100 TO 200]
    WHERE $rec/val_qi = 0
    RETURN $rec/toa_rad[range 500 TO 600]

Figure 5. Selection with constraint

The query presented in Figure 5 applies a FLWR expression [4] to a MERIS product. The result is a window extracted from band 2, bounded by columns 500 to 600 and lines 100 to 200. As in Figure 4, only the qualified lines are extracted.

    FOR $rec IN /mds[2]//mdsr[range 100 TO 200]
    IF $rec/val_qi = 0
    THEN RETURN $rec/toa_rad[range 100 TO 500]

Figure 6. Conditional expression

The same query as Figure 5, using the conditional expressions IF and THEN instead of a WHERE clause.

    FOR $current_mds IN //mds
    RETURN count($current_mds/mdsr[val_qi = "-1"])

Figure 7. Built-in function

The query of Figure 7 shows the use of a built-in function (count() in the example), which simplifies the writing of the query and optimizes the processing performance.

    FOR $p IN document("MER RR")/mds[1]//mdsr,
        $p2 IN document("MER FR")/mds[1]//mdsr
    WHERE $p/time = $p2/time
    RETURN count($p)

Figure 8. Joins
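As a rough illustration of what the selection of Figure 5 and the join of Figure 8 compute, the following Python sketch applies the same logic to in-memory records. It is not how DRB evaluates queries: the `Mdsr` class, the function names and the sample values are hypothetical, the `range` bounds are assumed to be 1-based and inclusive, and the join is written as a hash join that returns the matching pairs instead of reproducing the exact RETURN clause.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mdsr:
    """One measurement data set record (a scan line); field names follow the paper."""
    time: float
    val_qi: int
    toa_rad: List[int]


def select_window(records: List[Mdsr], first_line: int, last_line: int,
                  first_col: int, last_col: int) -> List[List[int]]:
    """Figure 5 style selection: qualified lines only, cut to a column window.

    Assumption: line and column ranges are 1-based and inclusive.
    """
    window = []
    for rec in records[first_line - 1:last_line]:
        if rec.val_qi == 0:  # WHERE $rec/val_qi = 0
            window.append(rec.toa_rad[first_col - 1:last_col])
    return window


def join_on_time(left: List[Mdsr], right: List[Mdsr]) -> List[Tuple[Mdsr, Mdsr]]:
    """Figure 8 style join: pairs of records from two products with equal time."""
    by_time: Dict[float, List[Mdsr]] = {}
    for rec in right:
        by_time.setdefault(rec.time, []).append(rec)
    return [(l, r) for l in left for r in by_time.get(l.time, [])]


# Hypothetical sample data standing in for two products.
rr = [Mdsr(0.0, 0, [10, 11, 12, 13]),
      Mdsr(1.0, -1, [20, 21, 22, 23]),
      Mdsr(2.0, 0, [30, 31, 32, 33])]
fr = [Mdsr(1.0, 0, [5]), Mdsr(2.0, 0, [6]), Mdsr(3.0, 0, [7])]

window = select_window(rr, first_line=1, last_line=3, first_col=2, last_col=3)
pairs = join_on_time(rr, fr)
print(window)
print([(l.time, r.time) for l, r in pairs])
```

The contrast with the queries is the point: each Python function hard-codes one access pattern, whereas the XQuery form states the same selection declaratively against the schema-derived node names.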
The use of joins (i.e. selections from multiple documents with interrelated constraints) is probably one of the most interesting features of queries for ENVISAT products. In Figure 8, the measurement data set records are extracted from two distinct files (a reduced resolution product and a full resolution product). The extracted records are synchronized in time within the FOR loop using a WHERE clause. Such a query may be useful to compare or extract values from both a parent and a child product (a child product is an extraction performed on a product at the request of ENVISAT product end users).

5. Rendering

The results of the queries can be displayed in several views. As an example, we are developing display modules handling tables, plots, images and reports. The rendering definition is done using the XSL Formatting Objects (XSL-FO) language [6].

Figure 9. Example of rendered plot

In addition, the output of the queries may be saved in one of the supported formats (XML or ENVISAT formats) for further analysis or for exchange with other systems.

6. Derby software

As an application of DRB, we are currently developing the Derby software, a datamining tool for managing and analyzing data. Derby is an integrated environment that takes advantage of the DRB functionality through a graphical user interface.

Figure 10. Derby main interface

As an example, Derby will make it possible to browse all ENVISAT product fields from a tree representation. It will also allow users to edit XQuery scripts and to render their results as tables, plots or image views, or to compile them into a configurable report. Additional functionality such as syntax highlighting and automatic completion for query editing will ease usage and facilitate training. The first release of the Derby software will be made available by the beginning of next year. Its operational usage is foreseen during the ENVISAT early phases to support the changes in the different instrument processors.

7. Benefits and example of applications

The presented system enables the use of ENVISAT product data with minimal knowledge of their physical representation. Each part of the product is described by a unique description based on XML Schema, an existing and public standard that is open to any other system. All information can be extracted from one or more products through XQuery scripts. These main features make the system advantageous for many EO applications. A non-exhaustive list of such applications is introduced in the next sections.

7.1. Processor configuration management

The system is helpful for the configuration management of the instrument processors.
It enables the comparison of processed products against reference ones, highlighting possible discrepancies between them. This application was the starting point of the present study.

7.2. New missions

Even though it was initially developed for ENVISAT product data handling, the open architecture of the system reduces the manpower required to support additional missions.

7.3. Products definition

XML Schema is a good candidate for the editing and maintenance of the product definitions in a unique repository that may be shared among data users. From the XML Schema it is possible to extract all the information usually printed in the product specification documents, with the advantage of being directly exploitable by software in a networked or local environment.

7.4. Translations

The system is able to read and write information regardless of the physical format. It can therefore be used as a format translator, by defining an output encoding format different from the input one. In addition, minor processing (at least what the XQuery operators make possible) may be applied during translation.

7.5. Software development

The DRB software may be used as a standard API for accessing all supported products. This may spare software developers the re-engineering of product specifications and minimize the implementation effort for components requiring access to product data.

7.6. Quality analysis

The system may be useful for Systematic Quality Analysis (SQA) as well as Long Loop Performance Analysis (LLPA) by providing a simple interface for the extraction and interpretation of relevant product information. ESA/ESRIN already foresees using this software as an integrated part of the next operational quality control system of ENVISAT.

8. Conclusions

The outputs of the study confirmed the feasibility of a system combining the use of XML-related technologies with the handling of ENVISAT product data. Moreover, they emphasized the interest of providing the highest level of abstraction from the physical representation of the data, allowing EO scientists, engineers and managers to concentrate on their lines of business. This promising concept should lead to further investigations and developments.

Acknowledgments

We would like to thank Eric Monjoux from ESA ESRIN for his contribution and comments in reviewing the present paper.

References

[1] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation 6 October 2000, http://www.w3.org/TR/2000/REC-xml-20001006
[2] XML Schema Part 1: Structures, W3C Recommendation 2 May 2001, http://www.w3.org/TR/xmlschema-1/
[3] XML Schema Part 2: Datatypes, W3C Recommendation 2 May 2001, http://www.w3.org/TR/xmlschema-2/
[4] XQuery 1.0: an XML Query Language, W3C Working Draft 07 May 2001, http://www.w3.org/TR/xquery/
[5] Extensible Stylesheet Language (XSL) Version 1.0, W3C Recommendation 15 October 2001, http://www.w3.org/TR/xsl/
[6] XSL Transformations (XSLT) Version 1.0, W3C Recommendation 16 November 1999, http://www.w3.org/TR/xslt
[7] XML Path Language (XPath), W3C Recommendation 16 November 1999, http://www.w3.org/TR/xpath
[8] XML Linking Language, W3C Recommendation 27 June 2001, http://www.w3.org/TR/xlink/
[9] Namespaces in XML, W3C Recommendation 14 January 1999, http://www.w3.org/TR/REC-xml-names
[10] Document Object Model (DOM) Level 3 Core Specification, W3C Working Draft 13 September 2001, http://www.w3.org/TR/DOM-Level-3-Core