GMA-PSMH: A Semantic Metadata Publish-Harvest Protocol for Dynamic Metadata Management Under Grid Environment


Yaping Zhu, Ming Zhang, Kewei Wei, and Dongqing Yang
School of Electronics Engineering and Computer Science, Peking University, 100871, Beijing, China
{zhuyaping, mzhang, wkw, ydq}@db.pku.edu.cn

E.A. Fox et al. (Eds.): ICADL 2005, LNCS 3815, pp. 446-456, 2005. Springer-Verlag Berlin Heidelberg 2005

Abstract. The pressing demand for semantic metadata description and real-time data processing presents a unique challenge to the Grid Metadata Service. The Grid Monitoring Architecture (GMA), a framework for dynamic data management, is limited by its conventional relational database interface and therefore fails to address the problem of interoperability. To address metadata publishing in GMA, we present a new publish-harvest protocol for semantic metadata called GMA-PSMH (Grid Monitoring Architecture-Protocol for Semantic Metadata Harvesting), obtained by modifying the OAI-PMH metadata harvesting framework. As part of the Semantic Metadata Service Project at Peking University, the associated dynamic metadata management framework is then implemented according to this protocol. Finally, we conclude and outline future work.

1 Introduction

The pressing demand for distributed scientific processing, cross-domain cooperative computing, and resource sharing has greatly accelerated the development of Grid Computing. Grid Computing is an integrated environment of resources and services [1]. Its major objective is to solve complex scientific and engineering problems by sharing resources and services in a distributed and heterogeneous environment. To achieve this goal, two prerequisites must be met:

- The cross-domain resources used in modern scientific activities are highly diverse, so a simple, standard, and extensible description mechanism is required.
- In modern scientific cooperation, massive amounts of data and resources are processed in real time, so an effective retrieval method and a dynamic, synchronous update mechanism are needed for resource management.

In such an environment, resources become the core of the whole grid architecture. Only through effective resource description can the above goals be achieved. Metadata and metadata services have consequently become the key to solving these two problems [2, 3].

Meanwhile, research on semantic resource description and intelligent information retrieval is developing rapidly in the domain of the Semantic Web [4]. The Semantic Web Activity statement of the World Wide Web Consortium (W3C) describes the Semantic Web as an extension of the current Web in which information is given well-defined meaning. It is the idea of having data on the Web defined and linked in a way that it can be used for more effective discovery, automation, integration, and reuse across various applications [5]. Semantic metadata services are therefore likely to become the trend for general metadata services. By using ontologies to describe the semantic relationships between concepts, a semantic metadata service can raise conventional metadata description in the Grid to the knowledge level and establish a solid foundation for effective resource description and sharing [6].

As part of the Grid Computing Resource Service Middleware Project at Peking University, the Semantic Metadata Service Project is supported by the National Science Foundation (NSF) in China under grant No. 90412010 and by the ChinaGrid project of the Ministry of Education in China. In response to the demands of scientific activities in Grid Computing, the objective of the Semantic Metadata Service Project is to establish the metadata model and classification in a semantic and extensible way, and to build the associated semantic metadata services such as resource sharing, discovery, and dynamic management.

To meet the need for metadata publishing in the Grid Monitoring Architecture, we designed a new semantic metadata publish-harvest protocol called GMA-PSMH by extending the OAI-PMH metadata harvesting framework. A new dynamic semantic metadata management system was then developed according to this protocol. The rest of the paper is structured as follows.
Sections 2 and 3 describe related work and the architecture of the Semantic Metadata Service Project at Peking University. Section 4 describes the design of the GMA-PSMH protocol. In Sections 5 and 6, we describe the dynamic semantic metadata management framework and its implementation. The paper concludes with a summary and outlines future research.

2 Related Work

2.1 GMA

The Grid Monitoring Architecture (GMA) is defined within the Global Grid Forum (GGF) [7]. Its major purpose is to monitor real-time data and information in a Grid environment. The architecture consists of three components (shown in Figure 1): Consumers, Producers, and a Registry.

Fig. 1. Grid Monitoring Architecture

- Producers: register their URL and the type of data available with the Registry;
- Consumers: query the Registry to find the desired type of information and to locate the corresponding Producer. The Consumer can then obtain real-time data by contacting that Producer directly at its URL;
- Registry: provides a registration service for Producers and information retrieval for Consumers [8].

As part of the European Data Grid Project, the Relational Grid Monitoring Architecture (R-GMA) is an implementation of the Grid Monitoring Architecture described above [9]. It is based on the traditional relational data model, and users can insert data into and retrieve data from the repository by issuing SQL statements such as INSERT and SELECT. However, limited by its relational database interface, R-GMA fails to address the problem of interoperability. Nor does it support data publishing and harvesting.

2.2 OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a framework designed for metadata interoperability in the Digital Library domain [10]. The framework logically has two kinds of collaborators (shown in Figure 2). A Data Provider supplies its general repository information, metadata formats, and metadata records in response to OAI requests. A Service Provider then uses the harvested metadata as a foundation for providing value-added services [11].

Fig. 2. OAI-PMH

However, OAI-PMH is designed for traditional metadata publishing, and there is currently no standard for semantic metadata publishing and harvesting.

3 Semantic Metadata Service Project

Berners-Lee proposed a seven-layer architecture for describing the different layers of resources in the Semantic Web [9]. According to the needs of resource description in a Grid environment, the semantic metadata are classified into two layers in our project:
- Resource Description Framework (RDF) Layer: describes the detailed semantic information of objects or instances. It establishes the relationships among a single object, its properties, and their values, and saves them in files formatted according to the RDF standard [12];

- Ontology Layer: defines the abstract structure of a semantic class and the relationships between different classes. An ontology is defined as the formalized specification of a shared conceptual model, and OWL is one of the most widely used ontology description languages [13]. OWL describes resources mainly with two types of building block: concepts and properties. It uses semantic relationships such as hierarchical structure, synonymy, logical composition, and relational constraints to relate resources, and saves the model in files formatted according to the OWL standard.

The Semantic Metadata Service Project uses these two layers to describe semantic data and metadata. Features of the project include the following.

First, in a metadata service, metadata itself is the core data model. The associated core metadata services include the registration, deletion, update, and retrieval of metadata. Moreover, an extensible description mechanism must be provided for metadata definition management.

Second, we define the concepts of collection and view to facilitate personalized logical organization of metadata. The corresponding metadata services include the creation, deletion, and update of views and collections.

Last, resources in the Grid change frequently. Under such circumstances, a traditional metadata service interface can no longer satisfy the need. Our project therefore presents a new dynamic, semantic metadata management framework to support the registration, synchronous update, and retrieval of dynamic metadata.

The whole architecture is organized in four layers, as shown in Figure 3.

Fig. 3. Semantic Metadata Service Project in Peking University

- Tool Layer: provides a development toolkit and a graphical user interface for semantic metadata services;
- Interface Layer: deals with the definition of service interfaces and the parsing of communication protocols (such as SOAP) and XML documents;

- Application Layer: provides the implementation of seven types of semantic metadata service interfaces, including metadata definition management, metadata instance management, collection management, view management, knowledge management, dynamic metadata management, and synchronous update;
- Data Layer: stores the data objects used by metadata services, including metadata definitions, metadata instances, collections, views, and knowledge.

In this paper, we mainly focus on the Dynamic Metadata Management Framework in the application layer. The following section describes the design of the Semantic Metadata Publish-Harvest Protocol (GMA-PSMH).

4 Semantic Metadata Publish-Harvest Protocol Design

As stated above, OAI-PMH is designed for traditional metadata publishing and harvesting [10]. A new Semantic Metadata Publish-Harvest Protocol (GMA-PSMH) is consequently required to solve the problem of dynamic semantic metadata management in the Grid.

First, the semantic metadata in the project are classified into two categories, formatted in RDF and OWL respectively. The protocol should therefore support the publishing of both kinds of metadata files. Moreover, to meet the needs of dynamic metadata management, the protocol must also provide the URL address and average update frequency of each metadata file, so that metadata files can be updated synchronously at a designated time interval.

GMA-PSMH includes three groups of requests and responses. The relationship between OAI-PMH and GMA-PSMH is shown in Table 1. Like OAI-PMH, the protocol is based on HTTP requests, with responses encoded as XML streams.

Table 1. Comparison between OAI-PMH and GMA-PSMH

Request/Response Name    OAI-PMH                        GMA-PSMH
Identify                 Information of DataProvider    Information of repository
ListMetadataFormats      Metadata Format                OWL metadata file information
ListRecords              Metadata Records               RDF metadata file information

The three groups of requests and responses in GMA-PSMH are described in detail below:

1) Identify: provides general information about the repository, including the repository name, base URL address, administrator's email, update granularity, and description, similar to OAI-PMH.

2) ListMetadataFormats: describes the formats of semantic metadata, that is, the general information about OWL metadata files, including filename, URL address, average update granularity, last update time, version, copyright, and detailed description. A protocol example is shown below:

    <?xml version="1.0" encoding="utf-8"?>
    <GMA-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                                 http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2005-05-17T21:45:33Z</responseDate>
      <request verb="ListMetadataFormats" />
      <ListMetadataFormats>
        <metadataFormat>
          <filename>filesystem.owl</filename>
          <URL>localhost://filesystem.owl</URL>
          <granularity>2 months</granularity>
          <lastUpdated>2005-05-01T</lastUpdated>
          <version>1.0</version>
          <copyright>pku</copyright>
          <description>file system</description>
        </metadataFormat>
      </ListMetadataFormats>
    </GMA-PMH>

3) ListMetadataFiles: describes the semantic metadata, that is, the general information about RDF metadata files, including filename, URL address, URI, average update granularity, last update time, version, copyright, and detailed description. A protocol example is shown below:

    <?xml version="1.0" encoding="utf-8"?>
    <GMA-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
                                 http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2005-05-17T21:45:33Z</responseDate>
      <request verb="ListMetadataFiles" />
      <ListMetadataFiles>
        <metadataFile>
          <filename>network1.rdf</filename>
          <URL>localhost://network/network1.rdf</URL>
          <granularity>1 minute</granularity>
          <lastUpdated>2005-05-01T</lastUpdated>
          <version>1.0</version>
          <copyright>pku</copyright>
          <metadataFormat>network.owl</metadataFormat>
          <description>null</description>
        </metadataFile>
      </ListMetadataFiles>
    </GMA-PMH>

5 Dynamic Metadata Management Framework

To meet the demand for dynamic metadata synchronization and management in a Grid environment, we designed a new dynamic, semantic metadata management framework modeled on the Grid Monitoring Architecture.
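Harvesting in this framework amounts to reading the XML responses defined in Section 4. As a minimal sketch (not the project's implementation, which Section 6 describes as Java-based), the following Python fragment parses a ListMetadataFiles response with the standard library. The sample document is abridged from the example in Section 4; the camel-case element names (metadataFile, metadataFormat) and the helper name parse_metadata_files are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Default namespace used by the example responses in Section 4.
NS = {"g": "http://www.openarchives.org/OAI/2.0/"}

# Abridged ListMetadataFiles response (element-name casing is assumed).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<GMA-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-05-17T21:45:33Z</responseDate>
  <request verb="ListMetadataFiles" />
  <ListMetadataFiles>
    <metadataFile>
      <filename>network1.rdf</filename>
      <URL>localhost://network/network1.rdf</URL>
      <granularity>1 minute</granularity>
      <version>1.0</version>
      <metadataFormat>network.owl</metadataFormat>
    </metadataFile>
  </ListMetadataFiles>
</GMA-PMH>"""


def parse_metadata_files(xml_bytes):
    """Return one dict per <metadataFile>, keyed by child element name."""
    root = ET.fromstring(xml_bytes)
    records = []
    for node in root.findall("g:ListMetadataFiles/g:metadataFile", NS):
        record = {}
        for child in node:
            # Strip the "{namespace}" prefix that ElementTree adds to tags.
            name = child.tag.split("}", 1)[-1]
            record[name] = (child.text or "").strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in parse_metadata_files(SAMPLE):
        print(rec["filename"], rec["granularity"], rec["metadataFormat"])
```

Returning plain dictionaries keeps the sketch independent of any storage layer; a harvester could map them onto database rows in whatever way suits its repository.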

In this framework, three kinds of participants are defined first, namely resource providers, resource consumers, and the metadata service. The function of each participant is described below:

- Resource Providers: monitor the status of Grid resources and publish that status as semantic metadata according to the GMA-PSMH protocol;
- Metadata Service: is in charge of resource provider registration and metadata retrieval. It stores the update address and average update frequency of dynamic metadata, and at a designated time interval harvests the metadata into its local repository by parsing GMA-PSMH responses;
- Resource Consumers: users of application programs.

Next, the different kinds of dynamic metadata in this framework must be classified:

- Real-time Metadata: the metadata provided and published by resource providers to describe the real-time status of resources in the Grid;
- Historical Metadata: the metadata harvested from resource providers in the past. Since resource status in the Grid may change rapidly, truly real-time metadata cannot be achieved. The only option is to set different update frequencies according to how quickly each kind of resource changes. Historical metadata therefore remains significant for metadata retrieval;
- Cached Metadata: the historical metadata stored at the metadata service, which is used as the basis for information retrieval.

The whole workflow of the Dynamic Metadata Management Framework consists of the following five processes, as shown in Figure 4, with active application processes marked in dark color.

Fig. 4. Dynamic Metadata Management Framework

1) The resource provider publishes its dynamic, semantic metadata, formatted in RDF and OWL, according to the GMA-PSMH protocol;
2) The resource provider registers itself at the metadata service;
3) After registration, the metadata service harvests the semantic metadata at the designated time interval and stores the harvested metadata in its repository as cached metadata;
4) When a resource consumer sends a request to the metadata service, the metadata service returns the matching metadata together with the means of obtaining the real-time metadata;
5) The resource consumer can choose either to use the cached metadata directly or to harvest real-time metadata from the resource providers.

6 Implementation

6.1 Resource Provider Implementation

As part of the Dynamic Metadata Management Framework, the resource provider accepts HTTP requests (GET or POST method) from the metadata service or a resource consumer, and responds with XML streams according to the GMA-PSMH protocol. The resource provider subsystem consists of three functional modules, as shown in Figure 5.

1) JSP Page Management Module: accepts HTTP requests and parameters, calls the corresponding function in a JavaBean, and displays the result page via the Apache Tomcat server.
2) Database Management Module: descriptive information about the RDF and OWL semantic files is saved in a MySQL database in our system. The main purpose of the database management module is to generate appropriate SQL query statements, query the database, and return the result to the application program. Three tables named GeneralInfo, OWLFiles, and RDFFiles are established in our database, storing general information, metadata formats, and metadata respectively;
3) XML Format Generator: generates XML streams according to GMA-PSMH.

Fig. 5. Resource Provider and Metadata Service Implementation
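The role of the XML Format Generator can be illustrated with a short sketch. The paper's system is built with JSP and JavaBeans; the Python fragment below is only an illustration under stated assumptions: the function name build_list_metadata_files, the FIELDS tuple, and the camel-case element names are hypothetical, and the input rows stand in for records read from a table such as RDFFiles.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Child elements emitted for each metadata file (names are assumed,
# following the ListMetadataFiles example in Section 4).
FIELDS = ("filename", "URL", "granularity", "lastUpdated",
          "version", "copyright", "metadataFormat", "description")


def build_list_metadata_files(rows, now=None):
    """Serialize rows (dicts) as a ListMetadataFiles response, as bytes."""
    now = now or datetime.now(timezone.utc)
    # Writing xmlns as a plain attribute puts every element in that namespace.
    root = ET.Element("GMA-PMH", {"xmlns": OAI_NS})
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(root, "responseDate").text = stamp
    ET.SubElement(root, "request", {"verb": "ListMetadataFiles"})
    listing = ET.SubElement(root, "ListMetadataFiles")
    for row in rows:
        item = ET.SubElement(listing, "metadataFile")
        for field in FIELDS:
            # Missing columns are emitted as empty elements.
            ET.SubElement(item, field).text = str(row.get(field, ""))
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    rows = [{"filename": "network1.rdf",
             "URL": "localhost://network/network1.rdf",
             "granularity": "1 minute",
             "metadataFormat": "network.owl"}]
    print(build_list_metadata_files(rows).decode("utf-8"))
```

Building the tree with a library rather than concatenating strings means special characters in descriptions are escaped automatically, which is one reason a generator module is usually kept separate from the page layer.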

6.2 Metadata Service Implementation

The primary function of the metadata service is to provide resource provider registration and metadata retrieval. In addition, the metadata service must periodically start a system thread that harvests real-time metadata to update its local repository. The subsystem comprises five parts, also shown in Figure 5.

1) Resource Provider Registration Module: provides an interface for resource providers to register their base URL address. A system thread is then started that sends HTTP requests to obtain the general information and metadata of the resource provider.
2) Metadata Update Module: the metadata service starts a system thread at intervals determined by the average update frequency supplied by the resource provider, sends HTTP requests (verb=ListMetadataFormats or verb=ListMetadataFiles), parses the XML streams, and updates its database.
3) XML Parser: parses a GMA-PSMH protocol stream into a DOM tree using the Xerces Java package.
4) Database Management Module: generates SQL INSERT or UPDATE statements to update the local repository.
5) Metadata Retrieval Module: returns the matching metadata and provides the means for the resource consumer to get real-time metadata from the provider.

6.3 Graphical User Interface

The resource provider accepts HTTP requests from the user and responds with XML streams in accordance with the GMA-PSMH protocol, as shown in Figure 6 and Figure 7.

Fig. 6. Resource Provider Request GUI

Fig. 7. Resource Provider Response GUI

Figure 8 and Figure 9 show the tables in the database of the metadata service after harvesting.

Fig. 8. Table OWLFiles in Metadata Service
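The scheduling decision made by the Metadata Update Module in Section 6.2 can be sketched as follows. This is an illustrative Python fragment, not the paper's Java implementation. It assumes that granularity values have the form "<count> <unit>" as in the Section 4 examples ("1 minute", "2 months"), and it approximates a month as 30 days; the names parse_granularity and due_files are hypothetical.

```python
from datetime import datetime, timedelta

# Seconds per unit; a month is approximated as 30 days (an assumption).
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
}


def parse_granularity(text):
    """Convert a string such as '1 minute' or '2 months' to a timedelta."""
    count, unit = text.split()
    unit = unit.lower().rstrip("s")  # accept both singular and plural units
    return timedelta(seconds=int(count) * UNIT_SECONDS[unit])


def due_files(records, last_harvest, now):
    """Return filenames whose update interval has elapsed.

    records: dicts with 'filename' and 'granularity' keys.
    last_harvest: maps filename to the datetime of its previous harvest.
    """
    due = []
    for rec in records:
        interval = parse_granularity(rec["granularity"])
        previous = last_harvest.get(rec["filename"])
        # Files never harvested before are always due.
        if previous is None or now - previous >= interval:
            due.append(rec["filename"])
    return due


if __name__ == "__main__":
    now = datetime(2005, 5, 17, 21, 45, 33)
    records = [
        {"filename": "network1.rdf", "granularity": "1 minute"},
        {"filename": "filesystem.owl", "granularity": "2 months"},
    ]
    history = {"network1.rdf": now - timedelta(seconds=30)}
    print(due_files(records, history, now))
```

A periodic thread could call due_files on each tick and issue a ListMetadataFiles or ListMetadataFormats request only for the providers whose files are due, which keeps fast-changing resources fresh without re-harvesting slow-changing ones.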

Fig. 9. Table RDFFiles in Metadata Service

The metadata retrieval page and the collection view of metadata are shown in Figures 10 and 11.

Fig. 10. Metadata Retrieval Page

Fig. 11. Collection View of Metadata

7 Conclusion and Future Work

As part of the Semantic Metadata Service Project at Peking University, this paper designed and implemented the Dynamic Metadata Management Framework. The primary features of the system are listed below:

- The system improves the interoperability of the current Grid Monitoring Architecture by publishing and harvesting metadata;
- The system supports semantic metadata publishing and harvesting by modifying OAI-PMH from the Digital Library domain.

Our future work will focus on the use of Web Services [14, 15]. The interoperability of the metadata service system includes not only metadata publishing but also exposing the current functional modules as Web Services. As middleware for the Internet, Web Services are a distributed computing technique based on object and component modules. With Web Services, the Internet could become an open component platform, which would facilitate the extension and combination of functions to meet the diverse needs of users. We therefore expect Web Services to become a trend in next-generation Grid Computing.

References

1. Ian Foster, Carl Kesselman, and Steven Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, International J. Supercomputer Applications, 15(3), 2001
2. http://www.w3.org/Metadata/
3. Ewa Deelman, et al., Grid-Based Metadata Services, 16th International Conference on Scientific and Statistical Database Management (SSDBM04), 21-23 June 2004, Santorini Island, Greece
4. David De Roure, Nicholas R. Jennings, and Nigel R. Shadbolt, The Semantic Grid: A Future e-Science Infrastructure, 2004
5. W3C Semantic Web Activity Statement, http://www.w3.org/2001/sw/Activity/
6. Semantic Web, www.w3.org/2001/sw/
7. R-GMA: Relational Grid Monitoring Architecture, www.r-gma.org/
8. Rajesh Raman, Matchmaking Frameworks for Distributed Resource Management, 2001
9. European DataGrid Project, http://www.edg.org/
10. OAI, www.openarchives.org/
11. Shuan Wang, Meng Wang, and Ming Zhang, The Design and Implementation of a Metadata Interoperability Architecture Based on OAI-PMH, Computer Engineering and Application, 2003, 39(20)
12. RDF, http://www.w3.org/RDF/
13. OWL, http://www.w3.org/2001/sw/WebOnt/
14. http://www.w3.org/TR/ws-arch
15. Web Service Architecture: Technology Overview, 2004(4)