Provenance-aware Faceted Search in Drupal

Similar documents
Provenance-Aware Faceted Search in Drupal

POMELo: A PML Online Editor

Towards Provenance Aware Comment Tracking for Web Applications

SAF: A Provenance-Tracking Framework for Interoperable Semantic Applications

7 th International Digital Curation Conference December 2011

Tackling the Provenance Challenge One Layer at a Time

Where did you hear that? Information and the Sources They Come From

Tackling the Provenance Challenge One Layer at a Time

OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance

Produce and Consume Linked Data with Drupal!

An Approach to Enhancing Workflows Provenance by Leveraging Web 2.0 to Increase Information Sharing, Collaboration and Reuse

Automatic capture and efficient storage of e-science experiment provenance

Collaborative provenance for workflow-driven science and engineering Altintas, I.

Mapping between Digital Identity Ontologies through SISM

Automatic Generation of Workflow Provenance

An Archiving System for Managing Evolution in the Data Web

Sindice Widgets: Lightweight embedding of Semantic Web capabilities into existing user applications.

Managing Rapidly-Evolving Scientific Workflows

Publishing and Consuming Provenance Metadata on the Web of Linked Data

Provenance Trails in the Wings/Pegasus System

Managing Exploratory Workflows

An Annotation Tool for Semantic Documents

The Point of View Axis: Varying the Levels of Explanation Within a Generic RDF Data Browsing Environment

The International Journal of Digital Curation Volume 7, Issue

W3C Workshop on the Future of Social Networking, January 2009, Barcelona

Provenance Information in the Web of Data

Publishing and Consuming Provenance Metadata on the Web of Linked Data

Integrating Text and Graphics to Present Provenance Information

Comparing Provenance Data Models for Scientific Workflows: an Analysis of PROV-Wf and ProvOne

A Process-Driven Approach to Provenance-Enabling Existing Applications

ORES-2010 Ontology Repositories and Editors for the Semantic Web

Addressing the Provenance Challenge using ZOOM

ISSN: Supporting Collaborative Tool of A New Scientific Workflow Composition

Support for Reusable Explorations of Linked Data in the Semantic Web

Managing the Evolution of Dataflows with VisTrails

Common Language Resources and Technology Infrastructure REVISED WEBSITE

ANNUAL REPORT Visit us at project.eu Supported by. Mission

Temporal Data Mining of Scientific Data Provenance

Re-using Data Mining Workflows

TWC LOGD: A Portal for Linking Open Government Data

Mapping Semantic Workflows to Alternative Workflow Execution Engines

Application of Named Graphs Towards Custom Provenance Views

Public Health Surveillance Using Global Health Explorer

Content Management for the Defense Intelligence Enterprise

On the use of Abstract Workflows to Capture Scientific Process Provenance

Visualization Architecture for User Interaction with Dynamic Data Spaces in Multiple Pipelines

Opus: University of Bath Online Publication Store

Combining Generality and Specificity in Generating Hypermedia Interfaces for Semantically Annotated Repositories

User Interfaces for Portable Proofs

Next Generation Semantic Data Environments (or Linked Data, Semantics, and Standards in Scientific Applications)

A Semantic Web Approach to the Provenance Challenge

HyperSD: a Semantic Desktop as a Semantic Web Application

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Trustworthiness of Data on the Web

IBM Research Report. Overview of Component Services for Knowledge Integration in UIMA (a.k.a. SUKI)

Ontology Extraction from Heterogeneous Documents

An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery

Analysis Traceability and Provenance for HEP

Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way

A SEMANTIC MATCHMAKER SERVICE ON THE GRID

Metadata Quality Assessment: A Phased Approach to Ensuring Long-term Access to Digital Resources

Extracting Causal Graphs from an Open Provenance Data Model

Extending Blaise Capabilities in Complex Data Collections

FedX: A Federation Layer for Distributed Query Processing on Linked Open Data

Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web

Development of Contents Management System Based on Light-Weight Ontology

Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life

Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges

Taxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company

Porting Social Media Contributions with SIOC

Web Data mining-a Research area in Web usage mining

PROJECT PERIODIC REPORT

Provenance Collection Support in the Kepler Scientific Workflow System

Adaptive Personal Information Environment based on the Semantic Web

Semantically Enhanced Hypermedia: A First Step

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Ontology-based Architecture Documentation Approach

From Online Community Data to RDF

The Semantic Planetary Data System

DBpedia-An Advancement Towards Content Extraction From Wikipedia

38050 Povo Trento (Italy), Via Sommarive 14 IWTRUST: IMPROVING USER TRUST IN ANSWERS FROM THE WEB

Pilot: Collaborative Glossary Platform

Analysis of Approaches for Supporting the Open Provenance Model: A Case Study of the Trident Workflow Workbench

> Semantic Web Use Cases and Case Studies

Hyperdata: Update APIs for RDF Data Sources (Vision Paper)

Semantic Social Profile a Semantic Boost for the Social Information in MediaWiki

Amani Abu Jabal 1 Elisa Bertino 2. Purdue University, West Lafayette, USA 1. 2

Harvesting RDF Triples

RADX - Rapid development of web applications in XML

Open Data Search Framework based on Semi-structured Query Patterns

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Community Mining Tool using Bibliography Data

Provenance Framework for the Cloud Environment (IaaS)

CHALLENGES IN ADAPTIVE WEB INFORMATION SYSTEMS: DO NOT FORGET THE LINK!

Simile Tools Workshop Summary MacKenzie Smith, MIT Libraries

Scientific Workbench and Workflow management for GIS workflow

Software + Services for Data Storage, Management, Discovery, and Re-Use

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study

Data Provenance in Distributed Propagator Networks

DATABASE MANAGEMENT SYSTEMS

Transcription:

Provenance-aware Faceted Search in Drupal Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness Tetherless World Constellation, Computer Science Department, Rensselaer Polytechnic Institute, 110 8 th Street, Troy, NY 12180, U.S.A {shangz,zhengj3,dlm}@cs.rpi.edu Abstract. As the web content is increasingly generated in more diverse situations, provenance is becoming more and more critical. While a variety of approaches have been investigated for capturing and making use of provenance metadata, arguably no single best-practice approach has emerged. In this paper, we investigate an approach that leverages one of the most popular content management systems Drupal. More specifically, we study how provenance metadata can be captured and later published as structured data on the Web using Drupal. We also demonstrate how provenance metadata can be used to facilitate faceted search in Drupal. Keywords: Provenance, Faceted Search, PML, Drupal 1 Introduction Information on the Web is increasingly generated using a wide variety of diverse sources. It is also pointed out in [1] that capturing provenance both within and across systems, and publishing that provenance provides potential for many benefits, such as tracing audit trails of data, reproducing scientific experimental results, finding useful information, evaluating data quality, establishing information accountability, etc. Although there have been numerous previous research efforts aimed at representing and tracking provenance in both closed and open systems, even at different levels of granularity, many of the approaches have certain limitations and inflexibilities that result from decisions to target a specific system, application, or scenario. Moreover, historically provenance research has often focused on capturing provenance metadata; currently there is an increasing interest in studying how provenance can be used in different ways. With the goal of exploring the issues mentioned above, this short paper describes some of our ongoing efforts related to provenance using a Drupal-based solution. Our proposed demonstration will highlight two areas: Representing and publishing provenance metadata. In this direction, we are exploring the configurability and extensibility of Drupal. Our approach is to create a provenance-aware Drupal-based platform for web applications. If we view Drupal as a generic platform based on which a variety of systems and applica-

2 Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness tions can be built, then the proposed approach immediately has a broad impact on all these applications, resulting in an increased number of provenance-aware web applications. Providing provenance-aware faceted search. We are designing and implementing a Drupal-based faceted search that can search over metadata and use provenance to help inform search results and help filter results. We will demonstrate how provenance-aware search can be more efficient, and provide insight into ranking and presentation options. We also will expose how provenance facets, such as temporal facets related to the creation and modification time of the Drupal contents, are being used in our search functionality. The rest of the paper is organized as follows. Section 2 highlights the related work. Section 3 demonstrates our initial effort to capture, encode, and publish provenance information as structured RDF. Section 4 describes how to make use of these provenance metadata to facilitate faceted search. Finally, we conclude the paper and outline some future work in section 5. 2 Related Work Four areas of work are considered related to the topic of our paper. There is a diverse literature on systems and applications that are capable of capturing provenance metadata, for example, Taverna [2], VisTrails [3], REDUX [4], Pegasus [5], Karma [6]. While they successfully demonstrate different approaches of capturing provenance, they can be viewed as having an application-dependent nature and thus they can be less flexible and less extensible when applications differ greatly from those that these approaches were designed to satisfy. Our work differs from these approaches in that we are investigating mechanisms to capture and publish provenance using Drupal, which has the potential to serve as the foundation of an application-independent solution. Another spectrum of related research is the generic provenance models and vocabularies, most notably the Open Provenance Model (OPM) [1], Provenance Markup Language (PML) [7], and the Provenance Vocabulary for the Web of Linked Data [8] [13]. While providing different vocabularies for representing provenance, there are certain conceptual overlaps between them. For example, both OPM and Provenance Vocabulary have similar basic concepts, i.e., Agents (OPM) and Actors (Provenance Vocabulary) denoting people, Processes (OPM) and Executions (Provenance Vocabulary) representing executions of actions or processes, and Artifacts (both OPM and Provenance Vocabulary) representing the entity produced or manipulated. Currently, we are using PML to encode the provenance metadata. However, supporting provenance representations using different domain-independent provenance vocabularies is planned as one of our future work areas. Faceted search for exploration has been widely studied over the past years. Numerous research efforts [9] [10] [11] have demonstrated benefits including usability and flexibility of faceted browsers when interacting with structured data (e.g., relational databases, XML, RDF). In this paper, we are making use of the generated

Provenance-aware Faceted Search in Drupal 3 provenance metadata to facilitate faceted search and help locate the desired information. Drupal, one of the most popular CMS systems, has been widely deployed. Recently, there is an emerging effort from both the Semantic Web community and the Drupal development community to enable Drupal to publish semantic metadata (e.g., RDF, RDFa) [12]. Our implementation makes use of some of the Drupal modules introduced in [12] to create provenance related node fields. We are also developing Drupal module of our own to publish provenance metadata encoded in PML. 3 Capturing and Publishing Provenance in Drupal The information that Drupal 1 organizes and manages is called content. Usually, a piece of content in Drupal corresponds to a single node (in the form of a page) that has a title, an optional body text description, and perhaps several additional fields. Every node also belongs to a particular content type, such as Person, Blog and etc. The site administrator uses the Content Construction Kit (CCK) 2 to create a node by specifying its content type (e.g., Person), title (e.g., name of the person), body text (e.g., short bio of the person), and fields (e.g., first name, last name, email of the person). Furthermore, the RDF CCK 3 module extends the functionality of CCK to enable the definition of mappings between: 1) a specific content type (e.g., Person) and an RDF class (e.g., foaf:person), and 2) a node field (e.g., field_firstname) and an RDF property (e.g., foaf:givenname). Currently, our initial implementation of the Drupal provenance module is capable of capturing provenance at two levels of granularity. Node-level: This level focuses on the provenance metadata associated with a node in Drupal, such as who created the node, when the node was first created, and when the node was last modified. Content-level: This level keeps track of the provenance metadata associated with all the revisions of a node in Drupal, such as who modified the body text of the node and changed the values of the fields, when these modifications and changes happened, as well as the texts/values after every revision. The benefit of having both the node-level and content-level provenance is that the former captures the basic provenance metadata about the node while the latter keeps track of the detailed edit history information. The implementation of capturing node-level provenance is straightforward: we can use CCK and RDF CCK to define several provenance related fields and map them to the RDF properties from our chosen provenance vocabulary. Currently, we define two fields for every node in Drupal and map them to pmlp:hascreationdatetime and 1 We are using Drupal version 6.16 and PHP 5.2 at the time of this writing. Unless explicitly stated, this holds for all the discussions throughout this paper. 2 Content Construction Kit (CCK) module: http://drupal.org/project/cck. 3 RDF CCK module: http://drupal.org/project/rdfcck.

4 Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness pmlp:hasmodificationdatetime, representing when the node was first created and last modified respectively. In contrast to capturing node-level provenance, keeping track of the content-level provenance is not natively supported in Drupal. Thus we develop our own provenance module 4, making use of various Drupal core hook function and core API. Besides capturing provenance metadata, our module is also capable of publishing these metadata as RDF on the Web. To access both the node-level and the contentlevel provenance metadata, users can follow the URL pattern http://your_domain_name/drupal-dir/node/node-id/pml for every node in a standard Drupal installation without clean URLs 5 turned on. Currently we are only using PML to represent the provenance metadata, with support for OPM and Provenance Vocabulary in progress. Figure 1 shows the exported provenance metadata for a node in our experimental Drupal installation. Fig. 1. Exported Provenance Metadata for a Drupal node 4 Provenance-aware Faceted Search in Drupal Being able to capture both the node-level and content-level provenance immediately brings about a lot of benefits, especially for provenance-ware search in Drupal. Currently our provenance-aware faceted search is implemented using the Exhibit module 6 of Drupal. More specifically, we leverage the functionality of it to build the faceted search interface, with some facets generated from the node-level provenance metadata, such as pmlp:hascreationdatetime and pmlp:hasmodificationdatetime. Our initial implementation is demonstrated in Figure 2, with the Creation Time facet generated from the pmlp:hascreationdatetime node field. 4 The source code for our provenance module can be found at http://tw2.tw.rpi.edu/drupaldev/provenance.zip. 5 Drupal clean URLs: http://drupal.org/getting-started/clean-urls. The provenance metadata can also be accessed via a similar URL pattern by appending /pml to the end of a node URL. 6 Exhibit Drupal module: http://drupal.org/project/exhibit.

Provenance-aware Faceted Search in Drupal 5 Fig. 2. Initial implementation of provenance-aware faceted search 5 Conclusion and Future Work In this paper we described some of our ongoing effort to capture and publish provenance metadata in Drupal. With the help of various Drupal modules, such as CCK and RDF CCK, node-level provenance can be captured by defining node fields and establishing mappings between them and RDF properties in the chosen provenance model, which is PML in our case. We also developed a provenance module in order to capture content-level provenance, which can be used to trace the revision history of nodes in a deployed Drupal website. Finally, our initial effort to facilitate faceted search with the help of the captured provenance metadata is also presented. We plan to provide a demonstration of our provenance-aware faceted search in one (or more) of our escience applications at IPAW. We identify two general directions for future work. First, our current implementation supports provenance encodings using only PML, without giving the user the ability to choose which provenance model to use. We plan to further enable the user to specify their desired provenance model when exporting the provenance metadata. Second, at present only the node-level provenance is exploited in the faceted search. However, content-level provenance might also contain useful properties to generate facets, such as revision time, date, and the user who made that revision. Moreover, content-level provenance can also be used to generate the edit history of the contents managed by Drupal. Therefore, another important aspect of our future work is to take advantage of the content-level provenance to further improve faceted search and generate edit history information of the nodes (pages) in Drupal. We also plan to leverage provenance data to help filter search results when result sets are large. References 1. Moreau, L.: The Foundations for Provenance on the Web. J. Foundations and Trends in Web Science (2009) 2. Zhao, J., Goble, C.A., Stevens, R., and Turi, D.: Mining Taverna s semantic web of provenance. Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 463 472 (2008)

6 Zhenning Shangguan, Jinguang Zheng, and Deborah L. McGuinness 3. Freire, J., Silva, C.T., Callahan, S.P., Santos, E., Scheidegger, C.E., and Vo, H.T.: Managing rapidly-evolving scientific workflows. In: IPAW2006, pp. 10 18 (2006) 4. Barga, R.S. and Digiampietri, L.A.: Automatic capture and efficient storage of e- science experiment provenance. Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 419 429 (2008) 5. Kim, J., Deelman, E., Gil, Y., Mehta, G., and Ratnakar, V.: Provenance trails in the wings/pegasus system. Concurrency and Computation: Practice and Experience, vol. 20, no. 5, pp. 587 597 (2008) 6. Simmhan, Y.L., Plale, B., and Gannon, D.: Karma2: Provenance management for data-driven workflows. Int. J. Web Service Res., vol. 5, no. 2, pp. 1 22 (2008) 7. McGuinness, D.L., Ding, L., Pinheiro da Silva, P., and Chang, C.: PML2: A modular explanation Interlingua. In: ExaCt (2007) 8. Hartig, O and Zhao, J.: Using Web Data Provenance for Quality Assessment. In Proceedings of the 1st Int. Workshop on the Role of Semantic Web in Provenance Management (SWPM) at ISWC, Washington, DC, USA (2009) 9. Hildebrand, M., van Ossenbruggen, J., Hardman, L.: /facet: A Browser for Heterogeneous Semantic Web Repositories. International Semantic Web Conference, pp272-285 (2006) 10. Oren, E., Delbru, R., Decker, S.: Extending Faceted Navigation for RDF Data. The 5th International Semantic Web Conference (ISWC2006), pp559-572 (2006) 11. schraefel, m.c., Karam, M., Zhao, S.: mspace: interaction design for userdetermined, adaptable domain exploration in hypermedia. AH 2003: Workshop on Adaptive Hypermedia and Adaptive Web Based Systems, pp21-235 (2003) 12. Corlosquet, S., Delbru, R., and Clark, T., Polleres, A. and Decker, S.: Produce and Consume Linked Data with Drupal! In proceedings of 8th International Semantic Web Conference (ISWC2009) 13. Hartig, O.: Provenance Information in the Web of Data. In Proceedings of the Linked Data on the Web Workshop (LDOW 09), Madrid, Spain