From Online Community Data to RDF

Uldis Bojārs, John G. Breslin
[uldis.bojars,john.breslin]@deri.org
Digital Enterprise Research Institute
National University of Ireland, Galway
Galway, Ireland

Abstract

Large amounts of data are created within online community sites (forums, blogs, etc.). These can serve as a valuable source of information for web users, and usually contain rich meta-information. Most of this information is stored in relational databases, but unfortunately it remains locked into these databases and cannot be used by other applications. The SIOC project aims to provide guidelines for making this information available on the Web and for using it to connect online community sites together. SIOC aims to let other sites know more about the structure and contents of online communities, and to make more use of tagging and semantic metadata in these sites. This position paper describes the approach we have adopted for making online community site data available in RDF from many applications, and illustrates it through the example of a SIOC export tool for the b2evolution blog engine. As opposed to extracting data directly from a relational database, we attempt to tie our RDF data producers into the associated application logic for each system and reuse built-in functions and APIs where possible to generate RDF data.

1. Introduction

The Social Web contains large volumes of content (blog posts, reviews, etc.) posted on online community sites (such as blogs, wikis and bulletin boards). These sites allow users to gather online, create content and enter into discussions. They contain rich metadata about content items and the people creating them, but most of this data is locked in HTML markup and is not available for reuse without scraping the markup. In order to facilitate intelligent reuse of the information contained within these sites, we need a data format that is better suited for the task.
Semantically-Interlinked Online Communities (SIOC) [Breslin2005] is a project aimed at interconnecting online community sites by making their information available in a machine-readable form. A rich data model is needed if we are to express full information about the content and structure of these sites. The SIOC project defines an ontology for describing this information in RDF and provides several open source SIOC RDF exporters.

Online community sites typically run content management systems (CMSs) which consist of a relational database (e.g., MySQL) and a presentation layer for displaying content to visitors. In that case, the task of a SIOC export tool is to retrieve information from a relational database and export it in RDF.

This position paper is based on the experience of the SIOC developer community in exporting RDF from online community sites. We describe an application-logic-based, or indirect, approach for exporting RDF, used by many SIOC export tools, and illustrate it through the example of a SIOC exporter for the b2evolution blog engine.

The rest of the paper is organized as follows: Section 2 describes characteristics of online community sites; Section 3 identifies different approaches for exposing relational databases in RDF; Section 4 illustrates our approach with an example of a SIOC RDF export tool; and Section 5 concludes the paper.

2. Characteristics of Online Community Sites

The following characteristics of online community sites make the indirect approach described in this paper well suited to them.

Extensible. Most online community site engines are built to be extensible and provide well-documented APIs for use by plugin developers. Most of them are also open source (b2evolution, Drupal, etc.). This enables us to use the functions and API calls they provide when building RDF export tools.

Dynamically Evolving. At the same time these engines may have very fast development cycles, with approximately one major release per year and often many more minor version changes. While these changes may affect both the database schema and public APIs, the latter are usually kept stable. Changes to the functions and APIs intended for use by other developers are kept to a minimum and are well documented. The same is not always true for database schema changes.

Large installation base, all over the Web. This software has many installations run by web users who are not experts in software development. These users may still want to enable RDF export from their sites, provided that this functionality is simple to install and does not require a large effort or specific knowledge. Many people use web hosting providers, which may limit what software they are allowed to install.
Most of these sites currently store their data in relational databases, and therefore the task of expressing information from these sites in RDF can be viewed as a special case of expressing relational database data in RDF. The characteristics described above may make it challenging to use other approaches, such as direct mappings from relational databases to RDF, because of a high risk of database structure changes, limitations of web hosting providers, etc. At the same time the extensible, open-source nature of these content management systems makes it possible to access data at a higher abstraction level, using existing application logic.

3. Methods for Exposing Relational Databases in RDF

We will consider two approaches for exposing relational databases in RDF: direct (mappings from the relational database schema) and indirect (using the application logic to access data).

Direct Mapping. Most existing work on exposing relational databases in RDF [Bizer2006, Erling2006, TimBL2006] considers direct mappings from the relational database schema to RDF. This generic approach can be useful in many cases, but in the case of online community sites it may lead to difficulties in keeping up with changes in the database structure, and also in installation and use by inexperienced users.
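
The direct approach can be made concrete with a minimal sketch. It is not the mapping language of D2RQ or Virtuoso; the table name, column names, URI pattern and the MAPPING structure are all hypothetical, chosen only to show how a declarative mapping is bound to the database schema. An in-memory SQLite database stands in for the CMS database.

```python
import sqlite3

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical declarative mapping: table, subject URI pattern, and
# column -> predicate pairs. None of these names come from a real CMS schema.
MAPPING = {
    "table": "posts",
    "subject": "http://example.org/post/{id}",
    "rdf_class": SIOC + "Post",
    "columns": {"title": DCTERMS + "title", "body": SIOC + "content"},
}


def literal(value):
    """Escape a value as a plain N-Triples literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def export_direct(conn, mapping):
    """Read rows straight from the table and emit one triple per mapped column."""
    columns = ["id"] + list(mapping["columns"])
    query = f"SELECT {', '.join(columns)} FROM {mapping['table']}"
    triples = []
    for row in conn.execute(query):
        record = dict(zip(columns, row))
        subject = mapping["subject"].format(id=record["id"])
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['rdf_class']}> .")
        for column, predicate in mapping["columns"].items():
            triples.append(f"<{subject}> <{predicate}> {literal(record[column])} .")
    return triples


def demo_direct():
    """Build a tiny example database and export it through the mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.execute("INSERT INTO posts VALUES (1, 'Hello', 'First post')")
    return export_direct(conn, MAPPING)
```

Because the mapping names columns explicitly, renaming a column in a later release (for example title to post_title) makes the generated query fail until the mapping is updated. This is the maintenance risk described above.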

Using the Application Logic to Access Data. In this position paper we describe an approach adopted by many SIOC export tools: using the APIs and application logic provided by content management systems as the source of information to be exported in RDF. Authors of CMS software encourage plugin developers to use these APIs rather than access the database directly. By doing so, the developers of RDF export applications are shielded from any changes to the storage layer as long as the interface remains the same, and can deploy their applications as simple plugins for content management systems.

In a generic scenario you may not always have a choice, and direct access to the database can be the only solution. The characteristics of web applications described above enable another option by providing APIs and function calls for access to data. By using application logic to retrieve data we can also make use of caching, data access protection and other functionality built into the application.

To further abstract from specific solutions we can consider two choices: direct access to the database vs. using the application logic to access data, and a declarative vs. a procedural method of converting data to RDF:

                               Declarative   Procedural
Direct (database)              I             II
Indirect (application logic)   III           IV

Most existing methods that use direct mappings from database schema to RDF correspond to Quadrant I of this table. A generic problem setting makes it possible to create generic solutions and to separate the software doing the mapping from the mappings themselves. Some of the existing solutions may also be in Quadrant II, but this distinction is not of particular interest for this paper. Quadrant III (a hybrid approach), which generates RDF by declaring rules for mapping data accessed via the application logic, may be interesting for future exploration. Quadrant IV represents the method described in this paper: using a procedural approach and existing application logic to access data.
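
A minimal sketch of Quadrant IV follows. The BlogApi class, its get_posts method and the Post record are hypothetical stand-ins for a CMS plugin API, not the interface of b2evolution or of any other engine. The sketch only shows that the exporter depends on the API, so the internal storage layout can change without affecting it.

```python
from dataclasses import dataclass

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Post:
    """Record returned by the hypothetical CMS API."""
    url: str
    title: str
    content: str
    author_url: str


class BlogApi:
    """Stand-in for a CMS plugin API; storage details stay hidden behind it."""

    def __init__(self, rows):
        self._rows = rows  # Internal layout, free to change between releases.

    def get_posts(self, viewer_is_admin=False):
        # Application logic such as access control is applied here, so the
        # exporter benefits from it without reimplementing it.
        for row in self._rows:
            if row["published"] or viewer_is_admin:
                yield Post(row["link"], row["post_title"], row["text"], row["author"])


def literal(value):
    """Escape a value as a plain N-Triples literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def export_indirect(api):
    """Procedurally turn API results into N-Triples lines."""
    triples = []
    for post in api.get_posts():
        subject = f"<{post.url}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SIOC}Post> .")
        triples.append(f"{subject} <{DCTERMS}title> {literal(post.title)} .")
        triples.append(f"{subject} <{SIOC}content> {literal(post.content)} .")
        triples.append(f"{subject} <{SIOC}has_creator> <{post.author_url}> .")
    return triples
```

In this sketch an unpublished post never reaches the exporter, because the visibility check lives in the API call. This mirrors the point above about reusing the data access protection built into the application.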
This indirect approach is application specific (unless there is a standard API used by all applications) and can be more difficult to express in a declarative form. While direct mappings can achieve greater performance, they may require more maintenance because they do not use the application logic and database access abstraction layer already provided by many applications.

4. Example: SIOC export plugin for b2evolution

The SIOC initiative provides, first, a SIOC Core Ontology (W3C Member Submission published on July 31, 2007 [1]) that can be used to describe information about the contents and structure of online community sites in RDF and, second, several SIOC export applications for blogs, forums, and mailing lists [2]. A SIOC API for PHP was also created in order to make development of such SIOC plugins and exporters as easy as possible. It shields developers from the technical details of how information is represented in RDF; they operate at the level of SIOC concepts instead. Thus, developers only have to extract content from the database (handled by the internal logic of the CMS) and pass it to the API, which renders the RDF data.

The architecture of b2evolution's SIOC export plugin (Figure 1) illustrates the application-logic-based approach. Information is contained in a MySQL database. The plugin uses existing b2evolution functions to access information in the database, which is then passed to the SIOC export API for PHP to generate RDF/XML output.

Figure 1: Architecture of the SIOC RDF export plugin for b2evolution

Sometimes an API call needed for the SIOC exporter is not provided by the CMS engine. Then we have to fall back to accessing the database directly. For example, the SIOC export plugins for b2evolution and WordPress each contain one direct query to the database, in both cases having to do with user account information. In our experience such code is the first to break as the CMS engine evolves. A possible solution is to eliminate direct database access by asking CMS developers to include API calls for requesting the required information.

5. Conclusion

This position paper described an approach adopted by many SIOC RDF export tools: using the application logic provided by content management systems to access data stored within a relational database and generate RDF data. Compared with the traditional approach of creating mappings from the relational database schema to RDF, using the existing application logic ensures tighter integration with a content management system and increased resistance to database schema changes as the software evolves, and it allows us to use existing facilities of CMS engines such as caching and access control.

Web applications such as online community site engines can be regarded as a special case where function calls are well documented and development of extensions is encouraged. This enables us to use existing API calls when exporting data in a machine-readable form. Having two different approaches (direct and indirect) for generating RDF gives us a choice, and further research may be needed to determine which is the better decision in each situation.

We only looked at exporting specific information contained within online community sites and did not aim at answering arbitrary queries over RDF. When arbitrary SPARQL queries need to be answered, a hybrid approach (using the application logic to access the database and defining mappings from these API calls to RDF) may be interesting.

Acknowledgments

This material is based upon work supported by the Science Foundation Ireland under Grant No. SFI/02/CE1/I131. We gratefully acknowledge Conor Hayes for his feedback on this paper.

References

[Bizer2006] C. Bizer, R. Cyganiak, J. Garbers, O. Maresch, editors. "D2RQ V0.5 - Treating Non-RDF Relational Databases as Virtual RDF Graphs", 2006, http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/spec/

[Breslin2005] J.G. Breslin, A. Harth, U. Bojārs, and S. Decker. "Towards Semantically-Interlinked Online Communities". In The 2nd European Semantic Web Conference (ESWC 05), Heraklion, Greece, Proceedings, May 2005, http://sioc-project.org/publications

[Erling2006] O. Erling, I. Mikhailov. "Mapping Relational Data to RDF in Virtuoso", OpenLink Software, 2006, http://virtuoso.openlinksw.com/wiki/main/main/vossqlrdf

[TimBL2006] T. Berners-Lee. "Relational Databases on the Semantic Web", Design Issues for the World Wide Web, 2006, http://www.w3.org/designissues/rdb-rdf

[1] Semantically-Interlinked Online Communities (SIOC) Ontology Submission Request to W3C

[2] SIOC Ontology: Applications and Implementation Status