DBpedia-An Advancement Towards Content Extraction From Wikipedia

Similar documents
The role of vocabularies for estimating carbon footprint for food recipies using Linked Open Data

Comparative Study of RDB to RDF Mapping using D2RQ and R2RML Mapping Languages

16th International World Wide Web Conference Developers Track, May 11, DBpedia. Querying Wikipedia like a Database

The Data Web and Linked Data.

Enriching an Academic Knowledge base using Linked Open Data

The German DBpedia: A Sense Repository for Linking Entities

Proposal for Implementing Linked Open Data on Libraries Catalogue

University of Rome Tor Vergata DBpedia Manuel Fiorelli

Semantic Web Fundamentals

Building DBpedia Japanese and Linked Data Cloud in Japanese

The Linked Data Value Chain: A Lightweight Model for Business Engineers

August 2012 Daejeon, South Korea

Augmenting Video Search with Linked Open Data

Utilizing, creating and publishing Linked Open Data with the Thesaurus Management Tool PoolParty

DBpedia Extracting structured data from Wikipedia

Orchestrating Music Queries via the Semantic Web

Man vs. Machine Dierences in SPARQL Queries

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision

Applying Linked Data Technologies for Online Newspapers

The Emerging Web of Linked Data

Linked Data Evolving the Web into a Global Data Space

Federated Search Engine for Open Educational Linked Data

SPARQL Protocol And RDF Query Language

SPARQL Protocol And RDF Query Language

Semantic Web Fundamentals

SRI International, Artificial Intelligence Center Menlo Park, USA, 24 July 2009

DBPedia (dbpedia.org)

Semantic Web and Linked Data

Deliverable DBpedia-Live Extraction

JENA: A Java API for Ontology Management

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Linked Data. Department of Software Enginnering Faculty of Information Technology Czech Technical University in Prague Ivo Lašek, 2011

Knowledge Representation in Social Context. CS227 Spring 2011

BUILDING THE SEMANTIC WEB

Integrating Web 2.0 Data into Linked Open Data Cloud via Clustering

LINKED DATA - A MULTIPLE REPRESENTATION DATABASE AT WEB SCALE?

Introduction. October 5, Petr Křemen Introduction October 5, / 31

Semantic Web and Natural Language Processing

OWLIM Reasoning over FactForge

Grid Resources Search Engine based on Ontology

Meta Search Engine Powered by DBpedia

Semantic Web Systems Linked Open Data Jacques Fleuriot School of Informatics

Semantic Web. Tahani Aljehani

Linking Spatial Data from the Web

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

THE GETTY VOCABULARIES TECHNICAL UPDATE

Programming the Semantic Web

Linking Thesauri to the Linked Open Data Cloud for Improved Media Retrieval

Evaluating Class Assignment Semantic Redundancy on Linked Datasets

Reusing Linguistic Resources: Tasks and Goals for a Linked Data Approach

Reimplementing the Mathematics Subject Classification (MSC) as a Linked Open Dataset

Publishing a Scorecard for Evaluating the Use of Open-Access Journals Using Linked Data Technologies

LESS - Template-Based Syndication and Presentation of Linked Data

The Open Government Data Stakeholder Survey

Linking Entities in Chinese Queries to Knowledge Graph

Enhancing Security Exchange Commission Data Sets Querying by Using Ontology Web Language

Building a Large-Scale Cross-Lingual Knowledge Base from Heterogeneous Online Wikis

Web Ontology for Software Package Management

ISA Action 1.17: A Reusable INSPIRE Reference Platform (ARE3NA)

Programming Technologies for Web Resource Mining

Mapping between Digital Identity Ontologies through SISM

A Community-Driven Approach to Development of an Ontology-Based Application Management Framework

Linked Data and RDF. COMP60421 Sean Bechhofer

Combining Government and Linked Open Data in Emergency Management

a paradigm for the Semantic Web Linked Data Angelica Lo Duca IIT-CNR Linked Open Data:

Extracting knowledge from Ontology using Jena for Semantic Web

DBpedia As A Formal Knowledge Base An Evaluation

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall

A rule-based approach to address semantic accuracy problems on Linked Data

Using Linked Data to Reduce Learning Latency for e-book Readers

The Linking Open Data Project Bootstrapping the Web of Data

Linked Data Practices for the Geospatial Community

Reducing Consumer Uncertainty

Porting Social Media Contributions with SIOC

The Semantic Web DEFINITIONS & APPLICATIONS

A service based on Linked Data to classify Web resources using a Knowledge Organisation System

Digital Public Space: Publishing Datasets

PECULIARITIES OF LINKED DATA PROCESSING IN SEMANTIC APPLICATIONS. Sergey Shcherbak, Ilona Galushka, Sergey Soloshich, Valeriy Zavgorodniy

How to Publish Linked Data on the Web - Proposal for a Half-day Tutorial at ISWC2008

Using linked data to extract geo-knowledge

An overview of RDB2RDF techniques and tools

A Developer s Guide to the Semantic Web

The R2R Framework: Christian Bizer, Andreas Schultz. 1 st International Workshop on Consuming Linked Data (COLD2010) Freie Universität Berlin

CodeOntology: RDF-ization of Source Code

Architecture and Applications

Computer-assisted Ontology Construction System: Focus on Bootstrapping Capabilities

Understanding Billions of Triples with Usage Summaries

COMPUTER AND INFORMATION SCIENCE JENA DB. Group Abhishek Kumar Harshvardhan Singh Abhisek Mohanty Suhas Tumkur Chandrashekhara

Data Integration and Fusion using RDF

Silk Server Adding missing Links while consuming Linked Data

An Implementation of LOD Instance Development System using Schema- Instance Layer Separation

LODatio: A Schema-Based Retrieval System forlinkedopendataatweb-scale

Linked data implementations who, what, why?

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Accessing information about Linked Data vocabularies with vocab.cc

Exploring and Using the Semantic Web

Semantic Web: Core Concepts and Mechanisms. MMI ORR Ontology Registry and Repository

4 th Linked Data on the Web Workshop (LDOW 2011)

State of the Art of Semantic Web

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs

Transcription:

DBpedia-An Advancement Towards Content Extraction From Wikipedia Neha Jain Government Degree College R.S Pura, Jammu, J&K Abstract: DBpedia is the research product of the efforts made towards extracting structured content from the data obtainable from the largest web encyclopaedia i.e. Wikipedia. It is a project aiming towards making the information machine readable. DBpedia makes available the content in the form of RDF (Resource Description Framework) triplets which can be easily queried using query languages like SPARQL. The paper highlights this query making procedure. The research work also includes the role of DBpedia in Linked Data initiative. Keywords: DBpedia; Wikipedia; SPARQL; Jena; Knowledge Extraction. Introduction DBpedia is the outcome of work undertaken towards extracting structured content form the content which is available on the largest web encyclopedia i.e. Wikipedia. It is a project aiming towards making the information machine readable. DBpedia provides the content in the form of RDF triplets which can be easily queried using query languages like SPARQL. RDF is a flexible data model for representing extracted content and for publishing it on the Web. A number of end points are available which can be used to query DBpedia. In the presented research work, DBpedia data has been queried and processed using Apache Jena toolkit in JAVA. Apache Jena is a free and open source framework for building Semantic Web and Linked Data Applications[1]. The paper highlights this query making procedure. The research work also includes the role of DBpedia in Linked Data initiative. About DBpedia The DBpedia initiative has successfully captured a richly loaded knowledge base from its parent site Wikipedia. This knowledge base serves as Linked Data for other initiatives on the Web. Because of its volume, DBpedia is considered to be at the core of the Linked Data Cloud. DBpedia knowledge base to the present date provides 274 million pieces of information pertaining to 2.6 million concepts. As DBpedia is highly diverse and expanded over a wide range of domains, it has a considerable amount of overlap with various other data-sets already available on the World Wide Web. Many data publishers have taken to linking their data sources to DBpedia to enhance interoperability and accessability of data. This has made DBpedia occupy a central position j.neha10@gmail.com * Neha Jain 99

on the Linked Open Data cloud. The BBC is one such organization working in this direction. BBC is working towards using DBpedia as the semantic backbone and controlled vocabulary for its various domains. Related work In their work Christian Bizer et.al. described the extraction of the DBpedia knowledge base, the urgent status of interlinking DBpedia with other sources on the World Wide Web and also presented a general idea of various applications that assist the Linked Open Data concept with DBpedia. [2] S. Auer et al. [3] described the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. They also highlighted some promising applications from the DBpedia community and also presented how website authors can facilitate DBpedia content within their sites. They also presented the current status of interlinking DBpedia with other open datasets on the Web and outlined the method for DBpedia to serve as a nucleus for an emerging Web of open data. G. Kobilarov et al. [4] described how the BBC is working to integrate data and linking documents across BBC domains by using Semantic Web Technologies and Linked open data and in particular DBpedia. LehmannJens et al. [5] presented an overview of the DBpedia community project including its architectur, technical implementation, maintenance, internationalization, usage statistics and applications. S. Hellmann et al. [6]described tools and techniques to use existing techniques to achieve scalability on large knowledge bases available as SPARQL endpoints or Linked Data. The algorithms developed by them are made available in the open source DL-Learner project and can be used in real-life scenarios by Semantic Web applications. DBpedia Knowledge Extraction Model Wikipedia is an online encyclopedia which mostly consists of free text to be used by human users. It also contains some information in structured form. Structured text is in the form of infobox templates, categorization information, images, geo-coordinates, hyperlinks etc. The DBpedia Project extracts the updated structured information from Wikipedia and compiles it into a knowledge base. 100

The DBpedia Ontology is a thin, cross-domain ontology, created by hand-picking information from infoboxes within Wikipedia. The ontology presently consists of 685 classes, arranged in a subsumption hierarchy explained by 2,795 different properties. DBpedia presently consists of eleven extractors which process the following types of Wikipedia content: Labels: The title of the Wikipedia articles can be accessed through this as rdfs: label. Abstracts: a short abstract which is generally the first paragraph.this can be accessed using the property DBpedia: abstract. Interlanguage links: These are the links that connect articles about the same topic in different language editions of Wikipedia. These links assign labels and abstracts in varied languages to resources available in DBpedia. Images: Images are represented as foaf: depiction property. These are the links pointing to resources at Wikimedia Commons images. Redirects: The redirection links in Wikipedia are extracted in DBpedia and are used to resolve references between DBpedia resources. Disambiguation: The disambiguation pages of Wikipedia are used to explain the different meanings of similarly spelled words. The disambiguation links are extracted and represented using predicate DBpedia: diambiguates. External links: The links to external articles which are present on Wikipedia pages are represented in DBpedia using the property DBpedia: reference. owl: same As property is used to depict RDF links. Pagelinks: All the links between Wikipedia articles are extracted and represented in DBpedia as DBpedia: wikilink property. Homepages: The links to the homepages of entities such as companies and organizations extracted using this property. It is represented as foaf: homepage. are Categories: Wikipedia articles are arranged in categories, which are represented in DBpedia using the SKOS shared vocabulary.[7] Categories are represented as skos: concepts and the relationships between categories are represented as skos: broader. 101

Geo-coordinates: DBpedia contains coordinates extracted from info-boxes. Besides that, it also contains geo-coordinated for 1,094,000 geographic locations[8]. World Wide Web Consortium (W3C) Basic Geo Vocabulary is used for expressing the Geo-coordinates. The model uses Apache Jena Toolkit in JAVA and SPARQL as the query language to gather information from DBpedia datasets. The Jena API which is a free and open source Java framework for building Semantic web and Linked Data applications is used in this work. In Jena, a data structure called Model encapsulates all state information provided by a collection of RDF triples. The model, in principle, denotes an RDF graph as it contains a collection of RDF nodes attached to each other by labeled relations. The query language used is a fourth generation query language called SPARQL. It is "data-oriented" language as it only queries and reverts back the information held in the models. The language does not offer any inferencing or reasoning capability. In the proposed research work, Apache Jena 3.0 framework is used. The API provided by the Jena Framework are imported and used to construct SPARQL queries, call and return results from datasets from DBpedia. The queries are of the form: String querystr= "PREFIX dbo:<http://dbpedia.org/ontology/>"+"prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>" +"SELECT?x"+" WHERE { "+" <http://dbpedia.org/resource/india> rdfs:label?x. " +" FILTER (lang(?x) = 'en')}"; This query returns the label property value. String querystr= "PREFIX dbo:<http://dbpedia.org/ontology/>"+"prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>" +"SELECT?x"+" WHERE { "+" <http://dbpedia.org/resource/india> dbo:abstract?x. " +" }"; This query returns the abstract property value. Since, there is no language filter, the query returns all the available muti-lingual data. 102

Conclusion Jena is a promising development environment as it provides a powerful API to exploit the DBpedia datasets. But it is still not completely robust. It does not provide support to a number of properties like dbo:disambiguates, dbo:wikipageredirects, redirection links to images etc. This is not only a limitation with Jena but it is observed that not all localized editions provide fully robust publically available SPARQL endpoints, nor do all localized URI's reference. This is acceptable as the field of linked data is still in the development stages and as more and more local DBpedia research initiatives come up in different parts of the world, the effort to internationalize DBpedia will start to get success. DBPedia is a promising initiative towards making the World Wide Web intelligent and machine accessible. References [1] "https://jena.apache.org/," [Online]. [2] C. Bizer, J. Lehmannb, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak and S. Hellmann, "DBpedia-A Crystallization Point for the Web of Data," Web Semantics: Science, Services and Agents on the World Wide Web, pp. Volume 7, Issue 3, September 2009, Pages 154 165, September 2009. [3] S. Auer, C. Bizer, G. Kobilaro, J. L. R. Cyganiak and Z. Ives, "DBpedia: A Nucleus for a Web of Open Data," The Semantic Web, pp. 722-735, 2007 [4] G. Kobilarov, T. Scott, Y. Raimond, S. Oliver, C. Sizemore, M. Smethurst, C. Bizer and R. Lee, "Media Meets Semantic Web How the BBC Uses DBpedia and Linked Data to Make Connections," The Semantic Web: Research and Applications. ESWC 2009. Lecture Notes in Computer Science, vol 5554. Springer, Berlin, Heidelberg, pp. 723-737, 2009. [5] LehmannJens, I. Robert, J. Max, J. Anja, K. Dimitris, M. P. N., H. Sebastian, M. Mohamed, K. P. van, A. Sören and B. Christian, "DBpedia A large-scale, multilingual knowledge base extracted from Wikipedia," Journal: Semantic Web, vol. 6, no. 2, pp. 167-195, 2015 [6] S. Hellmann, J. Lehman and S. Auer, "Learning of OWL class descriptions on very large knowledge bases," ISWC- PD'08 Proceedings of the 2007 International Conference on Posters and Demonstrations - Volume 401, pp. Pages 102-103, 2007 [7] "https://www.w3.org/2004/02/skos/," [Online]. [8] "http://wiki.dbpedia.org/services-resources/datasets/data-set-39," [Online]. 103

Fig 1 104