A Technique for Automatic Construction of Ontology from Existing Database to Facilitate Semantic Web

Similar documents
Domain Specific Semantic Web Search Engine

A New Semantic Web Approach for Constructing, Searching and Modifying Ontology Dynamically

Today: RDF syntax. + conjunctive queries for OWL. KR4SW Winter 2010 Pascal Hitzler 3

Semantic Web In Depth: Resource Description Framework. Dr Nicholas Gibbins 32/4037

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

The Semantic Web Revisited. Nigel Shadbolt Tim Berners-Lee Wendy Hall

Web Science & Technologies University of Koblenz Landau, Germany. RDF Schema. Steffen Staab. Semantic Web

FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES

A New Approach to Design Graph Based Search Engine for Multiple Domains Using Different Ontologies

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

RDF AND SPARQL. Part III: Semantics of RDF(S) Dresden, August Sebastian Rudolph ICCL Summer School

Semantic Web Technologies

Using RDF to Model the Structure and Process of Systems

Semantic Web Fundamentals

Proposal for Implementing Linked Open Data on Libraries Catalogue

Opus: University of Bath Online Publication Store

Logic and Reasoning in the Semantic Web (part I RDF/RDFS)

RDF /RDF-S Providing Framework Support to OWL Ontologies

OSM Lecture (14:45-16:15) Takahira Yamaguchi. OSM Exercise (16:30-18:00) Susumu Tamagawa

A tutorial report for SENG Agent Based Software Engineering. Course Instructor: Dr. Behrouz H. Far. XML Tutorial.

From the Web to the Semantic Web: RDF and RDF Schema

Semantic Web Technologies: RDF + RDFS

Semantic Web: vision and reality

Extracting knowledge from Ontology using Jena for Semantic Web

Ontological Modeling: Part 2

An Introduction to the Semantic Web. Jeff Heflin Lehigh University

BUILDING THE SEMANTIC WEB

An RDF-based Distributed Expert System

XML and Semantic Web Technologies. III. Semantic Web / 1. Ressource Description Framework (RDF)

The Semantic Web. Mansooreh Jalalyazdi

Semantic Web Knowledge Representation in the Web Context. CS 431 March 24, 2008 Carl Lagoze Cornell University

Knowledge Representation for the Semantic Web

Integrating RDF into Hypergraph-Graph (HG(2)) Data Structure

RDF Schema. Mario Arrigoni Neri

RDF. Mario Arrigoni Neri

Web Science & Technologies University of Koblenz Landau, Germany RDF. Steffen Staab. Semantic Web

Ontology Creation and Development Model

Semantic Web Mining and its application in Human Resource Management

The Resource Description Framework and its Schema

Helmi Ben Hmida Hannover University, Germany

Outline RDF. RDF Schema (RDFS) RDF Storing. Semantic Web and Metadata What is RDF and what is not? Why use RDF? RDF Elements

Resource Description Framework (RDF)

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper

Semantic Information Retrieval: An Ontology and RDFbased

Seman&cs)of)RDF) )S) RDF)seman&cs)1)Goals)

Semantic Web Fundamentals

Adding formal semantics to the Web

Semantic Web. RDF and RDF Schema. Morteza Amini. Sharif University of Technology Spring 90-91

Semantic web. Tapas Kumar Mishra 11CS60R32

LECTURE 09 RDF: SCHEMA - AN INTRODUCTION

Describing Structure and Semantics of Graphs Using an RDF Vocabulary

RDF Schema. Philippe Genoud, UFR IM2AG, UGA Manuel Atencia Arcas, UFR SHS, UGA

Knowledge Representation for the Semantic Web

Semantic Web Domain Knowledge Representation Using Software Engineering Modeling Technique

RDF(S) Resource Description Framework (Schema)

SEMANTIC WEB AN INTRODUCTION. Luigi De

The Semantic Planetary Data System

Semantic Web Engineering

Library of Congress BIBFRAME Pilot. NOTSL Fall Meeting October 30, 2015

Semantic Web Systems Introduction Jacques Fleuriot School of Informatics

Information Retrieval (IR) through Semantic Web (SW): An Overview

The Semantic Web & Ontologies

JENA: A Java API for Ontology Management

Mustafa Jarrar: Lecture Notes on RDF Schema Birzeit University, Version 3. RDFS RDF Schema. Mustafa Jarrar. Birzeit University

Introduction to Semantic Web

Chapter 2 SEMANTIC WEB. 2.1 Introduction

RDF. Charlie Abela Department of Artificial Intelligence

11 The Semantic Web. Knowledge-Based Systems and Deductive Databases Knowledge Representation Knowledge Representation

Business Rules in the Semantic Web, are there any or are they different?

Developing markup metaschemas to support interoperation among resources with different markup schemas

KNOWLEDGE GRAPHS. Lecture 3: Modelling in RDF/Introduction to SPARQL. TU Dresden, 30th Oct Markus Krötzsch Knowledge-Based Systems

An Argument For Semantics

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

2. RDF Semantic Web Basics Semantic Web

SWAD-Europe Deliverable 8.1 Core RDF Vocabularies for Thesauri

Semantic Web and Electronic Information Resources Danica Radovanović

Mapping between Digital Identity Ontologies through SISM

OWL a glimpse. OWL a glimpse (2) requirements for ontology languages. requirements for ontology languages

Semantic-Based Web Mining Under the Framework of Agent

Lecture Telecooperation. D. Fensel Leopold-Franzens- Universität Innsbruck

New Approach to Graph Databases

Semantic Integration with Apache Jena and Apache Stanbol

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

The RDF Schema Specification Revisited

Knowledge Representation in Social Context. CS227 Spring 2011

H1 Spring B. Programmers need to learn the SOAP schema so as to offer and use Web services.

Chapter 13: Advanced topic 3 Web 3.0

Semantic Web. MPRI : Web Data Management. Antoine Amarilli Friday, January 11th 1/29

Design and Implementation of an RDF Triple Store

Formalizing Dublin Core Application Profiles Description Set Profiles and Graph Constraints

Linked Data: What Now? Maine Library Association 2017

Unit 2 RDF Formal Semantics in Detail

Introduction to Ontologies

Abstract: In this paper we propose research on how the

Web Services: OWL-S 2. BPEL and WSDL : Messages

On practical aspects of enhancing semantic interoperability using SKOS and KOS alignment

Transforming Data from into DataPile RDF Structure into RDF

Multi-agent and Semantic Web Systems: RDF Data Structures

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 4, Jul-Aug 2015

Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back

Transcription:

10th International Conference on Information Technology A Technique for Automatic Construction of Ontology from Existing Database to Facilitate Semantic Web Debajyoti Mukhopadhyay, Aritra Banik, Sreemoyee Mukherjee Web Intelligence & Distributed Computing Research Lab, Techno India Group West Bengal University of Technology EM 4/1, Salt Lake Sector V, Calcutta 700091, India {debajyoti.mukhopadhyay, aritrabanik, m.sreemoyee}@gmail.com Abstract In this paper we describe a technique for automatic construction of ontology, a semantic representation of a conceptualization, for any domain, organization or enterprise whose Web site is existent with quiet a big mine of information. An user interactive system is developed with the help of which any organization, which has a Web site to maintain data, but has no corresponding semantic mark up, can automatically construct the ontology and further use that to model domain knowledge and make use of various Semantic Web applications like Semantic Web search engine etc. The automated construction of ontology will facilitate further the growth, popularity and usage of the Semantic Web and Semantic Web technologies as well as serve as a basis for Artificial Intelligence reasoning. 1.Introduction Tim Berners-Lee, the inventor of the World Wide Web, defines the Semantic Web as The Web of data with meaning in the sense that a computer program can learn enough about what the data means [in order] to process it (Berners -Lee 1999) [12]. The Semantic Web is a set of technologies that envisions the existence of knowledge on the Web in a format that software applications can understand and reason about [3]. Thus, it would be possible to make knowledge more accessible to software and make software intelligent enough to understand knowledge and create new knowledge. Some elements of the Semantic Web are expressed as prospective future possibilities that have yet to be implemented or realized. Other elements of the Semantic Web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology La nguage (OWL) [2][10]. All of which are intended to formally describe concepts, terms, and relationships within a given knowledge domain. It is thus a mesh of information linked up in such a way as to be easily processable by machines, on a global scale thus making the World Wide Web a universal medium for data, information, and knowledge exchange. Thus, the Semantic Web is a framework that allows publishing, sharing, and reusing data and knowledge on the Web and across applications, enterprises, and community boundaries. Currently, the Semantic Web, consisting of the Semantic Web documents typically encoded in the languages RDF and OWL, runs parallel to the web of HTML documents. The Semantic Web essentially involves publishing data in a language called Resource Description Framework (RDF), specifically for data, so that it can be manipulated and combined just as can data files on a local computer [2]. RDF and the Web Ontology Language (OWL) which are ontology based procedures for representing knowledge on the web, providing useful features beyond those available in ordinary XML, allowing users to define terms (for example, classes and properties), express relationships among them, and specify constraints and axioms that hold for well-formed data. Representing knowledge using RDF first requires proper specification of that knowledge, if possible in a hierarchical manner, that is, describing the ontology. The Semantic Web is mostly built on syntaxes which use URIs or Uniform Resource Identifiers to represent data, which can be held in databases, reused and interchanged over the World Wide Web. One such set of syntaxes developed for achieving the above mentioned purposes is called the Resource Description Framework" while another set of such syntaxes form the Web Ontology Language or OWL which is also used similarly to represent machine understandable knowledge. Therefore for any 0-7695-3068-0/07 $25.00 2007 IEEE DOI 10.1109/ICIT.2007.24 240 246

domain which wants to utilize Semantic Web technologies to improve performance and inculcate new features, requires corresponding semantic mark up which is nothing but the description of the domain ontology using RDF. Ontological representations based on subtype relationships provide the necessary knowledge base for AI techniques and representations for a variety of Semantic Web applications. 2. RDF Before going into details of Resource Description Framework or RDF, it is essential to know the basis of RDF, that is, ontology. Ontology is nothing but a specification of a conceptualization [2]. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose [2]. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly. It involves explicit representation of knowledge about some topic. Ontologies can enhance the functioning of the Web in many ways. They can be used in a simple fashion to improve the accuracy of Web searches the search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords. More advanced applications will use ontologies to relate the information on a page to the associated knowledge structures and inference rules. Problem-solving methods, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data. Ontologies can be domain specific as well. Domain specific ontologies describe a specific domain by defining different terms applicable to that domain and specifying the interrelationship between them. Several domain ontologies can also be merged for a more general representation. Our work mainly involves development of a domain specific ontology described via one of the most common ontology language RDF. Development of ontology has several utilities. Firstly it helps in understanding an area of knowledge better, enables multiple machines to share the knowledge and use the knowledge in various applications. Resource Descripton Framework is nothing but a language to represent the ontology of an area of knowledge. The Resource Description Framework (RDF) is a general purpose language for representing information on the web thus enabling exchange and reuse of structured metadata [9]. It provides a common framework for describing and interchanging metadata as well as well defined machine understandable semantics for metadata [5][9]. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures [10]. By exploiting the features of XML, RDF imposes structure that provides for the unambiguous expression of semantics and, as such, enables consistent encoding, exchange, and machine-processing of standardized metadata. In fact, by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web. RDF is mainly intended for development of machine understandable information, that is, information that will be processed by applications rather than just for display. RDF comes with a common framework for expressing information so that information exchange can take place unhindered, without any loss of meaning. The ability to exchange information signifies that information can be made available to applications other than those for which it was created. RDF provides a model for describing resources called the RDF data model which consists of directed labeled graphs and model elements resource, properties, values. RDF defines a resource as any object described by RDF statements, each uniquely identifiable by a Uniform Resource Identifier (URI) and it may be an entire Web page or a part of a Web page [5][9]. On the other hand, A property is a specific aspect, characteristic, attribute, or relation used to describe a resource W3C, Resource Description Framework (RDF) Model and Syntax Specification [4]. Note that a property is also a resource since it can have its own properties and all properties have specific values which may be atomic in nature (text strings, numbers, etc) or other resources which in turn may have their own properties. An RDF statement combines a resource, a property and a value [4]. These three individual parts are known as the subject, predicate and object [8]. For example, Mr. Black works in Company X is an RDF statement having the following parts: Subject (Resource) Predicate (Property) Object (value) Mr. Black worksin Company X This can be represented as a directed labeled RDF graph as follows: Resource Mr. Black worksin Property An RDF statement Value Company X Fig. 1. An RDF directed labeled graph 241 247

The corresponding RDF code is as follows: <?xml version="1.0"?> <employee rdf:id="mr. Black" xmlns:rdf="http://www.w3.org/1999 /02/22-rdf-syntax-ns#" xmlns="http://www.westbengal.org/ crops#"> <worksin>company X</worksin> </employee> This RDF code describes the resource Mr. Black which is an instance of the class Employee, has a property worksin whose value is Company X Separate RDF code depicts the class subclass relationships according as the RDF schema structure. RDF Schema is a simple data-typing model for RDF [6] so that we can describe groups of related resources and the relationships among these resources. For example, we can say Mr. Black is an instance Employee and Employee is a subclass of Organisation. The purpose of RDF schema is to express classes and their (subclass) relationships as well as to define properties and associate them with classes. The benefit of an RDF Schema is that it facilitates inferencing on the data, and enhanced searching. Resources can be divided into classes which are composed of instances. A class itself is also a resource which is usually identified by RDF URI References and can be described by RDF properties. We often use the prefix rdfs: to indicate the term is RDF Schema term. rdfs:resource is the root class of everything in RDF Schema. rdf:type is an instance of rdf:property (class of RDF properties), and it means that a resource is an instance of a class. The property rdfs:subclassof is an instance of rdf:property that is used to state that a class is a Subclass of the other. The following RDF code shows that Organisation is the super-class and Employee is a subclass of Organisation. <?xml version="1.0"?> <rdf:rdf xmlns:rdf="http://www.w3.org/1999/0 2/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/ 01/rdf-schema#" xml:base="http:// www.companies.org/organisation#"> <rdfs:class rdf:id="employee"> <rdfs:subclassof rdf:resource="#organisation "/> </rdfs:class>...... </rdf:rdf> Another important RDF technique that we have utilized in our design methodology is the specification of the RDF domain and range. Domain is used to indicate the classes that a property will be used with. One may specify zero, one, or multiple rdfs:domain properties. Range is used to indicate the type of values that a property will contain. One may specify zero, one, or multiple rdfs:range properties [4][5]. The following RDF code snippet shows the usage of domain and range. It describes an RDF property manager whose domain is the Employee class i.e. it can be used with this class and its range is the Department class i.e. it can take values of the type Department only. <rdf:property rdf:id="seasonreqd"> <rdfs:domain rdf:resource="#vegetable"/> <rdfs:range rdf:resource="#season"/> </rdf:property> Apart from the above mentioned basic features, RDF also provides means to express collection of items rdf:bag, rdf:seq, rdf:alt, rdf:list rdf:rest etc. Some other commonly used RDF syntaxes are rdf:subject (to state the subject of a statement), rdf:predicate (to state predicate of a statement), rdf:object (to state the object of a statement), rdfs:isdefinedby (to indicate a resource identifying the subject resource), rdf:member (to indicate a member of a subject resource), rdf:comment (a description of a subject resource) etc. RDF additionally provides means for publishing both humanreadable and machine-processable vocabularies. Vocabularies are the set of properties, or metadata elements, defined by resource description communities. The standardized vocabulary declaration greatly encourages the reuse and extension of semantics across different information communities as well. Hence we can easily describe resources along with their properties, value of those properties, range, domain and also indicate interrelationship between the resources. Our work involves the automatic generation of this RDF coding from the existing data sources of any organization. 3. Our Approach for Ontology Construction As has been mentioned before, for an increased usage of the Semantic Web technologies, it is essential that more and more domains are described using ontology languages like 242 248

RDF or OWL but most of the existing domains are developed using popular web development tools like HTML, jsp, servlet, asp or.net coding. Domains fully described using ontology languages are quite rare. This prompted us to devise a method and implementation of automatic ontology constructor software. A rather popular approach in this context is conversion of Web-pages to RDF or OWL, the two most widely used efficient ontology representing languages. Web-pages are a combination of some HTML, jsp, servlet or some asp or.net coding. RDF is a language which tells us I know what it means, you tell me how it should look. On the other hand HTML is basically a schema which tells us how it (Web-page) looks with no relation to its meaning whatsoever. So it would be trying to convert a schema to a meaningful data source which is impossible in practical sense. Further more, if we assume it to contain several paragraphs of textual data, it would be a really time consuming work to understand the proper meaning from the data and extract sufficient information to construct an RDF page. We used a term proper meaning which again can give rise to considerable confusion as meaning of any word varies according to the context of its usage. Some words have a meaning in one domain and something else in another domain. For example the word bridge carries different meanings in networking domain and in construction domain. Complete information about a domain or organisation is essential to develop automated ontology construction software for that domain. We notice that the problem with Web-pages is that their data is not well defined so that a dumb like a computer can understand its meaning. Only in one place data is well defined that is a database or to be more specific a relational database. Here inter relation between data is considerably well defined. So we decided to concentrate on this and it resulted in an extraordinary finding. If we consider a table as a class, it solves our problem. EmpId Name Address PhoneNo DeptNo E001 Xyz Xyz 0000000000 D001 E002 Xyz Xyz 0000000000 D002 E003 Xyz Xyz 0000000000 D001 Fig. 2. Employee table Say we are trying to build ontology for a company, some of whose database tables are Employee, Department, Inventory etc. designed for that organization. According to our development methodology, these relational database table names become the RDF class names of the RDF we intend to develop. Now, instances of the Employee class should be the different employees working for that organization. Every relational database has a primary key. Primary keys help in the unique identification of data in a database. Say in the Employee table the primary key is Employee_ID attribute. Hence in the final RDF code to be generated from this table, each Employee_ID that stands for a unique employee becomes the instance. Other attributes such as name, address etc. become the properties of each instance generated. The RDF code corresponding to this table would be as follows: <EMP rdf:id="e001"> <NAME> XYZ </ENAME> <Address> xyz </Address> <PhoneNo> 0000000000</PhoneNo> <DeptNo> D001 </DeptNo> <EMP rdf:id="e002"> <NAME> XYZ </ENAME> <Address> xyz </Address> <PhoneNo> 0000000000 </PhoneNo> <DeptNo> D002 </DeptNo> <EMP rdf:id="e003"> <NAME>XYZ</ENAME> <Address> xyz </Address> <PhoneNo> 0000000000 </PhoneNo> <DeptNo> D001 </DeptNo> Now the question comes how to create properties from a database. Properties relate two classes. In this context we can say properties relate two tables. Now, it is known that, in RDBMS two tables are connected via foreign keys. In previous example DeptNo is the foreign key. So in terms of ontology we can say DeptNo is a property whose domain is Department and range is Employee. The corresponding RDF code would be as follows: <rdf:property rdf:id ="DEPTNO"> <rdfs:domain rdf:resource="#department"/> <rdfs:range rdf:resource="#employee"/> </rdf:property> Now the DeptNo property of the Employee class becomes a pointer or url of the Department class. So the new structure would look like this: Error! <EMP rdf:id="e001"> <NAME>XYZ</ENAME> <Address>xyz</Address> <PhoneNo>0000000000</PhoneNo> <DeptNo>http://www.XYZcompany.org/ Department#d001</DeptNo> 243 249

Screenshots of automatic ontology generator and class structure creator are shown in Figure 3 and Figure 4. The entire application has been developed using Java Swing. It gets connected to the database of the organization and produces corresponding RDF coding considering each and every table present in the database and their interrelationships. 4. Conclusion Fig. 3. Screenshot of the Automatic Ontology Generator To find the class subclass interrelationships, a graphical user interface has been developed which prompts the user to enter the possible class subclass relationships and the software automatically generates the corresponding RDF codes. A screenshot of the class structure creator is as follows: The technique described above is entirely new and can be largely helpful in increasing the use of and popularizing of the Semantic Web techniques and taking up Semantic Web techniques would lead better exchange, reuse and storage of structured metadata. In the present WWW where most of the data is unstructured and HTML based, the use of Semantic Web applications is largely hindered. Populating the World Wide Web with sufficient metadata will pave the way for upcoming new and efficient technologies. First, searching on the web will become easier as search engines have more information available, and thus searching can be more focused. Useful technologies enabling automated software agents to roam the web can be developed, which would not only be searching for information but also would be transacting business on our behalf. The web of today, the vast unstructured mass of information, in the future can be transformed into something more manageable - and thus something far more useful. This brings up the requirement of efficient techniques to generate machine understandable information, that is, ontological representation of existing unstructured information. One such technique, developed by us utilizes the existing huge source of information in the form of relational databases and generates RDF coding for the data. This generated code can be used for multiple purposes, an example being Semantic Web search engines. Search engines which search for information based on the meaning of the entered data, can successfully utilize the generated semantic information in the form of RDF. Hence such a fast and efficient methodology for the automatic generation of ontology in the form of RDF from the existing Web-sites would lead to the generation of more and more machine interpretable information resulting in an unprecedented and effective growth of the upcoming and promising Semantic Web. 5. References [1] F. Manola and E. Miller, eds. 2002; RDF Primer, W3C. http://www.w3.org/tr/rdf-primer. [2] T. R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition,5(2):199-220, 1993. Fig. 4. Screenshot of the Class Structure Creator [3] T. Berners-Lee et al 2001. The Semantic Web. Scientific American. May 2001. 244 250

[4] W3C, RDF Vocabulary Description Language 1.0: RDF Schema, W3C Working Draft 23 January 2003, http://www.w3.org/tr/rdfschema/. [5] W3C, Resource Description Framework (RDF) Model and Syntax Specification, W3Crecommendation February 1999, http://www.w3.org/tr/1999/rec-rdf-syntax-19990222. [6] An Introduction to the Resource Description Framework, Eric Miller Research Scientist D -Lib Magazine May 1998. [7] Sean B Palmer, The Semantic Web: An Introduction, 2001 [8] David E. Goldschmidt and Mukkai Krishnamoorthy; Architecting a Search Engine for the Semantic Web. [9] Tim Bray, What is RDF? http://www.xml.com/lpt/a/2001/01/24/rdf.html. [10] W3C, RDF Primer, W3C Working Draft 23 January 2003, http://www.w3.org/tr/2003/wd-rdf-primer-20030123/ [11] Interview with Tim Berners-Lee, Business Week, April 2007. The article, published in the same issue, also refers to numerous quotes from Tim. [12] Berners-Lee.T,1999. Weaving the Web:The Original Design and Ultimate Destiny of the World Wide Web by its Inventor,New York:Harper SanFrancisco. [13] OWL Web Ontology Language Guide, W3C Recommendation, http:// www.w3.org/tr/oel-guide/. [14] A Guide to Creating Your First Ontology. Natalya F. Noy and Deborah L. McGuinness. 245 251