Measuring The Degree Of Similarity Between Web Ontologies Based On Semantic Coherence


ABHIK BANERJEE, HAREENDRA MUNIMADUGU, SRINIVASA RAGHAVAN VEDANARAYANAN, LAWRENCE J. MAZLACK
Applied Computational Intelligence Laboratory
University of Cincinnati, Ohio 45220
UNITED STATES
banerjak@mail.uc.edu, munimaha@mail.uc.edu, vedanasn@mail.uc.edu, mazlack@uc.edu

Abstract: - The Internet comprises a variety of websites which, both individually and in clusters, generate large amounts of information. In order to make web pages machine-understandable we need a formal, explicit specification. This is provided by a Web Ontology. The importance of domain ontologies is widely recognized, particularly in relation to the expected advent of the Semantic Web. For the task of detecting and retrieving relevant ontologies, a means of measuring the similarity between ontologies becomes a necessity on a very large scale. The purpose of this paper is to describe a method that recognizes and categorizes different ontologies of the same domain and finds the degree of similarity between them, providing a framework for research on merging ontologies that relate to a similar concept in a domain.

Key Words: - Ontology, merging, comparison, coherence, semantic web.

1 Introduction
The Internet comprises a variety of websites which, both individually and in clusters, generate large amounts of information. It is up to human users to extract that information effectively and efficiently by having the machine do the work and fetch it for us. In order to make web pages machine-understandable we need formal, explicit specifications. This is provided by a Web Ontology (from here on, just Ontology will be used in place of Web Ontology). A representation of knowledge as a group of concepts within a particular domain, together with the relationships between those concepts, is called an ontology.
Its main application is to reason about the characteristics of that domain, and it may be used to describe the domain [1]. Ontology is an important emerging discipline and has great potential to enhance information management [2] [9]. The importance of ontologies pertaining to a particular domain is widely recognized, particularly in relation to the growth of the Semantic Web. For the task of detecting and retrieving relevant ontologies, a means of measuring the similarity between ontologies becomes a necessity on a very large scale [3].

The motivation also includes the following cases. If a person wants to find the community with which he or she will be most comfortable communicating, identifying the similarity between ontologies (communities) can be of great benefit [3] [4]; a major application of this is categorizing communities in social networks such as Facebook, Orkut and Twitter [5] [10]. In ontology engineering, a major industry in the Web scenario, it is helpful to find ontologies that are similar so that they can easily be used in tandem with one another [3]. For illustration, when creating an ontology for astrophysics analysis, it would be better to find both astronomy and physics ontologies that can be used in sync with each other [3]. In newer areas such as semantic search engines, which favor context-based search and return ontologies in response to a query, it would be valuable to introduce a module that finds ontologies similar to one given as a key [3] [6]. Distances can also be used to sort the responses to such a query, which in turn leads to ranking ontologies according to their proximity [3] [7].

The purpose of this paper is to find a similarity measure between two ontologies and to provide a framework for future research on merging ontologies that relate to a similar domain. The objective of this research is to recognize and categorize similar ontologies and subsequently produce a common consolidated ontology framework for a particular domain. Our central hypothesis is that WordNet can be used to find the semantic coherence between two nodes in two different ontologies. We make use of lexical similarity, semantic similarity and tree transformation costs to attach a value to the degree of similarity between two ontologies. The hypothesis has been formulated on the basis of the fact that WordNet gives the meanings of a particular word and also its synonyms.

ISSN: 1792-4251 ISBN: 978-960-474-213-4

2 Problem Formulation
In several areas, such as structured databases expressed in text form, experimental biology, compiler optimization, and image analysis, the degree of similarity has been well studied [8]. The problem addressed in this paper is to find the degree of similarity between two ontologies from the same domain. The long-term goals of this research are to recognize and categorize similar ontologies, detect their domain and consequently provide a larger framework for data integration, so that we can perform better analysis and data mining on global data and achieve coherent, highly specific and trustworthy results for user queries.
In these studies, the edit cost (or edit distance) from one tree to another is used to measure the degree of similarity of two trees. Nevertheless, such approaches concentrate mainly on discovering matches from a structural or geometric perspective, without considering the conceptual semantics of the tree nodes in the framework of knowledge [8]. A nearly comprehensive study of the similarity between ontologies was carried out by Maedche and Staab [3]. In their research, an ontology had a tree-like structure that would be used to model a concept in the form of a taxonomy [8]. A method was developed to measure the similarity between ontologies based on the ideas of lexicon, reference functions, and semantic cotopy [11]. The scheme was built on the hypothesis that the same terms will be used for concepts in different ontologies, but that their relative positions within each ontology may vary according to the application or the user's priority [8] [12]. In such cases the taxonomic overlap cannot be fully computed and evaluation on a lexical level becomes almost impossible [8]. The structural characteristics of trees, which are crucial to the discovery of similarity, were not taken into account by this research.

3 Problem Resolution
Starting with a particular domain of ontologies is expected to decrease complexity while retaining the main issue of this research. The Tourism domain will be used for testing our hypothesis. Before we can continue with our testing, we want to define our data and to generate our different ontological trees and represent them in a particular format. We have decided to choose a group of undergraduate students from the industrial engineering department. We will give the students a screening test, which will consist of questions such as number of miles traveled, number of countries visited, mode of transportation, and so forth.

Initially, we plan to use only those students who have traveled more than 1000 miles, have visited at least 3 and at most 5 countries, and traveled to these countries on commercial flights. We will take a minimum of 3 groups of 25 students each, based on the above criteria, to design individual ontologies for the tourism domain. Several forms of knowledge representation exist, including semantic nets, frames and rules.

We decided to represent our ontological trees in the form of frames. We chose the Protégé tool to develop the ontological frames. The undergraduate students selected for the experimental setup will be given several hours of training on the use of Protégé and on how to use its interfaces for the creation of frames. Fig 1 shows one of the interfaces that will be used to develop these frames for knowledge representation.

Fig 1. Interface for designing frames in Protégé

After the ontological trees are developed, we will compare two ontologies at a time by running them through our system, which comprises the lexical, semantic and tree transformation phases. To test the robustness of our system, the tests and subjects will be varied by choosing different domains in a similar manner.

3.1 Testing
For testing our hypothesis we define a three-step approach that comprises (1) the lexical comparison level, (2) the semantic comparison level, and (3) calculating the transformation cost. The three testing procedures are run iteratively for all the nodes in both ontological trees, and the numerical values obtained for all the nodes that constitute the two trees are compared with the predefined threshold values.

3.1.1 Lexical Comparison Level
At this level, we compare nodes that have lexically similar names. To find the lexical similarity, we use the concept of edit distance. The edit distance is the minimum number of operations (insertions, deletions and substitutions) that have to be performed in order to transform one string into another. The similarity measure is calculated from this edit distance by the following equation [3]:

Lexical Similarity Measure: LSM = (s - c) / s    (1)

where s is the length of the shorter string and c is the number of changes required to transform one string into the other (the edit distance).

Fig 2. Two ontologies A and B depicting university structure [8]

The nodes that have similar names are taken into consideration from the corresponding ontologies. For example, in Fig 2, for nodes in ontology A:
University A has its best match with University B, LSM = (12-1)/12 ≈ 0.92
Employee has its best match with Employee in ontology B, LSM = 1
College has no good match with any node; consequently, LSM = 0

3.1.2 Semantic Comparison Level
At the semantic level, we use WordNet to find the concept behind the words that were found similar, or not completely similar, at the lexical level. WordNet is a suitable option for finding the semantic similarity between nodes. Though WordNet resembles a thesaurus in many respects, one of the differences between them is that, apart from expressing words in the form of concepts, WordNet also captures the relationships between the words and the concepts. WordNet uses the concept of synsets as well as the concept of synonyms. Hypernyms and hyponyms can also be treated as synsets. Hyponyms and hypernyms can be understood by a simple example: we say that a car is a kind of vehicle. Here car is a hyponym and vehicle is a hypernym. Synsets are defined as words that have a similar meaning and can be substituted for each other in a sentence without changing the actual meaning of the proposition. So we can use the words car and vehicle interchangeably in a sentence: "The car is traveling at a speed of 100 miles per hour" can also be expressed as "The vehicle is traveling at a speed of 100 miles per hour."

We suggest that, besides comparing the lexical similarity of two similar nodes in two different ontological tree structures, we also use WordNet to find whether the semantic meanings of the two nodes match. We will write a function that compares two words to see whether they form a synset pair, taking into consideration the corresponding hypernyms and hyponyms.
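As an illustration only, the two node-level comparisons described so far (Eq. (1) and the synset check) might be sketched as follows. This is not the authors' implementation. The names `edit_distance`, `lexical_similarity`, `semantic_coherence`, `TOY_SYNSETS` and `TOY_HYPERNYMS` are hypothetical; the tiny hand-written tables stand in for a real WordNet lookup; clamping negative LSM values to 0 is an assumption; and the coherence score is simplified to 1.0 or 0.0 because the text does not specify how graded values would be computed.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """LSM = (s - c) / s, with s the shorter length and c the edit distance."""
    s = min(len(a), len(b))
    if s == 0:
        return 0.0
    c = edit_distance(a, b)
    # Assumption: a negative value is reported as 0 (no usable match).
    return max(0.0, (s - c) / s)


# Hypothetical stand-in for WordNet, containing only the paper's two examples.
TOY_SYNSETS = [{"college", "school"}]
TOY_HYPERNYMS = {"car": "vehicle"}  # hyponym -> hypernym


def semantic_coherence(a: str, b: str) -> float:
    """Return 1.0 if the toy lexicon relates the two words, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or any(a in syn and b in syn for syn in TOY_SYNSETS):
        return 1.0
    if TOY_HYPERNYMS.get(a) == b or TOY_HYPERNYMS.get(b) == a:
        return 1.0  # hypernym/hyponym pairs are treated like synsets here
    return 0.0


print(round(lexical_similarity("University A", "University B"), 2))
print(lexical_similarity("Employee", "Employee"))
print(semantic_coherence("College", "School"))
```

In a real system the two toy tables would be replaced by queries against a WordNet database, and the score could be graded rather than binary.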
The function we define will check whether a pair of words (nodes/concepts) from the two ontological trees forms a pair of synsets by comparing them against the WordNet dictionary. This function will return a numerical value between 0 and 1, where a value of 1 implies complete semantic coherence between the two words. Considering our example from Fig 2, WordNet would ideally return College and School as semantically coherent. This can then be used to re-label College as School in Ontology A.

3.1.3 Calculating the Transformation Cost
We define operations such as insertion of a node, deletion of a node, moving a node and relabeling a node as transformations that, when applied to one ontology, make it closely resemble the other ontology under consideration. We assign a transformation cost to each of these operations and compute the net transformation cost. The cost of insertion or deletion is given by the expression [8]:

C_i/d = [(h - d) + 1 + D] / V    (2)

where h is the height of the tree (number of levels), d is the depth of the given node in the tree, V is the number of nodes in the tree before insertion/deletion, and D is the number of descendants of that node after it is inserted. Deletion and insertion are exemplified by Fig 3, which portrays a conversion from Ontology A to Ontology B. The costs for deletion and insertion are:

Deletion of the Professor node at level 4: C_d = [(4 - 4) + 1 + 0] / 5 = 0.2    (3)
Insertion of the Professor node at level 3: C_i = [(3 - 3) + 1 + 0] / 5 = 0.2    (4)

The cost of moving a node is given by the expression [8]:

C_m = [0.5 * (D + I) * (V - 2)] / 2    (5)

where, in this expression, D is the cost of deletion, I is the cost of insertion, and V is the number of nodes in the tree before insertion/deletion. From equations (3) and (4), we have:

C_m = [0.5 * (0.2 + 0.2) * (5 - 2)] / 2 = 0.3
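The arithmetic of Eqs. (2) to (5) can be checked with a short sketch. The function names `insert_delete_cost` and `move_cost` are hypothetical, and the inputs are simply the values quoted in the worked example (h, d, D and V for the Professor node); nothing here goes beyond the formulas as stated.

```python
def insert_delete_cost(h: int, d: int, D: int, V: int) -> float:
    """Eq. (2): cost of inserting or deleting a node.

    h: height of the tree, d: depth of the node,
    D: number of descendants of the node, V: nodes in the tree before the change.
    """
    return ((h - d) + 1 + D) / V


def move_cost(deletion_cost: float, insertion_cost: float, V: int) -> float:
    """Eq. (5): cost of moving a node, from its deletion and insertion costs."""
    return (0.5 * (deletion_cost + insertion_cost) * (V - 2)) / 2


# Values from the worked example: Professor deleted at level 4, re-inserted at level 3.
c_del = insert_delete_cost(h=4, d=4, D=0, V=5)   # Eq. (3)
c_ins = insert_delete_cost(h=3, d=3, D=0, V=5)   # Eq. (4)
c_move = move_cost(c_del, c_ins, V=5)

print(round(c_del, 2), round(c_ins, 2), round(c_move, 2))
```

Rounding is used only for display, since the intermediate products are floating-point values.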

Fig 3. Deletion and insertion of node Professor [8]

The re-labeling operation is useful when the labels of nodes do not match between two concepts or ontologies. The cost of re-labeling depends on the semantic similarity between two given concepts, denoted by s. The cost of re-labeling is assigned the same value as that returned by the function defined at the semantic comparison level. The structure of the two ontologies after re-labeling is illustrated by Fig 4.

Fig 4. Ontology B transformed to ontology A

3.2 Threshold Setting
After comparing the two ontological trees we obtain three numerical values, one from each of our three steps. The numerical values from the lexical comparison, the semantic comparison and the transformation cost are each compared to a predefined threshold value. For the lexical and semantic comparisons, if the numerical value is greater than the threshold value we consider the result to be true. For the transformation cost, if the numerical value obtained is less than the threshold value we consider the result of the test to be true. Deciding the threshold value for each test is a difficult and important question. We take a range of threshold values for each of the three tests and repeat our experiments with different inputs in the same domain as well as with inputs from various domains. With each iteration, we decide whether the threshold value for each test needs to be increased or decreased. After setting our threshold values for each of the tests, we run our experiment on various input ontology sets for various domains.

4 Conclusion
The results of the three tests are a sufficient measure to decide whether two ontological trees or two ontological structures are similar. The lexical and semantic comparison of the nodes tells us whether the two ontologies have the same concepts defined, and the transformation cost of the trees gives the cost required to transform one ontology into the other.
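The decision rule of Section 3.2, on which this conclusion relies, can be sketched as below. The class `Thresholds`, the function `run_tests` and all threshold values are hypothetical placeholders; the paper leaves the actual thresholds to be tuned experimentally and does not state how the three outcomes are combined, so the sketch reports each test separately.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    lexical: float         # LSM must exceed this value
    semantic: float        # semantic coherence must exceed this value
    transformation: float  # transformation cost must stay below this value


def run_tests(lsm: float, coherence: float, cost: float, t: Thresholds) -> dict:
    """Apply the three threshold tests and report each outcome separately."""
    return {
        "lexical": lsm > t.lexical,
        "semantic": coherence > t.semantic,
        "transformation": cost < t.transformation,
    }


# Placeholder thresholds, chosen only to make the example concrete.
thresholds = Thresholds(lexical=0.8, semantic=0.8, transformation=0.5)
print(run_tests(lsm=0.92, coherence=1.0, cost=0.3, t=thresholds))
```

Repeating the experiment over a range of `Thresholds` values, as the text describes, would amount to calling `run_tests` in a loop and adjusting each field up or down between iterations.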
Future goals of the research include merging two ontologies once we conclude that they are similar. A merged ontology for each domain is expected to be more efficient, more maintainable and less error-prone. It will help in removing ambiguity between the various web ontologies available. The future benefits of having a merged ontology include finding common communities across social networking sites, performing semantic searches over the wide span of data spread over the Internet, and consolidating ontologies across coherent research areas such as astrophysics, geophysics and many others.

References:-
[1] http://www.answers.com/topic/ontologycomputer-science
[2] Chen, E.; Wu, G. 2005. An Ontology Learning Method Enhanced by Frame Semantics. In ISM 2005: Proceedings of the Seventh IEEE International Symposium on Multimedia, 374-382.
[3] Maedche, A.; Staab, S. 2002. Measuring Similarity Between Ontologies. In EKAW '02: Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, Springer-Verlag, London, UK, 251-263.
[4] Jung, J.; Euzenat, J. 2007. Towards Semantic Social Networks. In Proc. 4th European Semantic Web Conference, Innsbruck (AT), volume 4519 of Lecture Notes in Computer Science, 267-280.
[5] Jung, J.; Zimmermann, A.; Euzenat, J. 2007. Concept-Based Query Transformation Based On Semantic Centrality In Semantic Peer-To-Peer Environment. In Proc. Advances in Data and Web Management, Joint 9th Asia-Pacific Web Conference (APWeb) and 8th International Conference on Web-Age Information Management (WAIM), Huang Shan (CN), volume 4505 of Lecture Notes in Computer Science, 622-629.
[6] d'Aquin, M.; Baldassarre, C.; Gridinoc, L.; Angeletou, S.; Sabou, M.; Motta, E. 2007. Watson: A Gateway For Next Generation Semantic Web Applications. In Proc. Poster session of the International Semantic Web Conference (ISWC), Busan.
[7] Alani, H.; Brewster, C. 2005. Ontology Ranking Based On The Analysis Of Concept Structures. In Proc. 3rd International Conference on Knowledge Capture (K-Cap), Banff, 51-58.
[8] Xue, Y.; Wang, C.; Ghenniwa, H.H.; Shen, W. 2009. A Tree Similarity Measuring Method And Its Application to Ontology Comparison. J.UCS 15, 1766-1781.
[9] Gracia, J.; Lopez, V.; d'Aquin, M.; Sabou, M.; Motta, E.; Mena, E. 2007. Solving Semantic Ambiguity To Improve Semantic Web Based Ontology Matching. In Proc. 2nd ISWC Ontology Matching Workshop (OM), Busan, 1-12.
[10] Khelif, K.; Gandon, F.L.; Corby, O.; Dieng-Kuntz, R. 2008. Using The Intension Of Classes And Properties Definition In Ontologies For Word Sense Disambiguation. In Knowledge Engineering: Practice and Patterns, 16th International Conference, EKAW 2008, Acitrezza, Italy, September 29 - October 2, 2008, 188-197.
[11] Ermolayev, V.; Keberle, N.; Matzke, W. 2008. An Upper Level Ontology Model for Engineering Design Performance Domain. In Proc. 27th International Conference on Conceptual Modeling (ER 2008), Barcelona, Spain, October 20-24.
[12] Liu, B.; Zhang, H.; Yang, X. 2008. GY-RTI: An Integrated Distributed Simulation Environment. In Proc. IEEE International Conference on Networking, Sensing and Control (ICNSC 2008), 232-235.