GoNTogle: A Tool for Semantic Annotation and Search

Similar documents
GoNTogle: A Tool for Semantic Annotation and Search

Integrating Keywords and Semantics on Document Annotation and Search

Domain-specific Concept-based Information Retrieval System

Semantic Cloud Generation based on Linked Data for Efficient Semantic Annotation

OWLS-SLR An OWL-S Service Profile Matchmaker

Ontology-based Architecture Documentation Approach

Semantic Searching. John Winder CMSC 676 Spring 2015

Context Sensitive Search Engine

Chapter 7 Semantically Enhanced Search and Browse

An Approach to Evaluate and Enhance the Retrieval of Web Services Based on Semantic Information

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

Information Retrieval

PatMan: A Visual Database System to Manipulate Path Patterns and Data in Hierarchical Catalogs

An Annotation Tool for Semantic Documents

Generation of Semantic Clouds Based on Linked Data for Efficient Multimedia Semantic Annotation

Ylvi - Multimedia-izing the Semantic Wiki

Information Retrieval (IR) through Semantic Web (SW): An Overview

A PRELIMINARY STUDY ON THE EXTRACTION OF SOCIO-TOPICAL WEB KEYWORDS

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web

Semantic Web-Based Document: Editing and Browsing in AktiveDoc

The Documentalist Support System: a Web-Services based Tool for Semantic Annotation and Browsing

SWSE: Objects before documents!

An Archiving System for Managing Evolution in the Data Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Application of a Semantic Search Algorithm to Semi-Automatic GUI Generation

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,

Enterprise Multimedia Integration and Search

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

ORES-2010 Ontology Repositories and Editors for the Semantic Web

Automated Online News Classification with Personalization

A B2B Search Engine. Abstract. Motivation. Challenges. Technical Report

Ontology-Specific API for a Curricula Management System

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN

Automatic Ontology-Based Document Annotation for Arabic Information Retrieval

A Tagging Approach to Ontology Mapping

Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm

Using idocument for Document Categorization in Nepomuk Social Semantic Desktop

Ontology Refinement and Evaluation based on is-a Hierarchy Similarity

Heading-Based Sectional Hierarchy Identification for HTML Documents

Semantic Annotation: The Mainstay of Semantic Web

Context Ontology Construction For Cricket Video

Chapter 6: Information Retrieval and Web Search. An introduction

Co-occurrence and Ranking of Entities

Extracting knowledge from Ontology using Jena for Semantic Web

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

Evaluating find a path reachability queries

Using Data-Extraction Ontologies to Foster Automating Semantic Annotation

Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Information Retrieval

NUS-I2R: Learning a Combined System for Entity Linking

SemSearch: Refining Semantic Search

User Guide. Version 1.5 Copyright 2006 by Serials Solutions, All Rights Reserved.

Collaborative Ontology Construction using Template-based Wiki for Semantic Web Applications

Study of Design Issues on an Automated Semantic Annotation System

Chapter 27 Introduction to Information Retrieval and Web Search

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Semantic Interoperability. Being serious about the Semantic Web

Exploring and Navigating Ontologies and Data A Work in Progress Discussion Jan 21 st, 2009

Benchmarking Textual Annotation Tools for the Semantic Web

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

ResPubliQA 2010

Toward a Knowledge-Based Solution for Information Discovery in Complex and Dynamic Domains

VIDEO SEARCHING AND BROWSING USING VIEWFINDER

Taxonomies and controlled vocabularies best practices for metadata

Information Retrieval

How to submit an assignment to Turnitin full student guide

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

ASTM. Your Portal for Standards, Testing, Learning & More

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

A Knowledge-Based System Using Semantic Categorization and Rating Mechanisms

CHAPTER 6 PROPOSED HYBRID MEDICAL IMAGE RETRIEVAL SYSTEM USING SEMANTIC AND VISUAL FEATURES

OLLIE: On-Line Learning for Information Extraction

Semantic Annotation for Semantic Social Networks. Using Community Resources

What can be done with the Semantic Web? An Overview of Watson-based Applications

CHAPTER 5 OPTIMAL CLUSTER-BASED RETRIEVAL

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

Presented by Kit Na Goh

Information Gathering Support Interface by the Overview Presentation of Web Search Results

ISSN: (Online) Volume 2, Issue 3, March 2014 International Journal of Advance Research in Computer Science and Management Studies

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

ABSTRACT I. INTRODUCTION

Chapter 6 Shared Ontology for Knowledge Management

Survey of Semantic Annotation Platforms

Text Mining and the. Text Mining and the Semantic Web. Semantic Web. Tim Finin. University of Maryland Baltimore County

Implementation of Semantic Information Retrieval. System in Mobile Environment

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha

Simple Searching with CAB Direct

Tag Recommendation for Photos

The drop down menu at the top of the page also provides option to browse journal titles by:

PRIOR System: Results for OAEI 2006

Automation of Semantic Web based Digital Library using Unified Modeling Language Minal Bhise 1 1

Semantic Document Architecture for Desktop Data Integration and Management

LODatio: A Schema-Based Retrieval System forlinkedopendataatweb-scale

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 2013 ISSN:

A conceptual model of trademark retrieval based on conceptual similarity

Usability of Annotation Tools: comparison and outlook

Theme Identification in RDF Graphs

Transcription:

GoNTogle: A Tool for Semantic Annotation and Search Giorgos Giannopoulos 1,2, Nikos Bikakis 1, Theodore Dalamagas 2 and Timos Sellis 1,2 1 KDBS Lab, School of ECE, NTU Athens, Greece. {giann@dblab.ntua.gr, bikakis@dblab.ntua.gr} 2 Institute for the Management of Information Systems, Research Center "Athena", Greece. {dalamag@imis.athena-innovation.gr, timos@imis.athena-innovation.gr} Abstract. This paper presents GoNTogle, a tool which provides advanced document annotation and search facilities. GoNTogle allows users to annotate several document formats, using ontology concepts. It also produces automatic annotation suggestions based on textual similarity and previous document annotations. Finally, GoNTogle combines keyword and ontology-based search, offering advanced ontology query facilities. Keywords: Document Annotations, Ontology based Retrieval, Hybrid Search, Semantic Search, Keyword Search, Ontology Search, Annotation Tool. 1 Introduction Semantic annotation and search tools are at the core of Semantic Web Technology [2, 3, 10]. Annotations involve tagging of data with concepts (i.e., ontology classes) so that data becomes meaningful. Annotating data can help in providing better search facilities, since it helps users to search for information not only based on the traditional keywordbased search, but also using well-defined general concepts that describe the domain of their information need. A great number of approaches on semantic annotation have been proposed in the literature. Most of them are focused on annotating web resources such as html pages [5, 6, 7]. As far as document annotation is concerned, there are approaches that differ in the annotation and search facilities they offer [1, 8, 9]. Especially GATE [8] is a platform that offers an architecture, a framework and a graphical tool for annotating, searching and generally managing document corpora. In this paper we present GoNTogle. GoNTogle supports manual and automatic annotation of several types of documents (doc, pdf, rtf, txt, odt, sxw) using ontology classes. It also provides searching facilities beyond the traditional keyword-based search, including ontology-based search with advanced semantic queries and combined search. In contrast with other works, our aim was to implement an easy-to-use document annotation and search tool, that would fully support (a) viewing and annotating popular document types while maintaining their initial format, (b) sharing those annotations and (c) searching for documents combining keyword and ontology search. The key features of our tool are the following: It allows users to open and view widely used document formats such as.doc and.pdf, maintaining their original format. It provides an easy and intuitive way of annotating documents (or document parts) using ontologies.

Figure 1. GoNTogle architecture It provides an automatic annotation mechanism based on user history, thus, offering personalization of annotations. It is based on a server-based architecture, where document annotations are reachable from all users. It combines keyword and ontology search, providing advanced search facilities for both types of search. 2 System Overview GoNTogle's architecture is presented in Figure 1. The system is divided into 4 basic components: a) Semantic Annotation Component provides facilities regarding the semantic annotation of documents. It consists of 3 modules: a Document Viewer, an Ontology Viewer and an Annotation Editor. b) Ontology Server Component stores the semantic annotations of documents in the form of OWL ontology instances. c) Indexing Component is responsible for indexing the documents using an inverted index. d) Search Component allows users to search for documents using both textual (keyword search) and semantic (ontology search) information. 2.1 Semantic Annotation Semantic Annotation Component offers 2 primary functionalities: (a) annotation of whole document and (b) annotation of parts of a document. Also, a user may select between manual and automatic annotation. Figure 2 depicts the Semantic Annotation window of our application. The user may open a document in the Document Viewer, maintaining its original format. In the particular example, the user has opened a.doc document. Moreover, she can load and view the hierarchy of an ontology through the Ontology Viewer. The user can, then, select one or more ontology classes and manually annotate the whole document or part of it. The annotation is saved as an ontology instance in the Ontology Server, along with information about the annotated document (or part). On the same time, an annotation instance is added in the Annotation Editor list. Each record of this list corresponds to an annotation stored in the Ontology Server. For example (Figure 2), the abstract of the document is annotated with class H.2.3_Languages- Query_Languages, while the whole document is annotated with class H.2_DATABASE_MANAGEMENT. The user can edit those annotation instances, adding or removing ontology classes, or completely remove them. Also, when she selects an annotation from the list (regarding a part of a document), the document scrolls to the corresponding part, which is highlighted with the same color as the annotation instance. The user may also choose the automatic annotation functionality. In this case, the system suggests candidate ontology classes, executing the weighted knn classification method [4].

Figure 2. Semantic annotation example Use of the automatic annotation feature requires that the system has previously been trained. The key points of our method follow: We gather all documents that have been previously annotated with one or more classes of an ontology. Those documents form the training dataset D. We create an inverted index I on D, storing both textual information about the documents and information about the classes they are annotated with. In this way, we relate textual information with ontology classes. When a user chooses to automatically annotate a document (or a part of it), the document's (part's) text dt is queried on index I. The k most similar documents are considered for further processing, forming dataset S. A ranked list of candidate annotation classes is presented to the user. Each class is given a score that combines (a) the textual similarity score between dt and the text of each document in S and (b) a score representing the extend to which each document in S is annotated with each class. The user may choose one or more suggested classes to conclude the automatic annotation process. 2.2 Search Search component provides a series of search facilities which are described below. We start with the simple ones: Keyword-based search. This is the traditional searching model. The user provides keywords and the system retrieves relevant documents based on textual similarity. Ontology-based search. The user navigates through the classes of the ontology and selects one or more of them. The result list from this type of search consists of all documents that have (partially or on their whole) been annotated with one or more of the chosen classes. Combined search. The user may search for documents using keywords and ontology classes. She can, also, determine whether the results of her search will be the intersection or the union of the two searches. Next we present a set of advanced searching facilities that can be used after an initial search has been completed. Find related documents. Starting from a result document d, the user may search for all documents which have been annotated with a class that also annotates d. Find similar documents. This is a variation of the previous search facility. Starting from a result document d, the user may search for all documents which are already in the result list and have been annotated with a class that also annotates d.

Figure 3. Search example Get Next Generation. The resulting list from an ontology-based search can be expanded by propagating the search on lower levels in the ontology (i.e., if class c has been used, then search is propagated only in direct subclasses of c). This is the case when the search topic is too general. Get Previous Generation. This offers the inverse functionality of the previous option. The resulting list from an ontology-based search can be expanded by propagating the search on higher levels in the ontology (i.e., if class c has been used, then search is propagated only in direct superclasses of c). This is the case when a search topic is too narrow. Proximity Search. This search option allows the user to search for documents not only belonging to a selected class but also belonging to the classes's subclasses. That is, if class c has been used, then search is propagated in all direct subclasses of c and their direct subclasses. The resulting documents gathered from those three levels of the ontology hierarchy are weighted properly (documents from the initial class c get higher score). All above facilities can also be used for searching only in the results of a previous search. Moreover, the user can narrow down her search to specific document types (i.e. only pdf documents). Finally, the user may open a result document and provide new annotations, based on the same ontology she used in her search. The resulting documents are presented in a ranked list. From the information presented in this list the user can find out whether a resulting document comes from keyword/ontology/combined search, and which classes it is annotated with (if any). Also, each result is accompanied by a score. This score is a weighted sum of (a) the textual similarity score given by keyword search and (b) a score that represents the extend to which each document is annotated with the classes selected in ontology search. 3 Implementation and Evaluation In what follows we provide technical information about the implementation of our system and we present a preliminary experiment we performed in order to evaluate the effectiveness of the automatic annotation process. Implementation. To compose our system, we utilized several open source tools and libraries. For indexing and keyword searching we used Lucene search engine library 1. We used Protégé 2 as Ontology Server, so that document annotations are stored as OWL class instances. 1 http://lucene.apache.org/java/docs/ 2 http://protege.stanford.edu/

Figure 4. Evaluation results for automatic annotation based on knn OpenOffice API 3 was essential in incorporating in our system a viewer that could maintain the exact format of.doc documents, which is a very common filetype. The same applies for Multivalent 4, a generalized document viewer that was integrated in our system so that.pdf files could also maintain their format when being viewed and annotated. All annotation and search facilities presented in this paper have been implemented in a Java prototype. Detailed information and application screenshots, as well as the application itself and installation instructions can be found in http://web.imis.athenainnovation.gr/~dalamag/gontogle. Evaluation. We have performed a preliminary evaluation of the semantic annotation tool. More specifically, we tested the precision and recall of the automatic annotation process. We turned the ACM Computing Classification 5 into an OWL ontology. The ontology produced is a 4-level structure with 1463 nodes. We used 500 papers, pre-categorized according to the ACM System, as sample documents for the ontology classes. Then, we used the tool to automatically annotate 66 papers from the publication database maintained in our lab 6. Figure 4 presents precision and recall diagrams for simple and weighted knn. Best values were observed for k=7. We observed better results when using weighted knn compaired to simple knn, so we adopted the former in our final system implementation. Future work. Our future work involves (a) incorporating support for more filetypes (html, e-mail files) (b) adding more advanced search facilities regarding ontology search and (c) performing experiments in large corpora. References 1. A. Duke, T. Glover, and J. Davies. "Squirrel: An advanced semantic search and browse facility". In Proceedings of the ESWC'07. 2. V. Uren, Y. Lei and E. Motta. "Refining Semantic Search". In Proceedings of the ESWC'08. 3. R. Bhagdev, S. Chapman, F. Ciravegna, V. Lanfranchi and D. Petrelli. "Hybrid Search: Effectively Combining Keywords and Semantic Searches". In Proceedings of the ESWC'08. 4. T.M. Mitchell. "Machine Learning". WCB/McGraw-Hill, 1997. 5. SMORE: Create OWL Markup for HTML Web Pages. http://www.mindswap.org/2005/smore/. 6. V. Lanfranchi, F. Ciravegna and D. Petrelli. "Semantic Web-Based Document: Editing and Browsing in AktiveDoc". In Proceedings of the ESWC'05. 7. A. Hogue and D. Karger. "Thresher: automating the unwrapping of semantic content from the World Wide Web". In Proceedings of the WWW'05. 8. H. Cunningham, D. Maynard, K. Bontcheva and V. Tablan. "GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications". In Proceedings of the ACL'02. 9. A. Kiryakov, B. Popov, I. Terziev, D. Manov and D. Ognyanoff. "Semantic annotation, indexing, and retrieval". Journal of Web Semantics 2(1), 2004. 10. L. Reeve and H. Han. "Survey of semantic annotation platforms". In Proceedings of the ACM symposium on Applied computing, 2005. 3 http://api.openoffice.org/ 4 http://multivalent.sourceforge.net/ 5 http://www.acm.org/about/class/ 6 http://web.dbnet.ntua.gr/en/pubs.html