Semantic Indexing of Algorithms Courses Based on a New Ontology

IJCSC Volume 5 Number 1 March-Sep 2014 pp ISSN


EL Guemmat Kamal (1), Benlahmer Elhabib (2), Talea Mohamed (1), Chara Aziz (2), Rachdi Mohamed (2)

(1) Université Hassan II - Mohammedia Casablanca, Faculté des Sciences Ben M'sik, Laboratoire de Traitement de l'Information, Cdt Driss El Harti, BP 7955 Sidi Othman Casablanca, Maroc
k.elguemmat@gmail.com, taleamohamed@yahoo.fr
(2) Université Hassan II - Mohammedia Casablanca, Faculté des Sciences Ben M'sik, Laboratoire de Technologie de l'Information et Modélisation, Cdt Driss El Harti, BP 7955 Sidi Othman Casablanca, Maroc
h.benlahmer@gmail.com, aziz.chara@hotmail.fr, mohamed.rachdi@yahoo.fr

Abstract: In traditional indexing, documents and queries are represented by keywords taken from their content. Using words to represent the content of documents and queries raises several problems, notably word ambiguity and word disparity (mismatch). Semantic indexing addresses these problems: its goal is to index by the meaning of words rather than by the words themselves. Where ambiguity is present, semantic indexing is intended to improve the performance of an IRS (Information Retrieval System) and thereby overcome the limitations of traditional indexing approaches. We propose a new approach for semantically indexing algorithms courses written in French, based on a new application ontology. Our approach adapts a semantic annotation tool to this reference ontology; the annotation tool then generates an index that can be used in e-learning applications as needed (question answering systems, information retrieval systems, ...), improving performance in this field.

Keywords: semantic indexing; algorithms courses; French language; ontology; e-learning.

1. Introduction

A central problem in e-learning is to facilitate the learning task for the actors involved (learners, trainers, ...).
We continue in this direction by defining a method for semantically indexing algorithms courses written in French, for use in e-learning. However, several limitations confront this area: several indexing approaches exist, but the question is how to adopt one within e-learning through a transparent, fluid method adapted to the field of teaching algorithms.

The field of information retrieval dates back to the early 1950s [1]. Its goal is to find, within a large collection, the documents (texts, text fragments, web pages, images, videos) that are relevant to a user's query, where the query expresses the user's information need and a relevant document contains the information the user needs. The first problem that interested researchers was indexing documents so that they can be retrieved; this research has made considerable progress, highlighted by the arrival of semantic indexing.

Our goal is to take advantage of semantic indexing to overcome the limitations of traditional indexing of algorithms courses. We rely on a new ontology, which we call OntAlgO, and on an approach that determines the document to be indexed, identifies its key concepts and finally generates an index characterizing the precise meaning of the course, to be exploited in e-learning as needed (question answering systems, information retrieval systems, ...).

The article is organized as follows. Section 2 presents the challenges of indexing, identifying the limits of traditional indexing and the move to semantic indexing. Section 3 presents our contribution to the field: our ontology OntAlgO and our approach to semantically indexing algorithms courses. Section 4 presents our conclusions and perspectives.

2. The Challenges of Indexing

The purpose of indexing is to find the terms that represent the most important concepts of a document, so that an information retrieval system can reuse these representations and compare the representations of the query and of the document more easily [2]. Among the most important indexing approaches, classic indexing is explained, with its strengths and weaknesses, in Section 2.1, and the move to semantic indexing is presented in Section 2.2.

2.1. Classic Indexing

Indexing can be carried out in three ways:
Manual: the document is analyzed by a human expert in the field.
Automatic: the process is fully automated.
Semi-automatic: the process is primarily automatic, but the final selection of significant terms is left to an expert in the field.

Automatic indexing has the advantage of automating the whole indexing process. It applies several treatments to the documents: automatic extraction of descriptors, use of an antidictionary (stop list) to remove function words, stemming, identification of word groups, and weighting of the terms before the index is created. The resulting set of weighted terms forms a representation of the document's content; how these terms are organized depends on the IR (Information Retrieval) model used (Boolean, vector or probabilistic models) [3].

Traditional indexing faces two main problems, word ambiguity and word disparity, because the textual entities that represent documents and queries are keywords taken from their content [4]:
Word ambiguity, called lexical ambiguity, refers to lexically identical words that have different grammatical functions or meanings; it is generally divided into two types, syntactic ambiguity and semantic ambiguity.
Word disparity (word mismatch) refers to lexically different words that express the same meaning.
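The automatic indexing chain described above (descriptor extraction, stop-word removal, stemming, weighting) and its use in a vector model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the French stop list and the suffix-stripping stemmer are hypothetical stand-ins (a real system would use a complete stop list and a proper stemmer such as Snowball), and tf·idf weighting with cosine similarity is assumed as one common instance of the vector model.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny French stop list ("antidictionary"); a real system uses a complete one.
STOP_WORDS = {"le", "la", "les", "un", "une", "de", "des", "du", "d", "et",
              "est", "en", "par", "que", "qui", "il", "on"}
# Hypothetical suffix list for a crude stemmer (a stand-in for e.g. a Snowball stemmer).
SUFFIXES = ("ations", "ation", "ements", "ement", "es", "s", "e")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def descriptors(text):
    """Descriptor extraction: tokenize, drop function words, stem."""
    tokens = re.findall(r"\w+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Weight each descriptor by tf * idf; one sparse vector per document."""
    counts = {name: Counter(descriptors(text)) for name, text in docs.items()}
    df = Counter(term for c in counts.values() for term in c)
    n = len(docs)
    return {name: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
            for name, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


def search(query, index):
    """Vector-model retrieval: rank documents by cosine similarity with the query."""
    q = Counter(descriptors(query))
    return sorted(((name, cosine(q, vec)) for name, vec in index.items()),
                  key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "Une variable est une boîte repérée par une étiquette",
    "d2": "Une boucle répète des instructions",
    "d3": "Le tri range les éléments d'un tableau",
}
index = build_index(docs)
print(search("variables et étiquettes", index))
# d1 ranks first (score 1/sqrt(2), about 0.707); d2 and d3 score 0.0
```

Stemming lets "variables"/"variable" and "étiquettes"/"étiquette" match although their surface forms differ, but it does nothing for word ambiguity or word mismatch, which is what motivates semantic indexing.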
Various solutions have been proposed to overcome these limitations of traditional indexing. For word ambiguity, one solution is to use compound expressions ([5], [6]) to reduce ambiguity. However, it is not always possible to provide a compound expression in the query that conveys the intended sense, and formulating such expressions requires considerable effort from the user. For word disparity, one solution is to expand the query using a thesaurus of synonyms [7]. To extend a query word with its synonyms, however, one must know not only the word in the query but also the words that will be used to extend it [8]. Within IR, a new type of indexing, called semantic indexing, has appeared to overcome the limitations of traditional indexing; it is explained in the next section.

2.2. Semantic Indexing

Semantic indexing acts at the level of the representation of documents and queries. It is a specialization of traditional indexing whose goal, according to [3], is to index by the meaning of words rather than by the words themselves. Where ambiguity is present, semantic indexing is intended to improve the performance of an IRS. Semantic indexing involves two main phases [4]:
Disambiguation phase: find the correct meaning of each word in the document (respectively, the query).
Representation phase: represent the document (respectively, the query) using these meanings.

There are several approaches to disambiguation. Some rely on a training corpus to compute the correct meaning of a word; others exploit the local context together with definitions taken from external linguistic resources such as dictionaries or MRDs (Machine-Readable Dictionaries), thesauri, ontologies, or a combination of these. For representation, a document can be represented either by senses alone or by a combination of keywords and senses.
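The two phases of semantic indexing can be illustrated with a simplified Lesk-style disambiguation, one representative of the dictionary-based approaches mentioned above: each ambiguous word receives the sense whose definition shares the most words with its local context, and the text is then represented by a combination of keywords and senses. This is a sketch under stated assumptions: the mini MRD, its sense identifiers and the stop list are hypothetical, and English words are used only for readability.

```python
import re

# Hypothetical mini machine-readable dictionary (MRD): word -> {sense id: gloss}.
MRD = {
    "table": {
        "table.furniture": "a piece of furniture with a flat top and legs",
        "table.array": "a data structure that stores elements accessed by an index",
    },
    "key": {
        "key.lock": "a metal piece used to open a lock or a door",
        "key.sort": "a value used to compare and sort elements of a data structure",
    },
}
# Hypothetical stop list applied to both glosses and context.
STOP_WORDS = {"a", "an", "the", "of", "to", "by", "and", "or", "in", "with", "used"}


def content_words(text):
    """Lower-case alphabetic tokens minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def lesk(word, context):
    """Disambiguation phase: pick the sense whose gloss shares most words with the context.

    Ties (including zero overlap) keep the first sense listed in the MRD.
    Returns None for words the MRD does not cover.
    """
    best_sense, best_overlap = None, -1
    for sense_id, gloss in MRD.get(word, {}).items():
        overlap = len(content_words(gloss) & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def represent(text):
    """Representation phase: combined keyword/sense representation of the text."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    context = set(tokens)
    # Each word is disambiguated against the rest of its local context;
    # words without an MRD entry are kept as plain keywords.
    return [lesk(w, context - {w}) or w for w in tokens]


print(represent("Sort the elements of the table by key"))
# ['sort', 'elements', 'table.array', 'key.sort']
print(represent("Put the key in the lock of the door"))
# ['put', 'key.lock', 'lock', 'door']
```

The same surface word "key" ends up under two different index entries depending on its context, which is exactly the ambiguity that keyword indexing cannot separate.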
Since the 1990s, ontologies have become a research subject at the heart of several communities, including artificial intelligence, the Semantic Web, software engineering, biomedical informatics and information architecture. This popularity is partly due to the fact that an ontology is a controlled, organized vocabulary that explicitly formalizes the relations between its terms. In addition, it offers a common, shared understanding of a domain, both for human users and for software applications [9]. In this respect, our contribution, presented in the next section, exploits the advantages of semantic indexing, specifically those of the ontology-based approach, to index algorithms courses for use in e-learning.

3. Our Contribution in the Field

Our contribution focuses on the semantic indexing of e-learning resources (courses on algorithms). Our solution relies on an ontology-based approach to semantically index algorithms courses written in French. Section 3.1 presents the construction of the new ontology OntAlgO, and Section 3.2 discusses the approach we adopted for the semantic indexing of algorithms courses.

3.1. Construction of the Ontology OntAlgO

Dividing up the knowledge of a teaching domain makes it possible to classify the knowledge of the specific domain to be taught. Ontologies make this possible and play a crucial role here, since they model knowledge through concepts, attributes and relationships that are used to index the content of documents.

Knowledge breakdown: our teaching field (algorithms) is organized around the following concepts:
CoA (Concepts of Application): the key concepts that model the algorithms courses.
CoU (Concepts of Use): surface markers that model a set of knowledge elements describing the use cases of a CoA.

Our ontology OntAlgO is obtained by classifying the knowledge about algorithms extracted from the course "ALGORITHMIQUE ET PROGRAMMATION NON-MATHEUX COURS COMPLET avec exercices, corrigés et citations philosophiques"¹ (algorithms and programming for non-mathematicians: complete course with exercises, solutions and philosophical quotations). It consists of seven CoA (variable, test, boucle, tableau, fonction, procédure, tri, i.e., variable, test, loop, array, function, procedure, sort) and four CoU (définition, syntaxe, types, exemple, i.e., definition, syntax, types, example). Figure 1 shows the concepts of OntAlgO as implemented in the Protégé² editor.

Figure 1: OntAlgO implemented with Protégé.

The language chosen to define OntAlgO is OWL, a W3C³ recommendation since February 2004. OWL is more expressive than the other W3C languages, RDF and RDFS, and offers features they do not define. The ontology is developed so that it can be implemented in an annotation tool.
The formal description of the ontology is designed to prepare its integration into the annotation tool.

Attributes and relationships of OntAlgO

Every CoA concept of the ontology has an attribute nom (name), and every CoU concept has an attribute marqueur (marker). The semantic relationships between CoA and CoU concepts are associative relations (cf. Figure 2); for example, the relationship between the CoA variable and the CoU définition is déf de var (definition of variable).

Figure 2: Relation between CoA and CoU.

¹ http://www.pise.info/algo/index.htm
² http://protege.stanford.edu/
³ http://www.w3c.org
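The structure described above (CoA concepts carrying a nom attribute, CoU concepts whose instances carry marqueur values, and labelled associative relations between them) can be mirrored in a small in-memory model, for instance to prototype the annotation tool before loading the OWL file. This is a hypothetical Python sketch, not the authors' implementation: the class and function names are illustrative, only the seven CoA, the four CoU, the définition markers listed in Table 1 and the déf de var relation come from the paper, and the markers of the other CoU and the other relation labels are left empty.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CoA:
    """Concept of Application: a key course concept; `nom` holds its own name."""
    nom: str


@dataclass(frozen=True)
class CoU:
    """Concept of Use: each instance contributes one `marqueur` (surface marker)."""
    name: str
    marqueurs: tuple[str, ...] = ()


@dataclass
class OntAlgO:
    coa: dict[str, CoA] = field(default_factory=dict)   # nom -> CoA
    cou: dict[str, CoU] = field(default_factory=dict)   # name -> CoU
    # (CoU name, CoA name) -> label of the associative relation
    relations: dict[tuple[str, str], str] = field(default_factory=dict)

    def relation(self, cou: str, coa: str) -> Optional[str]:
        """Label of the associative relation linking a CoU to a CoA, if one is defined."""
        return self.relations.get((cou, coa))

    def cou_for(self, sentence: str) -> Optional[str]:
        """First CoU one of whose markers occurs in the sentence (naive substring test)."""
        lowered = sentence.lower()
        for cou in self.cou.values():
            if any(marker in lowered for marker in cou.marqueurs):
                return cou.name
        return None


def build_ontalgo() -> OntAlgO:
    """Hypothetical helper populating the model with the concepts named in the paper."""
    onto = OntAlgO()
    for nom in ("variable", "test", "boucle", "tableau", "fonction", "procédure", "tri"):
        onto.coa[nom] = CoA(nom)  # the value of `nom` is the concept's own name
    # Markers of the définition instances (Table 1); the markers of the other
    # CoU are not listed in the paper and are left empty here.
    onto.cou["définition"] = CoU("définition", ("est une", "sont", "on définit",
                                                "on utilise", "permet", "nom de",
                                                "un ensemble de"))
    for name in ("syntaxe", "types", "exemple"):
        onto.cou[name] = CoU(name)
    onto.relations[("définition", "variable")] = "déf de var"
    return onto


onto = build_ontalgo()
print(onto.cou_for("une variable est une boîte"))  # définition
print(onto.relation("définition", "variable"))     # déf de var
```

Keeping the relation labels keyed by (CoU, CoA) pairs makes the lookup performed by the annotation tool a single dictionary access; a production version would read these values from the OWL ontology instead of hard-coding them.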

Instantiating OntAlgO

The last step is to provide instances of the classes in the hierarchy. Table 1 gives an example for the CoA variable and the CoU définition.

Table 1: Example of OntAlgO instantiation.

Concept        Attribute   Value             (English gloss)
variable1      nom         variable          (variable)
définition 1   marqueur    est une           (is a)
définition 2   marqueur    sont              (are)
définition 3   marqueur    on définit        (we define)
définition 4   marqueur    on utilise        (we use)
définition 5   marqueur    permet            (allows)
définition 6   marqueur    nom de            (name of)
définition 7   marqueur    un ensemble de    (a set of)

For every CoA, the value of the attribute nom is the name used to describe the concept itself. A CoU, by contrast, has several instances whose marqueur values identify the use cases of a CoA.

3.2. Developed Approach

The proposed approach, called semantic indexing of algorithms courses, aims to improve the relevance of IR; it introduces a new method for disambiguating the semantic descriptors contained in the courses by means of OntAlgO.

Approach to semantic indexing of algorithms courses

The process followed by our approach (cf. Figure 3) is:
1. Identify the algorithms course to be indexed.
2. An annotation tool processes the document, first marking a CoA and then the CoU around it, using the reference ontology OntAlgO, in the form <CoU>...<CoA>...</CoA>...</CoU>; the CoU span is delimited by the two periods bounding the sentence, or ends when a new CoA appears.
3. The annotation tool identifies the semantic relationship between the CoU and the CoA in OntAlgO.
4. The index is generated.

Figure 3: Approach of Semantic Indexing algorithms courses.

Example of semantic indexing of an extract from the course on algorithms

Extract from the course on algorithms to index:
«Pour employer une image, une variable est une boîte, que le programme (l'ordinateur) va repérer par une étiquette. Pour avoir accès au contenu de la boîte, il suffit de la désigner par son étiquette.»
(In English: "To use a metaphor, a variable is a box that the program (the computer) identifies by a label. To access the contents of the box, it is enough to refer to it by its label.")

Identification of the CoA, the CoU and the relationship between them by the annotation tool: CoA: variable; CoU: définition; relationship: déf de var.
<définition>Pour employer une image, une <variable>variable</variable> est une boîte, que le programme (l'ordinateur) va repérer par une étiquette</définition>. Pour avoir accès au contenu de la boîte, il suffit de la désigner par son étiquette.
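The annotation and index-generation steps can be sketched as follows, under explicit assumptions. Sentences are split on periods, a new CoA occurrence starts a new CoU span (one possible reading of the segmentation rule), a CoU is detected by naive substring matching of its markers, and the index is modelled as a hypothetical mapping from relation labels to (course identifier, sentence number) pairs, since the structure of the index in Figure 4 is not described in the text. The identifier "cours-algo" is likewise illustrative. Run on the extract above, the sketch reproduces the annotated sentence.

```python
import re
from collections import defaultdict

# Hypothetical minimal copy of OntAlgO: the seven CoA names, the définition
# markers of Table 1, and the one relation label given in the paper.
COA_NAMES = ("variable", "test", "boucle", "tableau", "fonction", "procédure", "tri")
COU_MARKERS = {
    "définition": ("est une", "sont", "on définit", "on utilise",
                   "permet", "nom de", "un ensemble de"),
}
RELATIONS = {("définition", "variable"): "déf de var"}

# Whole-word, case-insensitive match of any CoA name.
COA_PATTERN = re.compile(r"\b(" + "|".join(COA_NAMES) + r")\b", re.IGNORECASE)


def find_cou(segment):
    """Return the first CoU whose marker occurs in the segment (naive substring test)."""
    lowered = segment.lower()
    for cou, markers in COU_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return cou
    return None


def segments(sentence):
    """Cut a sentence so that each piece holds one CoA; a new CoA starts a new piece.

    Yields (piece, coa_name, start, stop) with start/stop local to the piece.
    """
    matches = list(COA_PATTERN.finditer(sentence))
    cuts = [0] + [m.start() for m in matches[1:]] + [len(sentence)]
    for i, m in enumerate(matches):
        offset = cuts[i]
        yield (sentence[offset:cuts[i + 1]], m.group(1).lower(),
               m.start() - offset, m.end() - offset)


def annotate(course_id, text, index):
    """Tag <CoU>...<CoA>...</CoA>...</CoU> spans and record each relation in the index."""
    out = []
    for n, sentence in enumerate(re.split(r"(?<=\.)\s+", text.strip())):
        body = sentence.rstrip(".")  # the final period stays outside the tags
        parts = []
        for piece, coa, start, stop in segments(body):
            relation = RELATIONS.get((find_cou(piece), coa))
            if relation is None:
                parts.append(piece)  # no known CoU/CoA relation: leave untagged
                continue
            cou = find_cou(piece)
            tagged = f"{piece[:start]}<{coa}>{piece[start:stop]}</{coa}>{piece[stop:]}"
            parts.append(f"<{cou}>{tagged}</{cou}>")
            index[relation].append((course_id, n))
        out.append(("".join(parts) or body) + sentence[len(body):])
    return " ".join(out)


index = defaultdict(list)  # hypothetical index: relation label -> [(course id, sentence no.)]
extract = ("Pour employer une image, une variable est une boîte, que le programme "
           "(l'ordinateur) va repérer par une étiquette. Pour avoir accès au contenu "
           "de la boîte, il suffit de la désigner par son étiquette.")
print(annotate("cours-algo", extract, index))
print(dict(index))  # {'déf de var': [('cours-algo', 0)]}
```

Sentences that contain no CoA, like the second sentence of the extract, are passed through unchanged, so the index only points at passages the ontology can interpret.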

Finally, the annotation tool, which is based on OntAlgO, generates the index (cf. Figure 4).

Figure 4: Example of index.

4. Conclusion and Outlook

Our contribution to the semantic indexing of algorithms courses yields a consistent index, obtained through two results: the design of a new application ontology, OntAlgO, and the development of an approach to index algorithms courses. The indexed courses open several perspectives for exploitation, such as searching for courses and searching for complementary documentation.

5. References

[1] Mooers, C. N. Application of Random Codes to the Gathering of Statistical Information. Master's thesis, MIT, 1948.
[2] Roussey, Catherine. Une méthode d'indexation sémantique adaptée aux corpus multilingues. PhD thesis, Institut national des sciences appliquées de Lyon, 2001.
[3] Baziz, Mustapha. Indexation conceptuelle guidée par ontologie pour la recherche d'information. PhD thesis, Institut de Recherche en Informatique de Toulouse, 2005.
[4] Boubekeur-Amirouche, Fatiha. Contribution à la définition de modèles de recherche d'information flexibles basés sur les CP-Nets. PhD thesis, Université de Toulouse, 2008.
[5] Fagan, Joel L. Experiments in Automatic Phrase Indexing for Document Retrieval: A Comparison of Syntactic and Non-syntactic Methods. PhD thesis, Department of Computer Science, Cornell University, September 1987.
[6] Salton, G. Syntactic approaches to automatic book indexing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 1988, pp. 204-210. Department of Computer Science, Cornell University, Ithaca, New York.
[7] Salton, G., Fox, E., and Wu, H. Extended Boolean information retrieval. Communications of the ACM, 26(12), 1983.
[8] Krovetz, R., and Croft, W. B. Lexical ambiguity and information retrieval. ACM Transactions on Information Systems, 10(2), pp. 115-141, April 1992.
[9] Amardeilh, Florence. Web Sémantique et Informatique Linguistique : propositions méthodologiques et réalisation d'une plateforme logicielle. PhD thesis, Université Paris X Nanterre, 2007.