WikiOnto: A System For Semi-automatic Extraction And Modeling Of Ontologies Using Wikipedia XML Corpus

2009 IEEE International Conference on Semantic Computing

Lalindra De Silva
University of Colombo School of Computing, Colombo, Sri Lanka
lalindra84@gmail.com

Lakshman Jayaratne
University of Colombo School of Computing, Colombo, Sri Lanka
klj@ucsc.cmb.ac.lk

Abstract—This paper introduces WikiOnto: a system that assists in the extraction and modeling of topic ontologies in a semi-automatic manner, using a preprocessed document corpus of one of the largest knowledge bases in the world: the Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for extracting topic ontologies quickly, together with a modeling environment for refining them. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, the system addresses a task that is generally considered extremely cumbersome. Initial results from the prototype suggest strong potential for ontology extraction and modeling, and they also motivate further research on extracting ontologies from other semi-structured document corpora.

Keywords—Ontology; Wikipedia XML Corpus; Ontology Modeling; Ontology Extraction

I. INTRODUCTION

With the growing problem of information overload and the emergence of the semantic web, the importance of ontologies has come under the spotlight in recent years. Ontologies, commonly defined as "an explicit specification of a conceptualization," provide an agreed-upon representation of the concepts of a domain for easier knowledge management [1]. However, modeling ontologies is generally considered a painstaking task, often requiring the knowledge and expertise of an ontology engineer who is well versed in the domain concerned. Previous efforts at building ontology development environments succeeded in the comprehensiveness of the tools they provided, but building ontologies with these environments still involved a great deal of manual work [2]. Additionally, most efforts to extract ontologies have focused on textual data as their source. In this research, we therefore investigate extracting and modeling ontologies using the Wikipedia XML Corpus [3] as the source, with a view to extending this work to other semi-structured data domains as sources for building ontologies (footnote 1).

The research efforts, and the applications that have resulted from them, in the areas of ontology development, extraction, and ontology learning are many and diverse. In light of these attempts, Section 2 reviews related prior research on constructing topic ontologies from various input sources. In Section 3, we describe our source, the Wikipedia XML Corpus, and the structure of the documents in the corpus, along with the reasons why it is an ideal source for such research. The framework of our system consists of three major layers (Figure 1); Section 4 therefore looks in detail at how this three-tiered framework is implemented. Finally, we present the ontology development environment with both the current facilities and those proposed for users of the system.

Footnote 1: This work is based on the continuing undergraduate thesis of the first author, titled "A Machine Learning Approach to Ontology Extraction and Evaluation Using Semi-structured Data".

II. RELATED WORK

The research efforts to extract ontologies span multifarious domains in terms of the input sources they use. A large number of these studies have focused on textual data as their input, while fewer have focused on semi-structured data and relational databases. [4] presents a method in which the words that appear in the source texts are ontologically annotated using WordNet [5] as the lexical database; the authors primarily work on assigning verbs that appear in source articles to ontological classes. Several other researchers have used XML schemas as a starting point for the creation of ontologies. In [6], the authors use the ORA-SS (Object-Relationship-Attribute model for Semistructured Data) model first to determine the nature of the relationships between the elements of the XML schema and subsequently to create a generic model for organizing the ontology. [7] gives a detailed survey of some of the work that has been carried out on learning ontologies for the semantic web.

[8] presents a framework for ontology extraction for document classification. In this work, the author experiments with the Reuters collection and the Wikipedia database dump as sources and produces a methodology by which concepts can be extracted from such large document bases. Other tools for extracting ontologies from text, such as ASIUM [9], OntoLearn [10] and Text-To-Onto [11], also exist. However, the biggest motivation for our research comes from OntoGen [12]. OntoGen is a tool that enables semi-automatic generation of topic ontologies using textual data as the initial source. We have utilized the best features of the OntoGen methodology while extending our research into the semi-structured data domain. Consequently, we have proposed to enhance our system with additional concept extraction mechanisms, such as lexico-syntactic pattern matching, which the OntoGen project has not incorporated.

III. WIKIPEDIA XML CORPUS

The Wikipedia, being one of the largest knowledge bases in the world and the most popular reference work on the current World Wide Web, has inspired many people and projects in diverse fields, including knowledge management, information retrieval and ontology development. The reliability of Wikipedia articles is often criticized, since the knowledge base can be accessed and modified by anyone in the world. For the most part, however, the Wikipedia is considered reliable enough to be used in non-critical applications.

The Wikipedia XML Corpus is a collection of documents derived from the Wikipedia that is used in a large number of Information Retrieval and Machine Learning tasks in present research communities. The corpus contains articles in eight languages, and the latest collection contains approximately 660,000 English-language documents that correspond to articles in the Wikipedia. Each XML document in the corpus is uniquely named (e.g., 3928.xml) and has a uniform element structure, illustrated here by the document representing the article on "Ball":

    <article>
      <name id="3928">Ball</name>
      <body>
        <p>A <emph3>ball</emph3> is a round object that is used most often in
          <collectionlink xlink:href="26853.xml">sport</collectionlink>s and
          <collectionlink xlink:href="11970.xml">game</collectionlink>s.</p>
        <section>
          <title>Popular ball games</title>
          There are many popular games
          <normallist>
            <item><collectionlink xlink:href="3850.xml">baseball</collectionlink></item>
            <item><collectionlink xlink:href="3812.xml">basketball</collectionlink></item>
          </normallist>
        </section>
        <languagelink lang="cs">Míč</languagelink>
        <languagelink lang="de">Ball</languagelink>
      </body>
    </article>

Each document is contained within the article element, inside which lies a limited number of predefined element tags that correspond to specific relations between text segments in the document. The name element carries the title of the article, while the title element within each section element carries the title of that subsection. The normallist elements contain lists of information, and the collectionlink and unknownlink elements correspond to other articles referred to from within the document. In addition to the documents, the corpus provides category information that hierarchically organizes the articles into the relevant categories. Using this category information, we developed the initial segment of our system, in which the user can choose a certain number of articles from the domain of interest to be input to the system. This selection of a domain-specific document set enables better targeted and quicker ontology creation than using the entire corpus as the input.

IV. WIKIONTO FRAMEWORK

Figure 1. Design of the WikiOnto System

As previously mentioned, the WikiOnto system is implemented as a three-tiered framework (Figure 1). The following sections explain each layer in detail.

A. Concept Extraction Using Document Structure

In this layer, we make use of the structure of the documents to deduce substantial information about the taxonomic hierarchy of possible concepts in each document. After careful review of many documents in the corpus, we have established the following assumptions for extracting concepts from the documents:

- Word phrases contained within the name, title, item, collectionlink and unknownlink elements are proposed as concepts to the user (e.g., in the XML file above, the phrases "Ball", "Sport", "Game", "Popular Ball Games", "Baseball" and "Basketball" are all suggested to the user as concepts).
- The word phrases contained within the title elements of first-level sections are suggested as sub-concepts of the concept within the name element of the document (e.g., "Popular Ball Games" is a sub-concept of the concept "Ball").
- A word phrase contained within a title element nested within several section elements is suggested as a sub-concept of the title concept of the section immediately above it.
- Any word phrase wrapped inside collectionlink or unknownlink elements is suggested as a sub-concept of the concept immediately above it in the structure of the XML document (e.g., the concepts "Baseball" and "Basketball" are sub-concepts of the concept "Popular Ball Games", while the concepts "Sport" and "Game" are sub-concepts of the concept "Ball").

When the system is initialized, the user is given the opportunity to define the maximum number of words that a potential concept may contain. Once the user specifies the documents to be used as the sources from which to extract concepts (either using the domain-specific document selector explained earlier, choosing manually, or inputting the entire corpus; see footnote 2), the system iterates through all the documents to extract the potential concepts and their relationships according to the assumptions listed above. The ontology modeling environment (Section 5) explains how the user can refine these relationships after examining the concepts (i.e., how the user can label the relationships as hyponymic, hypernymic, meronymic relations, etc.).

In order to validate the concepts that are extracted and suggested, we have incorporated WordNet [5] into our system. When all the concepts have been extracted from the documents as per the above assumptions, each concept is matched morphologically, word by word, against WordNet. If even a single word in a word phrase cannot be morphologically matched with WordNet (for reasons such as being a foreign word, a person's name, a place name, etc.), the whole word phrase is withheld from the concept collection and is presented to the user at the end of the processing stage along with the rest of such concepts.

Footnote 2: Due to computing resource constraints, the system is yet to be tested with the full corpus (approx. 660,000 articles) as input.
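
To make the assumptions above concrete, the following sketch (in C#, the system's implementation language; see Section 5) walks the element structure of a single corpus document and emits (parent, child) concept suggestions. This is an illustrative sketch rather than the system's actual code: the element names follow the excerpt shown above, the article name and plain item text are omitted for brevity, and IsInWordNet is only a placeholder standing in for the word-by-word morphological WordNet lookup just described.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Minimal sketch of layer A: derive candidate concepts and sub-concept
// suggestions from the element structure of one corpus document.
static class StructureExtractor
{
    // Placeholder for the word-by-word morphological WordNet check; phrases
    // that fail it would be held back for the user instead of being added.
    static bool IsInWordNet(string phrase) => true;

    static bool WithinLimit(string phrase, int maxWords) =>
        !string.IsNullOrWhiteSpace(phrase) &&
        phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length <= maxWords;

    // Title of the nearest enclosing section, or the article name if there is none.
    static string EnclosingConcept(XElement e, string articleName) =>
        e.Ancestors("section")
         .Select(s => ((string)s.Element("title"))?.Trim())
         .FirstOrDefault(t => !string.IsNullOrEmpty(t)) ?? articleName;

    public static IEnumerable<(string Parent, string Child)> Extract(string file, int maxWords)
    {
        var doc = XDocument.Load(file);
        string article = ((string)doc.Root.Element("name"))?.Trim();

        // Section titles become sub-concepts of the title (or article name) one level up.
        foreach (var sec in doc.Descendants("section"))
        {
            string title = ((string)sec.Element("title"))?.Trim();
            if (WithinLimit(title, maxWords) && IsInWordNet(title))
                yield return (EnclosingConcept(sec, article), title);
        }

        // collectionlink / unknownlink phrases become sub-concepts of the
        // concept immediately above them in the document structure.
        foreach (var e in doc.Descendants()
                             .Where(x => x.Name == "collectionlink" ||
                                         x.Name == "unknownlink"))
        {
            string phrase = e.Value.Trim();
            if (WithinLimit(phrase, maxWords) && IsInWordNet(phrase))
                yield return (EnclosingConcept(e, article), phrase);
        }
    }
}
```

Run over the "Ball" document above, such a walk would yield pairs like (Ball, Popular ball games), (Ball, sport), (Ball, game) and (Popular ball games, baseball), matching the examples given in the assumptions.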

The user has full discretion to decide whether these withheld concepts should be added to the concept collection. At the end of this processing stage, the user can query for a concept in the collection and start building an ontology with the queried concept as the root. The system automatically populates the first level of concepts and, according to the user's needs, will populate additional levels as well (i.e., initially the ontology includes only the immediate concepts that have a sub-concept relationship with the root concept; at the user's discretion, the system will then treat each first-level concept as a parent and add its child concepts to it).

B. Concepts From Keyword Clustering

As the second layer of our framework, we have implemented a keyword clustering method to identify the concepts related to a given concept. There are several well-documented approaches and metrics for extracting keywords and measuring their relevance to documents. We have used the well-known TFIDF measure [13], which is generally accepted to be a good measure of the relevance of a term in a document, and have represented each document in the vector-space model using these keywords. The TFIDF measure is the product of the Term Frequency (TF) and the Inverse Document Frequency (IDF), defined as follows:

\[ TF_{t,d} = \frac{n_{t,d}}{\sum_{k} n_{k,d}} \]

where n_{t,d} is the number of times term t appears in document d and the denominator sums the frequencies of all distinct terms k in d;

\[ IDF_{t} = \log \frac{|D|}{|\{d : t \in d\}|} \]

the logarithm of the ratio between the number of documents in the corpus, |D|, and the number of documents in which term t appears; and

\[ TFIDF_{t,d} = TF_{t,d} \times IDF_{t}. \]

To select keywords in the document collection, we defined and removed all the stopwords (words that appear commonly in sentences and carry little meaning with regard to the ontologies, such as prepositions and definite and indefinite articles) and extracted all the distinct words in the document collection to build the vector-space model for each document. Then, by calculating the TFIDF measure for every word in the word list within the individual documents and imposing a threshold value, we were able to identify the keywords that correspond to a given document.

With this vector-space representation of each document, the user is given the opportunity to group the documents into a desired number of clusters. This is achieved through the k-means clustering algorithm [14]:

1) Choose k cluster centers randomly from the data points.
2) Assign each data point to the closest cluster center based on the similarity measure.
3) Re-compute the cluster centers.
4) If the assignment of data points has changed (i.e., the process has not converged), repeat from step 2.

The similarity measure used in our approach is the cosine similarity [15] between two vectors. If A and B are two vectors in our vector space:

\[ \text{Cosine Similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} \]
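
As an illustration of how these pieces fit together, the sketch below builds sparse TFIDF vectors, computes cosine similarity, and performs one k-means assignment step, following the definitions above. It is a minimal sketch and not the system's code: the stop-word list, pre-tokenised input, and the absence of the TFIDF threshold are simplifying assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of layer B: sparse TFIDF vectors, cosine similarity,
// and one k-means reassignment step. Vectors map a term to its weight.
static class KeywordClustering
{
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "the", "of", "in", "is" };   // toy list

    // One TFIDF vector per document; documents are pre-tokenised word lists.
    public static List<Dictionary<string, double>> TfidfVectors(List<List<string>> docs)
    {
        int D = docs.Count;
        var df = new Dictionary<string, int>();                       // document frequency
        foreach (var doc in docs)
            foreach (var t in doc.Where(w => !StopWords.Contains(w)).Distinct())
                df[t] = df.TryGetValue(t, out var c) ? c + 1 : 1;

        var vectors = new List<Dictionary<string, double>>();
        foreach (var doc in docs)
        {
            var terms = doc.Where(w => !StopWords.Contains(w)).ToList();
            var vec = new Dictionary<string, double>();
            foreach (var g in terms.GroupBy(t => t))
            {
                double tf = (double)g.Count() / terms.Count;          // TF(t,d)
                double idf = Math.Log((double)D / df[g.Key]);         // IDF(t)
                vec[g.Key] = tf * idf;                                // TFIDF(t,d)
            }
            vectors.Add(vec);
        }
        return vectors;
    }

    public static double Cosine(Dictionary<string, double> a, Dictionary<string, double> b)
    {
        double dot = a.Where(kv => b.ContainsKey(kv.Key)).Sum(kv => kv.Value * b[kv.Key]);
        double na = Math.Sqrt(a.Values.Sum(v => v * v));
        double nb = Math.Sqrt(b.Values.Sum(v => v * v));
        return na == 0 || nb == 0 ? 0 : dot / (na * nb);
    }

    // k-means step 2: attach each document to its most similar cluster centre.
    public static int[] Assign(List<Dictionary<string, double>> vectors,
                               List<Dictionary<string, double>> centres)
    {
        return vectors.Select(v =>
            Enumerable.Range(0, centres.Count)
                      .OrderByDescending(c => Cosine(v, centres[c]))
                      .First()).ToArray();
    }
}
```

In the full system the TFIDF threshold mentioned above would additionally prune low-weight entries before clustering; the assignment step would alternate with centre re-computation until convergence, as in the algorithm listing.
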
Once these clusters have been formed, the user selects a concept in the ontology taking shape, and the system provides suggestions for that concept (excluding the concepts already added as its sub-concepts). To do so, the system looks for the cluster that contains the highest TFIDF value for that word (when a concept consists of more than one word, the system looks for all the clusters where each word is located) and suggests keywords using two criteria:

1) Suggest the individual keywords of the document vectors of that cluster.
2) Suggest the highest-valued keywords in the centroid vector of that cluster, where the centroid vector is the single vector obtained by summing the respective elements of all the vectors in the cluster.
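
A short sketch of the second criterion, continuing the sparse-vector representation used above: the centroid is the element-wise sum of the cluster's document vectors, and its highest-valued entries (excluding concepts already in the ontology) are offered as suggestions. The cut-off parameter topN is illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch of suggestion criterion 2: top keywords of a cluster's centroid vector.
static class ClusterSuggestions
{
    // Centroid = element-wise sum of the cluster's TFIDF vectors.
    public static Dictionary<string, double> Centroid(IEnumerable<Dictionary<string, double>> cluster)
    {
        var centroid = new Dictionary<string, double>();
        foreach (var vec in cluster)
            foreach (var kv in vec)
                centroid[kv.Key] = centroid.TryGetValue(kv.Key, out var v) ? v + kv.Value : kv.Value;
        return centroid;
    }

    // Suggest the topN highest-valued keywords, skipping concepts already added.
    public static IEnumerable<string> Suggest(IEnumerable<Dictionary<string, double>> cluster,
                                              ISet<string> alreadyAdded, int topN = 10)
    {
        return Centroid(cluster)
            .Where(kv => !alreadyAdded.Contains(kv.Key))
            .OrderByDescending(kv => kv.Value)
            .Take(topN)
            .Select(kv => kv.Key);
    }
}
```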

C. Concepts From Sentence Pattern Matching

To enhance the accuracy and comprehensiveness of the ontologies constructed through our system, we have proposed a sentence-analyzing module that works alongside the previous two layers. Several approaches to identifying lexical patterns within text have been proposed, and we intend to utilize the best of these methods to enable users of our system to enhance the ontologies they are constructing. A popular method for acquiring hyponymic relations was presented in [16], and we have begun to implement a similar approach in our system. The motivation for such syntactic processing is that a significant number of common sentence patterns appear in the Wikipedia XML Corpus, and exploiting them will allow the system to provide better suggestions to the user in constructing the ontology.

In implementing this layer, we will incorporate a Part-of-Speech (POS) tagger and, with the help of the POS-tagged text, we intend to extract relations as follows. Sentences like "A ball is a round object that is used most often ...", which appeared in the example XML file earlier, are evidence of common sentence patterns in the corpus. In this instance, "A ball is a round object" matches the pattern {NP} is a {NP} in the POS-tagged text, where NP refers to a noun phrase. Several other patterns, such as {NP} including {NP,}* and {NP} (e.g., "All third world countries including Sri Lanka, India and Pakistan"), are candidates for very obvious relations, and these should enable us to make more comprehensive suggestions to the user. Again, these candidate word phrases will be validated against WordNet, and the decision to add them to the ontology will lie entirely at the user's discretion.
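
To illustrate the kind of lexico-syntactic matching intended here, the sketch below applies the two patterns with plain regular expressions over raw sentences. It is only an approximation of the planned module: a crude word-run stands in for a real POS-based noun-phrase chunk, and the group names and regular expressions are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative Hearst-style patterns over raw sentences; the real layer would
// operate on POS-tagged text with proper noun-phrase chunks.
static class PatternMatcher
{
    // "{NP} is a {NP}" -> (hyponym, hypernym)
    static readonly Regex IsA = new Regex(
        @"^(?:an?\s+)?(?<sub>[\w\- ]+?)\s+is\s+an?\s+(?<super>[\w\- ]+?)\s*(?:[.,;:]|$)",
        RegexOptions.IgnoreCase);

    // "{NP} including {NP,}* and {NP}" -> (member, set) pairs
    static readonly Regex Including = new Regex(
        @"(?<super>[\w\- ]+?)\s+including\s+(?<list>[\w\- ,]+?(?:\s+and\s+[\w\- ]+))",
        RegexOptions.IgnoreCase);

    public static IEnumerable<(string Sub, string Super)> Match(string sentence)
    {
        var m = IsA.Match(sentence);
        if (m.Success)
            yield return (m.Groups["sub"].Value.Trim(), m.Groups["super"].Value.Trim());

        var inc = Including.Match(sentence);
        if (inc.Success)
        {
            string super = inc.Groups["super"].Value.Trim();
            foreach (var part in inc.Groups["list"].Value.Split(
                         new[] { ",", " and " }, StringSplitOptions.RemoveEmptyEntries))
                yield return (part.Trim(), super);
        }
    }
}

// Example: Match("A ball is a round object.") yields ("ball", "round object");
// Match("All third world countries including Sri Lanka, India and Pakistan")
// yields ("Sri Lanka", ...), ("India", ...), ("Pakistan", ...), each paired with
// the crudely matched super phrase "All third world countries".
```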

V. THE GRAPHICAL MODELING ENVIRONMENT

Figure 2. WikiOnto Ontology Construction Environment

We have used C# as our language of choice for this system and Piccolo2D for the visualization in the ontology editor. The user has the flexibility to add and delete concepts in the taxonomic hierarchy, as well as the capability to rename relations and concepts at their discretion (Figure 2). At the beginning of the ontology construction process, the user can query for a concept, irrespective of whether that concept exists in the initially extracted concept collection. The system generates the OWL definition for the ontology being built, and the user can export the ontology in this standard format.

VI. EVALUATION AND FUTURE WORK

Because the project is still in progress, a thorough evaluation of the system is not yet possible. However, the initial prototype was tested among undergraduates and several faculty members of the authors' university, and we received encouraging feedback as well as several requests for additional features. Moreover, since the generated ontology depends for the most part on the user's choices, a standard evaluation mechanism seems impractical: the best-known evaluation methods, such as comparison with a gold standard (an ontology accepted as defining the targeted domain accurately) or testing the resulting ontology in an application, seem inappropriate given the system's dependence upon the user. Because our system is a facilitator rather than a fully fledged ontology generator, we plan to evaluate it through user trials towards the end of the project.

As the project has progressed, we have broadened our expectations of the potential applications that can make use of a system that takes semi-structured data and extracts topic ontologies from it. In particular, we are focusing on how the proposed project can be extended to other document sources, such as the Reuters collection [17].

VII. CONCLUSION

In this paper, we have introduced WikiOnto, a system for extracting and modeling topic ontologies using the Wikipedia XML Corpus. Through detailed explanations, we have presented the methodology of the system, in which we propose a three-tiered approach to concept and relation extraction from the corpus as well as a development environment for modeling the ontology. The project is ongoing and is expected to produce successful outcomes in the area of ontology extraction, as well as to spur further research in ontology extraction and modeling. We expect to make the system and its source code freely available. The extensibility to other document sources is still being tested, which is why it has been left out of this paper; we hope to announce those results in time for the final system to be available to this paper's intended audience.

REFERENCES

[1] T. R. Gruber, "A translation approach to portable ontology specifications," Knowl. Acquis., vol. 5, no. 2.
[2] Ontoprise, "OntoStudio" (accessed).
[3] L. Denoyer and P. Gallinari, "The Wikipedia XML Corpus," SIGIR Forum.
[4] S. Tratz, M. Gregory, P. Whitney, C. Posse, P. Paulson, B. Baddeley, R. Hohimer, and A. White, "Ontological annotation with WordNet."
[5] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. Cambridge, MA; London: The MIT Press.
[6] C. Li and T. W. Ling, "From XML to semantic web," in 10th International Conference on Database Systems for Advanced Applications, 2005.
[7] B. Omelayenko, "Learning of ontologies for the web: the analysis of existent approaches," in Proceedings of the International Workshop on Web Dynamics.
[8] N. Kozlova, "Automatic ontology extraction for document classification," Master's thesis, Saarland University, Germany, Tech. Rep., February.
[9] D. Faure, C. Nedellec, and C. Rouveirol, "Acquisition of semantic knowledge using machine learning methods: The system ASIUM," Universite Paris Sud, Tech. Rep.
[10] P. Velardi, R. Navigli, A. Cucchiarelli, and F. Neri, "Evaluation of OntoLearn, a methodology for automatic population of domain ontologies," in Ontology Learning from Text: Methods, Applications and Evaluation, P. Buitelaar, P. Cimiano, and B. Magnini, Eds. IOS Press.
[11] A. Maedche and S. Staab, "Semi-automatic engineering of ontologies from text," in Proceedings of the 12th International Conference on Software Engineering and Knowledge Engineering.
[12] B. Fortuna, M. Grobelnik, and D. Mladenic, "OntoGen: Semi-automatic ontology editor," in Human Interface, Part II, HCII 2007, M. Smith and G. Salvendy, Eds., 2007.
[13] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing and Management, 1988.
[14] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review."
[15] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing and Management, 1988.
[16] M. A. Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proceedings of the 14th International Conference on Computational Linguistics, 1992.
[17] M. Sanderson, "Reuters test collection," in BCS IRSG. [Online]. Available: citeseer.ist.psu.edu/sanderson94reuters.html
