ADVANCING TOPIC ONTOLOGY LEARNING THROUGH TERM EXTRACTION

Blaž Fortuna (1), Nada Lavrač (1, 2), Paola Velardi (3)
(1) Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia
(2) University of Nova Gorica, Vipavska 13, 5000 Nova Gorica, Slovenia
(3) Università di Roma La Sapienza, 113 Via Salaria, Roma RM 00198, Italy

ABSTRACT

This paper presents a novel methodology for topic ontology learning from text documents. The proposed methodology, named OntoTermExtraction, is based on OntoGen, a semi-automated tool for topic ontology construction, upgraded with an advanced terminology extraction tool in an iterative, semi-automated ontology construction process. This process consists of (a) document clustering to find the nodes in the topic ontology, (b) term extraction from document clusters, (c) populating the term vocabulary and keyword extraction, and (d) choosing the concept names by comparing the best-ranked terms with the extracted keywords. The approach is illustrated on a case study analysis of the ILPnet2 publications data.

1 Introduction

OntoGen [1, 2] is a semi-automated, data-driven ontology construction tool, focused on the construction and editing of topic ontologies. In a topic ontology, each node is a cluster of documents, represented by keywords (topics), and nodes are connected by relations (typically, the SubConcept-Of relation). The system combines text mining techniques with an efficient user interface that aims to reduce both the user's time and the complexity of ontology construction. It is thus a significant improvement over existing manual and relatively complex ontology editing tools, such as Protégé [3], whose use is hindered by the fact that the domain experts constructing the ontology often lack ontology engineering skills.

Concept naming suggestion (i.e. the description of a document cluster through a set of relevant terms) plays a central part in the OntoGen system.
Concept naming helps the user in evaluating clusters and organizing them hierarchically. This facility is provided by employing unsupervised and supervised methods for generating the suggestions. Although OntoGen offers a well-elaborated and user-friendly approach to concept naming, the approach has been limited to single-word keyword suggestions and by the very basic text lemmatization used in the OntoGen text preprocessing phase.

This paper aims at improving the ontology construction process through improved concept naming, using terminology extraction as implemented in the advanced TermExtractor tool [4,5]. The improved ontology construction process proposed in this paper consists of the following steps: document clustering to find the nodes in the topic ontology, terminology extraction from document clusters, population of the terminology vocabulary and keyword extraction, and choice of concept names by comparing the best-ranked terms with the extracted keywords. The proposed approach is illustrated on a case study analysis of the ILPnet2 publications database [4,5], a database of publications in the area of Inductive Logic Programming, collected extensively over a period of about 20 years.

The paper is structured as follows. Section 2 describes the ILPnet2 domain used to illustrate the proposed approach to ontology construction. Section 3 describes the background technologies, as implemented in the OntoGen and TermExtractor tools. Section 4 presents the proposed methodology through a detailed description of the individual steps of the advanced ontology construction process, illustrated by the results achieved in the analysis of the ILPnet2 database.

(Author contact: Blaz.Fortuna@ijs.si)

2 The ILPnet2 database

The domain we analyzed is the scientific publications database of the ILPnet2 Network of Excellence in Inductive Logic Programming [4]. ILPnet2 consisted of 37 project partners, mainly universities and research institutes.
The entities we analyze are ILP publications. The ILPnet2 database is publicly available on the Web and contains information about ILP publications between the years 1971 and [...]. The publication data is in the BibTeX format, available in files at [...] (one file for each year: 2003, 2002, ...).

The first stage of the data-driven ontology construction process is data acquisition and preprocessing. The data was acquired with the wget utility and converted into the XML format. For easier data management in the exploratory analysis of the social network of authors of ILP publications [5], it was convenient to put the data into a relational database, using Microsoft SQL Server. One of the tasks accompanying the database population was the normalization of authors' names. While this was crucial for the social network analysis (described in [5]), it is not needed for the ontology construction experiments described in this paper, as ontology construction uses only document titles and abstracts, preprocessed using a predefined list of stop-words and the Porter stemmer.

3 Background technologies

3.1 OntoGen

The two main characteristics of the OntoGen system [1,2,6] are the following.

Semi-Automatic. The system is an interactive tool that aids the user during the ontology construction process. It suggests concepts, relations between the concepts, and concept names; automatically assigns instances to the concepts; visualizes instances within a concept; and provides a good overview of the ontology through concept browsing and various kinds of visualizations. At the same time, the user is always in full control of the system and can affect the ontology construction by accepting or rejecting the system's suggestions or by manually editing the ontology.

Data-Driven. Most of the aid provided by the system is based on the underlying data, which the user typically provides at the beginning of the ontology construction process. The data reflects the structure of the domain for which the user is building the ontology. The data is usually a document corpus, where ontological instances are either the documents themselves or named entities occurring in the documents. The system supports automated extraction of instances (used for learning concepts) and co-occurrences of instances (used for learning relations between the concepts) from the data.

The major features of the system serve one or both of the two major design goals of OntoGen: (1) visualization and exploration of existing concepts from the ontology, and (2) addition of new concepts or modification of existing concepts using simple and straightforward machine learning and text mining algorithms.

The main window of the system (see Figure 1) provides multiple views on the ontology. A tree-based view, which is intuitive for most users, is a natural way to represent a concept hierarchy. This view is used to show the folder structure and as a visualization offering a one-glance view of the whole ontology.
Each concept from the ontology is further explained by the most informative keywords describing the target concept, automatically extracted by employing unsupervised and supervised learning methods.

A sample ontology in the form of a tree-based concept hierarchy is shown in Figure 2. Both the first and the second level of the concept hierarchy were constructed using the k-means clustering algorithm: the first level was split into 7 concepts, and each of these concepts was then further split into three sub-concepts. The hierarchical structuring is user-triggered. At each level, k-means is invoked for various user-defined values of k; the user then selects the preferred k, and the documents are divided into k sub-clusters accordingly.

While this procedure of ontology construction is elegant and simple for the user, considerable effort is needed to understand the content and the meaning of the selected concepts. This is especially striking when comparing the second-level concepts, for example the sub-concepts of the concept named logic_program, program, and inductive_logic in Figure 2 with the sub-concepts of the concept logic program in Figure 3, which shows the concept hierarchy developed by the novel concept naming methodology based on TermExtractor.

Figure 1. The user gets suggestions for the sub-concepts of the selected concept (left bottom part); the ontology is visualized as a tree-based concept hierarchy in a textual mode (left upper part) and in a graphical mode (right part).

3.2 TermExtractor

The TermExtractor tool [7,8] for automatic extraction of terms (possibly consisting of several words, as opposed to single keywords) from documents works as follows. Given a collection of documents from the desired domain, TermExtractor first extracts a list of candidate terms (frequent multi-word expressions).
In the second step it evaluates each candidate term using several scores, which are then combined, and ranks the candidates according to the combined score. The output is the set of candidates whose score exceeds a given threshold. Documents from contrast domains are used as extra input for term evaluation and serve as a control group for measuring term significance.

The following scores are used to evaluate candidate terms in the second step (normalized score values lie in the [0,1] interval):

Domain Relevance is high if the term is significantly more frequent in the domain of interest than in other domains.
Domain Consensus is high if the term is used consistently across the documents from the domain.
Lexical Cohesion is high if the words composing the term are more frequently found within the term than alone in the documents.
Structural Relevance is high for terms that are emphasized in the documents (e.g. appear in the title).
A miscellaneous set of heuristics is used to remove generic modifiers (e.g. large knowledge base).

The combined score is a weighted convex combination of the individual scores.

4 OntoTermExtraction methodology

4.1 Motivation

There are several ways in which a vocabulary can be acquired. In some domains established vocabularies already exist (e.g. EUROVOC used for annotating European legislation, AGROVOC used for annotating agricultural documents, ASFA used within UN FAO, DMOZ created collaboratively to categorize web pages, etc.). Another option is automatic extraction of terms from documents, which is especially attractive for domains where there is no established vocabulary.

Figure 2. Ontology constructed by the standard OntoGen approach from the ILPnet2 publications data, using the k-means clustering algorithm without any help from the pre-calculated vocabulary extracted by TermExtractor.

Concept and concept name suggestions play a central part in every ontology construction system. OntoGen provides unsupervised and supervised methods for generating such suggestions [1,2,6]. Unsupervised learning methods automatically generate a list of possible sub-concepts for the currently selected concept by using k-means clustering and latent semantic indexing (LSI) techniques. Supervised learning methods, on the other hand, require the user to have a rough idea about a new topic (see footnote 1), which is identified through a query returning the documents. The system automatically identifies the documents that correspond to the topic, and the selection can be further refined through user-computer interaction in an active learning loop, using a machine learning technique for semi-automatic acquisition of the user's knowledge.

While OntoGen originally used only the input documents for proposing concept suggestions, and term extraction techniques for helping to name the concepts, the whole process can be significantly improved by constructing a predefined vocabulary from the domain of the ontology under construction. The vocabulary can be used to support the user during the hierarchical ordering of concepts, and to create concept descriptions, thus helping concept evaluation.

Footnote 1: Hereafter we use the name concepts for the document clusters generated by the k-means clustering algorithm, while a topic is a description of the concept, e.g. a term or a set of terms that best identify the document cluster.

4.2 Steps in the proposed OntoTermExtraction methodology for concept naming

The advanced ontology construction process proposed in this paper consists of the following steps:
(a) document clustering to find the nodes in the ontology (described in Section 3.1),
(b) terminology extraction from document clusters, using TermExtractor (described in Section 3.2),
(c) populating the term vocabulary and keyword extraction (described in Section 4.3),
(d) choosing the concept name (topic) by comparing the best-ranked terms with the extracted keywords (described in the ILPnet2 application in Section 5).

4.3 Populating the terms and keyword extraction

For each term from the vocabulary, a classification model is needed which can predict whether the term is relevant for a given document cluster. In this paper we use a centroid-based nearest neighbor classifier [6], which was developed for fast classification of documents into taxonomies. We use this approach since it scales well to larger collections of terms (hundreds of thousands of terms).

A training set of documents is needed to generate a classification model. In some cases vocabularies already come with a set of documents annotated by the terms, and these documents can be used for training the term models. When no annotated documents are available, information retrieval can be applied to find documents with which to populate the terms.

In this paper we propose using two different techniques to populate the terms extracted by TermExtractor. Let T be the set of terms automatically extracted from document clusters.

The first technique used the ILPnet2 collection. Each term t ∈ T was issued in turn as a query, and the top-ranked documents (according to cosine similarity, using TFIDF word weighting) were used to populate the term.

The second technique did not use the ILPnet2 collection and relied on Google web search instead [9]. A query was generated from each term t by taking its words and attaching the extra keyword "ILP" to limit the search to ILP-related web pages. For example, if t is "inductive logic programming", the query is "ILP inductive logic programming". The query was then sent to Google, and the snippets of the returned search results were used to populate the term.

The ILP vocabulary prepared in this way was used as an extra input to OntoGen, besides the collection of articles. We tried both approaches, but in this report we only show the results of the second technique, because retrieval from the whole web turned out to be a richer resource than the ILPnet2 collection alone. Details on how the vocabulary looked and how it was applied in the ILP ontology construction are described in Section 5.

5 ILPnet2 vocabulary and ontology construction

5.1 Vocabulary extraction

As described in the previous section, we used TermExtractor to automatically extract the vocabulary for the ILP domain from the ILPnet2 collection of ILP publications. Table 1 shows the 11 top-ranked terms (out of 97) extracted from ILPnet2 documents.
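Before turning to the extracted terms, the two population techniques of Section 4.3 can be illustrated with a small sketch. It is not the code used in the paper: the function names (`tokenize`, `tfidf_vectors`, `populate_term`, `web_query`) are hypothetical, the smoothed IDF formula is an assumption (the paper does not state which TFIDF variant was used), and stop-word removal and stemming are omitted for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (no stop-word list or stemming here).
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return (vectors, idf); each vector maps a word to its tf * idf weight."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    # Assumed smoothing: adding 1 keeps the weight of very common words above zero.
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def populate_term(term, docs, top_k=2):
    """First technique: issue the term as a query and keep the top-ranked documents."""
    vectors, idf = tfidf_vectors(docs)
    query = {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokenize(term)).items()}
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [docs[i] for score, i in scored[:top_k] if score > 0]


def web_query(term, domain_keyword="ILP"):
    """Second technique: build the web search query by prefixing the domain keyword."""
    return f"{domain_keyword} {term}"


if __name__ == "__main__":
    collection = [
        "inverse resolution in inductive logic programming",
        "decision tree learning for relational data",
        "refinement operators and inverse resolution",
    ]
    print(populate_term("inverse resolution", collection))
    print(web_query("inductive logic programming"))
```

In this sketch the documents returned by `populate_term` (or, for the second technique, the snippets returned for the string built by `web_query`) would form the training texts for the corresponding term model.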
Table 1: Top 11 terms extracted from ILPnet2 (columns: Term, Weight, Domain Relevance, Domain Consensus, Lexical Cohesion; the score values are missing from this copy)

inductive logic
logic programming
inductive logic programming
background knowledge
logic program
machine learning
data mining
refinement operator
decision tree
inverse resolution
experimental result

All the terms were populated using Google web search. As an example, here are the top 5 snippets that were returned for the query "ILP predictive accuracy":

Boosting Descriptive ILP for Predictive Learning in Bioinformatics -- general, this means that a higher predictive accuracy can be achieved. Thirdly, although some predictive ILP systems may produce multiple classification...
Imperial College Computational Bioinformatics Laboratory (CBL) -- Results on scientific discovery applications of ILP are separated below... Progol's predictive accuracy was equivalent to regression on the main set of
Evolving Logic Programs to Classify Chess-Endgame Positions -- indicate that in the cases where the ILP algorithm performs badly, the introduc-. tion of either union or crossover increases predictive accuracy....
Estimating the Predictive Accuracy of a Classifier -- the predictive accuracy of a classifier. We present a scenario where meta-... Workshop on Data Mining, Decision Support, Meta-Learning and ILP,
*-BibTeX An outline of the theory of ILP is given, together with a description of Golem... Performance is measured using both predictive accuracy and a new cost...

For each query, the snippets of the first 1000 results were used. The snippets served as input for term modeling, described in Section 4.3. The models generated for each term from this data were then used for generating the concept suggestions and name suggestions in OntoGen.

5.2 Ontology learning

First the ILPnet2 collection and vocabulary were loaded into the program.
The collection was imported into OntoGen as a directory of files, where each document was a separate ASCII text file (File -> New ontology -> Folder). The vocabulary was loaded using the Tools -> Context menu. After experimenting with different numbers of clusters, and with the help of concept visualization, a partition into seven concepts using the k-means clustering algorithm was chosen. For all seven concepts, the first-ranked term suggested from the vocabulary extracted by TermExtractor was selected as the concept name. This indicates that term extraction and population succeeded in ranking the terms in a meaningful way. This is also illustrated by the following list of discovered concepts, with the best-ranked concept name proposed by TermExtractor, followed by the second best-ranked concept name (in parentheses), and the list of the most important keywords, as chosen originally by OntoGen:

Learning system (learning algorithm) -- learning, system, rule, language, methods, machine_learning, machine, approach, ilp, grammars
Decision tree (logical decision tree) -- order, inductive, trees, order_logic, discovery, decision, application, decision_trees, database, experiments
Structured data (chemical structure) -- structural, data, machine, predict, examples, relations, machine_learning, mining, definitions, knowledge
Clausal theory (theory revision) -- theories, refinement, inverse, resolution, predicates, operators, inverse_resolution, invention, refinement_operators, revision
Relational database (inductive learning) -- ilp, generalization, relations, model, algorithm, constraints, integrating, rule, agent, evaluation
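The naming step behind the list above can be sketched roughly as follows. This is a simplified illustration, not the classifier of [6]: it assumes plain term-frequency vectors, represents each vocabulary term by the centroid of its populating snippets, and ranks terms by cosine similarity to the centroid of the document cluster. The names `bow`, `centroid`, `cosine` and `rank_terms`, and the toy texts, are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    # Bag-of-words term-frequency vector (assumed weighting; no TFIDF or stemming).
    return Counter(text.lower().split())


def centroid(texts):
    """Average the bag-of-words vectors of a group of texts."""
    if not texts:
        return {}
    total = Counter()
    for t in texts:
        total.update(bow(t))
    return {w: c / len(texts) for w, c in total.items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_terms(cluster_docs, term_snippets):
    """Rank vocabulary terms by similarity of their centroid to the cluster centroid."""
    cluster_c = centroid(cluster_docs)
    scores = {term: cosine(centroid(snips), cluster_c)
              for term, snips in term_snippets.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    cluster = [
        "learning logical decision trees",
        "decision trees for relational data",
    ]
    vocabulary = {
        "decision tree": ["top-down induction of decision trees",
                          "decision trees classify examples"],
        "refinement operator": ["refinement operators for clausal theories"],
    }
    # The first element is the suggested concept name, the second the runner-up.
    print(rank_terms(cluster, vocabulary))
```

Under these assumptions, the first two entries of the ranking play the role of the best-ranked and second best-ranked concept names shown for each concept.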

Figure 3. Ontology constructed on top of the ILPnet2 dataset using the pre-calculated terminology.

By checking the publication years of the articles in different concepts, it was possible to analyse the evolution of topics. For example, the most frequent publication years for the concepts clausal theory, concept learning and logic program were around 1994; the concepts structured data and learning system were most frequent around the year 2000; and the concepts decision tree and relational database appear to be the most recent, in the years following [...]. Each of the concepts was further split into sub-concepts using suggestions from the vocabulary, which resulted in the two-level taxonomy shown in Figure 3.

6 Conclusions

We presented a novel concept naming methodology applicable in advanced ontology construction, and illustrated the improved concept naming facility on the ontology of topics extracted from the ILPnet2 scientific publications database. The improved concept naming supports the user in concept discovery and naming, and keeps the constructed ontology more consistent and aligned with the established terminology of the domain.

Acknowledgement

This work was supported by the Slovenian Research Agency and the IST Programme of the EC under NeOn (IST IP), PASCAL (IST) and ECOLEAD.

References

[1] B. Fortuna, D. Mladenić, M. Grobelnik. Semi-automatic construction of topic ontologies. In: Ackermann et al. (eds.) Semantics, Web and Mining. LNCS (LNAI), vol. 4289. Springer.
[2] B. Fortuna, M. Grobelnik, D. Mladenić. Semi-automatic Data-driven Ontology Construction System. In: Proc. of the 9th International multi-conference Information Society IS-2006, Ljubljana, Slovenia.
[3] The Protégé project. Available online.
[4] ILPnet2 publications database. Available online.
[5] S. Sabo, M. Grčar, D.A. Fabjan, P. Ljubič, N. Lavrač. Exploratory analysis of the ILPnet2 social network. In: Proc. of the 10th International multi-conference Information Society IS-2006, Ljubljana, Slovenia.
[6] M. Grobelnik, D. Mladenić. Simple classification into large topic ontology of web documents. In: Proc. of the 27th International Conference Information Technology Interfaces, Dubrovnik, Croatia, 2005.
[7] The TermExtractor tool. Available online through a Web interface.
[8] F. Sclano, P. Velardi. TermExtractor: A Web application to learn the common terminology of interest groups and research communities. In: Proc. of the 9th Conf. on Terminology and Artificial Intelligence (TIA 2007), Sophia Antipolis, France.
[9] M. Grčar, E. Klien. Using Term-matching Algorithms for the Annotation of Geo-services. Web Mining 2.0, Workshop at ECML 2007.


More information

ImgSeek: Capturing User s Intent For Internet Image Search

ImgSeek: Capturing User s Intent For Internet Image Search ImgSeek: Capturing User s Intent For Internet Image Search Abstract - Internet image search engines (e.g. Bing Image Search) frequently lean on adjacent text features. It is difficult for them to illustrate

More information

Pattern Mining in Frequent Dynamic Subgraphs

Pattern Mining in Frequent Dynamic Subgraphs Pattern Mining in Frequent Dynamic Subgraphs Karsten M. Borgwardt, Hans-Peter Kriegel, Peter Wackersreuther Institute of Computer Science Ludwig-Maximilians-Universität Munich, Germany kb kriegel wackersr@dbs.ifi.lmu.de

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Motivating Ontology-Driven Information Extraction

Motivating Ontology-Driven Information Extraction Motivating Ontology-Driven Information Extraction Burcu Yildiz 1 and Silvia Miksch 1, 2 1 Institute for Software Engineering and Interactive Systems, Vienna University of Technology, Vienna, Austria {yildiz,silvia}@

More information

Text Categorization (I)

Text Categorization (I) CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization

More information

Visualization and text mining of patent and non-patent data

Visualization and text mining of patent and non-patent data of patent and non-patent data Anton Heijs Information Solutions Delft, The Netherlands http://www.treparel.com/ ICIC conference, Nice, France, 2008 Outline Introduction Applications on patent and non-patent

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

A Survey on Information Extraction in Web Searches Using Web Services

A Survey on Information Extraction in Web Searches Using Web Services A Survey on Information Extraction in Web Searches Using Web Services Maind Neelam R., Sunita Nandgave Department of Computer Engineering, G.H.Raisoni College of Engineering and Management, wagholi, India

More information

Multi-label classification using rule-based classifier systems

Multi-label classification using rule-based classifier systems Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Collaborative editing of knowledge resources for cross-lingual text mining

Collaborative editing of knowledge resources for cross-lingual text mining UNIVERSITÀ DI PISA Scuola di Dottorato in Ingegneria Leonardo da Vinci Corso di Dottorato di Ricerca in INGEGNERIA DELL INFORMAZIONE Tesi di Dottorato di Ricerca Collaborative editing of knowledge resources

More information

TALP at WePS Daniel Ferrés and Horacio Rodríguez

TALP at WePS Daniel Ferrés and Horacio Rodríguez TALP at WePS-3 2010 Daniel Ferrés and Horacio Rodríguez TALP Research Center, Software Department Universitat Politècnica de Catalunya Jordi Girona 1-3, 08043 Barcelona, Spain {dferres, horacio}@lsi.upc.edu

More information

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation

Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation Clustering Web Documents using Hierarchical Method for Efficient Cluster Formation I.Ceema *1, M.Kavitha *2, G.Renukadevi *3, G.sripriya *4, S. RajeshKumar #5 * Assistant Professor, Bon Secourse College

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE YING DING 1 Digital Enterprise Research Institute Leopold-Franzens Universität Innsbruck Austria DIETER FENSEL Digital Enterprise Research Institute National

More information

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

Using Text Learning to help Web browsing

Using Text Learning to help Web browsing Using Text Learning to help Web browsing Dunja Mladenić J.Stefan Institute, Ljubljana, Slovenia Carnegie Mellon University, Pittsburgh, PA, USA Dunja.Mladenic@{ijs.si, cs.cmu.edu} Abstract Web browsing

More information

DATA MINING II - 1DL460. Spring 2014"

DATA MINING II - 1DL460. Spring 2014 DATA MINING II - 1DL460 Spring 2014" A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt14 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM Myomyo Thannaing 1, Ayenandar Hlaing 2 1,2 University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar ABSTRACT

More information

A Content Based Image Retrieval System Based on Color Features

A Content Based Image Retrieval System Based on Color Features A Content Based Image Retrieval System Based on Features Irena Valova, University of Rousse Angel Kanchev, Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg Boris

More information

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER CS6007-INFORMATION RETRIEVAL Regulation 2013 Academic Year 2018

More information

Text Mining. Representation of Text Documents

Text Mining. Representation of Text Documents Data Mining is typically concerned with the detection of patterns in numeric data, but very often important (e.g., critical to business) information is stored in the form of text. Unlike numeric data,

More information

The k-means Algorithm and Genetic Algorithm

The k-means Algorithm and Genetic Algorithm The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective

More information

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI

MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI MEASURING SEMANTIC SIMILARITY BETWEEN WORDS AND IMPROVING WORD SIMILARITY BY AUGUMENTING PMI 1 KAMATCHI.M, 2 SUNDARAM.N 1 M.E, CSE, MahaBarathi Engineering College Chinnasalem-606201, 2 Assistant Professor,

More information

Ontology-Based Web Query Classification for Research Paper Searching

Ontology-Based Web Query Classification for Research Paper Searching Ontology-Based Web Query Classification for Research Paper Searching MyoMyo ThanNaing University of Technology(Yatanarpon Cyber City) Mandalay,Myanmar Abstract- In web search engines, the retrieval of

More information

Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm

Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 PP 10-15 www.iosrjen.org Discovery of Agricultural Patterns Using Parallel Hybrid Clustering Paradigm P.Arun, M.Phil, Dr.A.Senthilkumar

More information

Ontology Extraction from Heterogeneous Documents

Ontology Extraction from Heterogeneous Documents Vol.3, Issue.2, March-April. 2013 pp-985-989 ISSN: 2249-6645 Ontology Extraction from Heterogeneous Documents Kirankumar Kataraki, 1 Sumana M 2 1 IV sem M.Tech/ Department of Information Science & Engg

More information

Theme Identification in RDF Graphs

Theme Identification in RDF Graphs Theme Identification in RDF Graphs Hanane Ouksili PRiSM, Univ. Versailles St Quentin, UMR CNRS 8144, Versailles France hanane.ouksili@prism.uvsq.fr Abstract. An increasing number of RDF datasets is published

More information

Improving web search with FCA

Improving web search with FCA Improving web search with FCA Radim BELOHLAVEK Jan OUTRATA Dept. Systems Science and Industrial Engineering Watson School of Engineering and Applied Science Binghamton University SUNY, NY, USA Dept. Computer

More information

Bipartite Graph Partitioning and Content-based Image Clustering

Bipartite Graph Partitioning and Content-based Image Clustering Bipartite Graph Partitioning and Content-based Image Clustering Guoping Qiu School of Computer Science The University of Nottingham qiu @ cs.nott.ac.uk Abstract This paper presents a method to model the

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Optimization of Query Processing in XML Document Using Association and Path Based Indexing

Optimization of Query Processing in XML Document Using Association and Path Based Indexing Optimization of Query Processing in XML Document Using Association and Path Based Indexing D.Karthiga 1, S.Gunasekaran 2 Student,Dept. of CSE, V.S.B Engineering College, TamilNadu, India 1 Assistant Professor,Dept.

More information

Available online at ScienceDirect. Procedia Computer Science 52 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 52 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 52 (2015 ) 1071 1076 The 5 th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS-2015) Health, Food

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Clustering for Ontology Evolution

Clustering for Ontology Evolution Clustering for Ontology Evolution George Tsatsaronis, Reetta Pitkänen, and Michalis Vazirgiannis Department of Informatics, Athens University of Economics and Business, 76, Patission street, Athens 104-34,

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems

A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems A modified and fast Perceptron learning rule and its use for Tag Recommendations in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University

More information

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I. International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery

An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery An Integrated Framework to Enhance the Web Content Mining and Knowledge Discovery Simon Pelletier Université de Moncton, Campus of Shippagan, BGI New Brunswick, Canada and Sid-Ahmed Selouani Université

More information

BPAL: A Platform for Managing Business Process Knowledge Bases via Logic Programming

BPAL: A Platform for Managing Business Process Knowledge Bases via Logic Programming BPAL: A Platform for Managing Business Process Knowledge Bases via Logic Programming Fabrizio Smith, Dario De Sanctis, Maurizio Proietti National Research Council, IASI Antonio Ruberti - Viale Manzoni

More information

Taxonomies and controlled vocabularies best practices for metadata

Taxonomies and controlled vocabularies best practices for metadata Original Article Taxonomies and controlled vocabularies best practices for metadata Heather Hedden is the taxonomy manager at First Wind Energy LLC. Previously, she was a taxonomy consultant with Earley

More information

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bonfring International Journal of Data Mining, Vol. 7, No. 2, May 2017 11 News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages Bamber and Micah Jason Abstract---

More information

INTELLIGENT SYSTEMS OVER THE INTERNET

INTELLIGENT SYSTEMS OVER THE INTERNET INTELLIGENT SYSTEMS OVER THE INTERNET Web-Based Intelligent Systems Intelligent systems use a Web-based architecture and friendly user interface Web-based intelligent systems: Use the Web as a platform

More information

Development of an Ontology-Based Portal for Digital Archive Services

Development of an Ontology-Based Portal for Digital Archive Services Development of an Ontology-Based Portal for Digital Archive Services Ching-Long Yeh Department of Computer Science and Engineering Tatung University 40 Chungshan N. Rd. 3rd Sec. Taipei, 104, Taiwan chingyeh@cse.ttu.edu.tw

More information

ijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System

ijade Reporter An Intelligent Multi-agent Based Context Aware News Reporting System ijade Reporter An Intelligent Multi-agent Based Context Aware Reporting System Eddie C.L. Chan and Raymond S.T. Lee The Department of Computing, The Hong Kong Polytechnic University, Hung Hong, Kowloon,

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems

A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems A novel supervised learning algorithm and its use for Spam Detection in Social Bookmarking Systems Anestis Gkanogiannis and Theodore Kalamboukis Department of Informatics Athens University of Economics

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

SAS Enterprise Miner : What does the future hold?

SAS Enterprise Miner : What does the future hold? SAS Enterprise Miner : What does the future hold? David Duling EM Development Director SAS Inc. Sascha Schubert Product Manager Data Mining SAS International Topics for Discussion: EM 4.2/SAS 9.0 AF/SCL

More information

Dynamic Ontology Evolution

Dynamic Ontology Evolution Dynamic Evolution Fouad Zablith Knowledge Media Institute (KMi), The Open University. Walton Hall, Milton Keynes, MK7 6AA, United Kingdom. f.zablith@open.ac.uk Abstract. Ontologies form the core of Semantic

More information

Self-tuning ongoing terminology extraction retrained on terminology validation decisions

Self-tuning ongoing terminology extraction retrained on terminology validation decisions Self-tuning ongoing terminology extraction retrained on terminology validation decisions Alfredo Maldonado and David Lewis ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin

More information

Document Clustering: Comparison of Similarity Measures

Document Clustering: Comparison of Similarity Measures Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation

More information

Adaptive and Personalized System for Semantic Web Mining

Adaptive and Personalized System for Semantic Web Mining Journal of Computational Intelligence in Bioinformatics ISSN 0973-385X Volume 10, Number 1 (2017) pp. 15-22 Research Foundation http://www.rfgindia.com Adaptive and Personalized System for Semantic Web

More information

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets Arjumand Younus 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group,

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

Semantic Web. Ontology Engineering and Evaluation. Morteza Amini. Sharif University of Technology Fall 95-96

Semantic Web. Ontology Engineering and Evaluation. Morteza Amini. Sharif University of Technology Fall 95-96 ه عا ی Semantic Web Ontology Engineering and Evaluation Morteza Amini Sharif University of Technology Fall 95-96 Outline Ontology Engineering Class and Class Hierarchy Ontology Evaluation 2 Outline Ontology

More information