Data-Mining Algorithms with Semantic Knowledge
1 Data-Mining Algorithms with Semantic Knowledge
Ontology-based information extraction
Carlos Vicient Monllaó, Universitat Rovira i Virgili
December 14th, Poznan
A project funded by the Ministerio de Ciencia e Innovación and Universitat Rovira i Virgili
DAMASK, 2010
2 Contents
1. DAMASK
   1. Introduction
   2. Goals
   3. Working plan
2. Ontology-based information extraction (Task 1)
   1. State of the art
      1. IR vs IE
      2. Ontology-based IE
   2. Main methodology
   3. Step by step methodology
      1. Named entities detection
      2. Discovering entity-subsumer concept (candidates extraction)
      3. Semantic annotation
3 1.- DAMASK: DATA MINING ALGORITHMS WITH SEMANTIC KNOWLEDGE
4 INTRODUCTION
Data-Mining Algorithms with Semantic Knowledge
Funded by the Ministerio de Ciencia e Innovación and Universitat Rovira i Virgili
Main motivations:
- Explosive growth in the amount of information available on networked computers around the world, much of it in the form of natural language documents
- Increasing interest in Semantic Web contents => semantic knowledge
- Traditional data mining methods make little use of domain knowledge
5 GOALS
- Processing and extraction of Web resources based on ontologies
  - Extraction of relevant data from a domain of structured, semi-structured and unstructured Web resources
  - Semantic integration of information in an attribute-value matrix that can be used for further clustering methods
- Performing an automatic classification of data (clustering method based on ontologies)
  - Adaptation of traditional clustering methods to create classifications (trees and partitions) using semantic information
  - Definition of methods to automatically analyse the clusters obtained in the previous step
- Testing the practical applicability of the developed methods in the strategic area of Tourism
6 WORK PLAN I
The project is divided into three main tasks:
- Task 1: Ontology-based information extraction and integration from heterogeneous Web resources
- Task 2: Automatic clustering of entities based on the semantics of the concepts and attributes obtained from the Web resources
- Task 3: Application of the developed methods to a Tourism test case
8 WORK PLAN (Task 1) II
The key point of this task is to complement syntactic parsing and natural language processing techniques with the knowledge contained in one or several input ontologies, in order to be able to:
- Identify relevant features describing a particular entity from textual data
- Associate, if applicable, extracted features with concepts contained in the input ontologies
9 2.- ONTOLOGY-BASED INFORMATION EXTRACTION
10 STATE OF THE ART (IR vs IE) I
- Information Retrieval (IR) simply finds texts and presents them to the user (as classic search engines do)
- Information Extraction (IE) is the task of locating specific pieces of data within a natural language document
- IE analyses texts and presents only the specific information extracted from the text that is of interest to a user
- Wrapper: a set of extraction rules suitable for extracting information from a Web site
- Two main approaches:
  - Knowledge engineering: supervised, traditional IE
  - Automatic training: unsupervised, open IE
12 STATE OF THE ART (IR vs IE) II
Comparison of traditional IE and Open IE
13 STATE OF THE ART (Ontology-Based IE) III
Ontology-Based IE (motivations):
- Growing interest in the research community in developing data mining techniques
- Textual documents describing a particular entity are difficult to process in order to extract relevant features that could be exploited by semantically focused data mining algorithms
- Many conceptual approaches in the Semantic Web field assume that resources have been semantically annotated; in the short term, however, we cannot expect a massive amount of annotated Web resources to be available
- Ontology-based information extraction relies on ontologies to interpret the textual content of a resource regardless of its format
14 STATE OF THE ART (Ontology-Based IE) IV
- Ontologies have emerged as a new paradigm to model and formalize domain knowledge in a machine-readable way
- IE and ontologies are involved in two main, related tasks:
  - Information Extraction: IE needs ontologies as part of the understanding process for extracting the relevant information
  - Populating and enhancing the ontology: texts are useful sources of knowledge to design and enrich ontologies
- These two tasks can be combined in a cyclic process: ontologies are used for interpreting the text at the right level for IE, and IE extracts new knowledge from text, to be integrated into the ontology
15 Cyclic process
[Figure: cycle between IE ALGORITHMS and ONTOLOGY; arrow labels: "Relevant extracted information", "Populating and enhancing"]
16 METHODOLOGY I
The Task 1 methodology is comparable to the automatic semantic annotation of documents.
1. Named Entity detection (instances of things)
2. Discovering entity-subsumer concept (candidates from Named Entity)
3. Semantic annotation of Named Entities (pairs of NE and candidate)
18 METHODOLOGY (Named Entity detection) II
Named entities highlighted in the example text: Madrid, Paris, Llobregat, Catalan, Antoni Gaudí, Sagrada Familia
20 METHODOLOGY (Named Entity detection) III
Extracted NE: Madrid, Paris, Llobregat, Catalan, Antoni Gaudí, Sagrada Familia
Representative NE: Catalan, Llobregat, Antoni Gaudí, Sagrada Familia
22 METHODOLOGY (discovering Entity-subset) IV
Representative NE => Subset
- Sagrada Familia => {Cathedral, church}
- Catalan => {Language}
- Llobregat => {River, town}
- Antoni Gaudí => {Architect, person}
23 METHODOLOGY (Ontology Matching) V
NE-Subset pairs passed to semantic annotation:
- Sagrada Familia; {Cathedral, church}
- Catalan; {Language}
- Llobregat; {River, town}
- Antoni Gaudí; {Architect, person}
25 STEP by STEP (Named Entity detection) I
"Named entities are phrases that contain the names of persons, organizations, locations, times, and quantities." (CoNLL 2002)
Problems in detecting NE:
- Unstructured and unlimited by nature
- Relationships remain hidden in the text from which the extraction has been performed
Approaches:
- Using rules learned from pre-tagged examples => recall problems
- Using a thesaurus to detect NE (a word that is not found in the dictionary is assumed to be a NE) => NE composed of common words are discarded
- Exploiting the way in which NE are expressed in languages such as English, using heuristics => inaccurate results
- Using linguistic analyses, heuristics and Web statistics
26 STEP by STEP (Named Entity detection) II
Linguistic analyses are applied to detect NE
- Tool: OpenNLP => natural language parser
- Four steps: SD (sentence detection), TOK (tokenization), TAG (part-of-speech tagging), CHUNK (chunking)
- CHUNK is able to detect NE using a database => lower recall, limited NE
  [NP The/VB gothic/jj cathedral/nn] [VP of/vb] [NP Barcelona/NNP]
  [NP Tarragona/EX] [VP is/nns] [NP a/jjs city/nn]
Proposal:
- Filter noise: remove stop words, misspellings, etc.
- Heuristics: select all Noun Phrases (NP) where [NP.+ Regex2: s[a-z]
Problem: not all potential NE are representative of the analysed instance, e.g. neither Paris nor Madrid is representative of Barcelona
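The regular expressions of the heuristic did not survive transcription, so the following is only one plausible reading of it: keep the capitalized word runs that occur inside NP chunks, after dropping stop words. The chunk format follows the bracketed example on the slide; the function name, the stop-word list and the capitalization rule are assumptions made for illustration.

```python
import re

# Matches one bracketed chunk such as "[NP Barcelona/NNP]".
CHUNK_RE = re.compile(r"\[(\w+)\s+([^\]]+)\]")

# Tiny illustrative stop-word list (assumption, not the project's list).
STOP_WORDS = {"the", "a", "an", "of"}


def candidate_named_entities(chunked: str) -> list:
    """Return runs of capitalized, non-stop words found inside NP chunks."""
    candidates = []
    for label, body in CHUNK_RE.findall(chunked):
        if label != "NP":
            continue
        # Drop the POS tag after the slash: "cathedral/nn" -> "cathedral".
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        run = []
        for word in words + [""]:  # the empty sentinel flushes the last run
            if word[:1].isupper() and word.lower() not in STOP_WORDS:
                run.append(word)
            elif run:
                candidates.append(" ".join(run))
                run = []
    return candidates


chunked = ("[NP The/VB gothic/jj cathedral/nn] [VP of/vb] "
           "[NP Barcelona/NNP] [NP Antoni/NNP Gaudí/NNP]")
print(candidate_named_entities(chunked))
```

Candidates produced this way still include entities that are not representative of the analysed instance, which is the problem the slide points out.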
27 STEP by STEP (Named Entity detection) III
To improve the precision of NE extraction, it is complemented with a Web-based reliability analysis.
- Wider context, i.e. several observations in heterogeneous contexts
- The Web-based analysis uses Web statistics to rank all NE, combining semantic relatedness measures and hit counts
Relatedness measures:
- PMI (Pointwise Mutual Information)
- SCP (Symmetric Conditional Probability)
- NGD (Normalized Google Distance)
28 STEP by STEP (Named Entity detection) IV
PMI (Pointwise Mutual Information), estimated using hit counts
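The formula on this slide was lost in transcription. For reference, the standard definition of PMI and its usual hit-count estimate are given below, assuming that probabilities are approximated as p(x) ≈ hits(x)/M, with M the total number of indexed Web pages. The scores reported on the later slides are of the order of 1E-09, which would be consistent with an un-logged ratio hits(a ∧ b) / (hits(a) · hits(b)); that reading is a guess, not something the surviving text confirms.

```latex
\mathrm{PMI}(a,b) \;=\; \log_2 \frac{p(a,b)}{p(a)\,p(b)}
\;\approx\; \log_2 \frac{M \cdot \mathrm{hits}(a \wedge b)}{\mathrm{hits}(a)\,\mathrm{hits}(b)},
\qquad p(x) \approx \frac{\mathrm{hits}(x)}{M}
```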
29 STEP by STEP (Named Entity detection) V
SCP (Symmetric Conditional Probability)
NGD (Normalized Google Distance)
where M is the total number of Internet web pages.
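The formulas themselves are missing from the transcription. The sketch below uses the standard textbook definitions of the three measures, estimated from search-engine hit counts; it is not necessarily the exact variant used in the project. The function and parameter names are illustrative, and the counts in the usage lines are made up.

```python
import math


def pmi(hits_a: int, hits_b: int, hits_ab: int, total_pages: int) -> float:
    """Pointwise mutual information: log2( p(a,b) / (p(a) * p(b)) )."""
    p_a, p_b, p_ab = (h / total_pages for h in (hits_a, hits_b, hits_ab))
    return math.log2(p_ab / (p_a * p_b))


def scp(hits_a: int, hits_b: int, hits_ab: int, total_pages: int) -> float:
    """Symmetric conditional probability: p(a,b)^2 / (p(a) * p(b))."""
    p_a, p_b, p_ab = (h / total_pages for h in (hits_a, hits_b, hits_ab))
    return p_ab ** 2 / (p_a * p_b)


def ngd(hits_a: int, hits_b: int, hits_ab: int, total_pages: int) -> float:
    """Normalized Google distance (lower means more closely related)."""
    log_a, log_b, log_ab = (math.log(h) for h in (hits_a, hits_b, hits_ab))
    return (max(log_a, log_b) - log_ab) / (math.log(total_pages) - min(log_a, log_b))


# Made-up counts; all hit counts must be positive for the logarithms to exist.
print(pmi(100, 100, 10, 1000), scp(100, 100, 10, 1000), ngd(100, 100, 10, 1000))
```

PMI and SCP grow with relatedness, whereas NGD is a distance, so a ranking built on NGD has to be sorted in the opposite direction.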
30 STEP by STEP (Named Entity detection) VI
Hits for Barcelona and Sagrada Familia
Hits, Similarity
Sim(Barcelona, SagradaFamilia) = * = 3,42803E-09
31 STEP by STEP (Named Entity detection) VII
Named Entity | Hits(Bcn) | Hits(NE) | Hits(Bcn^NE) | PMI
Sagrada Familia | | | | ,580461E-09
Llobregat | | | | ,336235E-09
Antoni Gaudí | | | | ,605033E-09
Madrid | | | | ,592376E-09
Paris | | | | ,789873E-10
Catalan | | | | ,772626E-10
(*) Queries were performed using the Yahoo search engine
33 STEP by STEP (Named Entity detection) VIII
Select representative NE using different thresholds
Extracted NE sorted by PMI: Sagrada Familia, Llobregat, Antoni Gaudí, Madrid, Paris, Catalan
Representative NE: the entries of this ranking that lie above the chosen threshold
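A minimal sketch of the threshold step, assuming each NE already has a relatedness score with respect to the analysed instance. The scores below are hypothetical placeholders (the real values did not survive transcription) and are chosen only to reproduce the ordering shown on the slide.

```python
def select_representative(scores: dict, threshold: float) -> list:
    """Rank NEs by score (descending) and keep those at or above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Hypothetical scores, not the values from the slides.
scores = {
    "Madrid": 0.3,
    "Sagrada Familia": 0.9,
    "Catalan": 0.1,
    "Llobregat": 0.8,
    "Paris": 0.2,
    "Antoni Gaudí": 0.7,
}
print(select_representative(scores, threshold=0.5))
```

Raising the threshold trades recall for precision: fewer entities are kept, but those kept are more strongly related to the instance.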
37 STEP by STEP (Discovering entity-subsumer concept) I
- A way is needed to go from the instance level to the conceptual level in an unsupervised, domain-independent manner
- NE and subsumer concepts are related by means of taxonomic relationships
Approaches:
- Document-based notion of term subsumption
- Semantic similarity according to the shared context
  => Both require a considerable number of documents and a large amount of linguistic parsing
- Linguistic patterns => offer relatively high precision but suffer from low recall, because explicit linguistic patterns are rare in corpora
38 STEP by STEP (Discovering entity-subsumer concept) II
Solution: exploit the Web in order to enlarge the corpus
Hearst patterns: used to acquire hyponym/hypernym relations from unrestricted text
- NN such as NP (cities such as Tortosa)
- such NN as NP (such cities as Tarragona)
- NP or other NN (London or other cities)
- NP and other NN (Barcelona and other cities)
- NN including NP (locations including Reus)
- NN especially NP (gothic cathedrals especially Sagrada Familia)
39 STEP by STEP (Discovering entity-subsumer concept) III
- Six queries are constructed for each NE
- The snippets returned for each query are analysed using a natural language parser in order to extract the taxonomic relationships
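The query construction can be sketched as one template per Hearst pattern, with the concept slot (NN) left open because it is what the snippets are expected to reveal. The exact query syntax is not given in the slides: the quoted phrases and the `*` wildcard for the "such NN as NP" pattern are assumptions, and `build_queries` is a hypothetical helper name.

```python
# One template per Hearst pattern; the unknown concept (NN) is left out
# or, where it falls inside the phrase, replaced by an assumed wildcard.
HEARST_TEMPLATES = [
    "such as {ne}",       # NN such as NP
    "such * as {ne}",     # such NN as NP
    "{ne} or other",      # NP or other NN
    "{ne} and other",     # NP and other NN
    "including {ne}",     # NN including NP
    "especially {ne}",    # NN especially NP
]


def build_queries(named_entity: str) -> list:
    """Return the six exact-phrase queries for one named entity."""
    return ['"' + template.format(ne=named_entity) + '"'
            for template in HEARST_TEMPLATES]


for query in build_queries("London"):
    print(query)
```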
40 STEP by STEP (Discovering entity-subsumer concept) IV
Interpretation of snippets (query "such as London")
- a big city such as London
  [NP a/vbz big/jj city/nn] [PP such/pdt] [NP as/nns London/NNP]
- travel topics such as London sightseeing
  [NP travel/nns topics/nns] [NP such/pdt] [NP as/nns London/NNP museum/nn]
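One way to read the two examples: a snippet yields a subsumer candidate only when the NE closes its own noun phrase, so "big city" is accepted for London while "travel topics" is rejected because London merely modifies another noun. The sketch below applies that reading to the bracketed chunk format shown on the slide; the acceptance rule, the determiner list and the function name are assumptions.

```python
import re
from typing import Optional

CHUNK_RE = re.compile(r"\[(\w+)\s+([^\]]+)\]")
DETERMINERS = {"a", "an", "the"}


def subsumer_from_chunks(chunked: str, named_entity: str) -> Optional[str]:
    """Extract the concept preceding 'such as <NE>' from a chunked snippet."""
    chunks = [[token.rsplit("/", 1)[0] for token in body.split()]
              for _label, body in CHUNK_RE.findall(chunked)]
    for i, words in enumerate(chunks):
        if words == ["such"] and 0 < i < len(chunks) - 1:
            before, after = chunks[i - 1], chunks[i + 1]
            # Accept only "as <NE>" with nothing after the NE in that chunk.
            if after == ["as"] + named_entity.split():
                concept = [w for w in before if w.lower() not in DETERMINERS]
                return " ".join(concept) or None
    return None


good = "[NP a/vbz big/jj city/nn] [PP such/pdt] [NP as/nns London/NNP]"
bad = "[NP travel/nns topics/nns] [NP such/pdt] [NP as/nns London/NNP museum/nn]"
print(subsumer_from_chunks(good, "London"))
print(subsumer_from_chunks(bad, "London"))
```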
41 STEP by STEP (Semantic annotation) I
Consists of annotating Named Entities with ontological classes.
Approaches:
- Web-based statistical evaluation => considerable number of queries
- Semantically unstructured annotations covering heterogeneous domains, which are hard to exploit (Barcelona is a metropolis, Madrid is a city)
- Direct matching between subsumer concept and ontology class => if the subsumer concept does not appear in the ontology, it is not annotated even when its meaning is similar to that of one of the classes
- Direct matching + Web statistics + WordNet
42 STEP by STEP (Semantic annotation) II
Direct Matching
- A stemming algorithm is applied to subsumer concepts and ontology classes in order to discover morphologically equivalent terms, e.g. city and cities
- Subsumer concepts are looked up in the ontology. If there is no result and the concept can be reduced, it is reduced and the process is repeated, e.g. big city -> city
- Choose the best proposed annotation
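The lookup-and-reduce loop can be sketched as follows. The slides do not name the stemming algorithm, so `naive_stem` is a deliberately crude stand-in that only handles simple English plurals; a real system would use a proper stemmer. Reducing a concept by dropping its leftmost modifier is inferred from the "big city -> city" example.

```python
from typing import Optional


def naive_stem(word: str) -> str:
    """Crude plural stripping (illustrative stand-in for a real stemmer)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def direct_match(concept: str, ontology_classes: list) -> Optional[str]:
    """Look the concept up; on failure drop the leftmost modifier and retry."""
    index = {tuple(naive_stem(w) for w in cls.split()): cls
             for cls in ontology_classes}
    words = [naive_stem(w) for w in concept.split()]
    while words:
        match = index.get(tuple(words))
        if match is not None:
            return match
        words = words[1:]  # "big city" -> "city"
    return None


print(direct_match("big cities", ["City", "Church"]))
```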
43 STEP by STEP (Semantic annotation) III
Semantic Matching
- For each candidate, get synonyms, hyponyms and hypernyms using WordNet (if there is more than one synset, the context is used for semantic disambiguation)
  Candidate: church => {abbey, basilica, cathedral, kirk, place of worship, house of prayer}
- Perform direct matching using the newly extracted subsumer candidates
- Choose the best annotation
44 STEP by STEP (Semantic annotation) IV
Semantic disambiguation
- In most cases a word can have several meanings (polysemy), e.g. {head -> part of the body, head -> geographic feature, etc.}
- When WordNet is used to get the synonyms of a concept, it is necessary to know which WordNet synset is the most appropriate
45 STEP by STEP (Semantic annotation) V
To resolve this problem, the context from which the candidate was extracted is used. E.g.:
- Document:
- Named Entity: Darren Aronofsky
- Subsumer candidate: producer
- Context: Filmmaker Darren Aronofsky commented, "I walked out of The Matrix [...] and I was thinking, 'What kind of science fiction movie can people make now?'"
- WordNet synsets:
  o S: (n) manufacturer, producer (someone who manufactures something)
  o S: (n) producer (someone who finds financing for and supervises the making and presentation of a show (play or film or program or similar work))
  o S: (n) producer (something that produces) "Maine is a leading producer of potatoes"; "this microorganism is a producer of disease"
46 STEP by STEP (Semantic annotation) VI
The context is then compared with each synset using the cosine similarity measure. The context alone is often not enough to decide which synset is better, so it is extended with web snippets.
Query: The Matrix Darren Aronofsky
Snippets:
[0]: Weeks before shooting his second movie, Requiem for a Dream, Darren Aronofsky took the film's star, Jared Leto, to see The Matrix at a Brooklyn mall.
[1]: UPCOMING FILM PROJECTS! There's been rumors for a while now that "The Matrix" trilogy filmmakers Andy and Lana Wachowski have been developing a secret [N]
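A minimal sketch of the comparison, assuming a plain bag-of-words representation with a small stop-word list; the slides do not say how the vectors are built, so the tokenization, the stop words and the helper names are all assumptions. The context string is assembled from the slide's context and the first snippet, and the glosses are those of the three "producer" synsets.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (assumption).
STOP_WORDS = {"a", "an", "and", "at", "for", "i", "of", "or", "out", "s",
              "the", "that", "to", "was", "who"}


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on non-letters, drop stop words, count terms."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower())
                   if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_synset(context: str, glosses: list) -> int:
    """Index of the gloss most similar to the context."""
    ctx = bag_of_words(context)
    return max(range(len(glosses)),
               key=lambda i: cosine(ctx, bag_of_words(glosses[i])))


context = ("Filmmaker Darren Aronofsky commented, I walked out of The Matrix "
           "and I was thinking, what kind of science fiction movie can people "
           "make now? Weeks before shooting his second movie, Requiem for a "
           "Dream, Darren Aronofsky took the film's star, Jared Leto, to see "
           "The Matrix at a Brooklyn mall.")
glosses = [
    "someone who manufactures something",
    "someone who finds financing for and supervises the making and "
    "presentation of a show (play or film or program or similar work)",
    "something that produces",
]
print(best_synset(context, glosses))
```

With this toy setup only the second gloss shares a content word ("film") with the extended context, which illustrates why adding snippets helps: the original one-sentence context has no overlap with any gloss.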
47 STEP by STEP (Semantic annotation) VII
Finally, the synset with the highest score is selected (0.084):
S: (n) producer (someone who finds financing for and supervises the making and presentation of a show (play or film or program or similar work))
Its synonyms, hypernyms and hyponyms are then extracted:
Producer => {film maker, filmmaker, film producer, movie maker, theatrical producer}
48 STEP by STEP (Semantic annotation) VII
Choose the best annotation
- Compare the proposed annotations pairwise
- In a parent-child relationship, the child is chosen
- Other relationships are resolved using Web statistics (PMI)
49 STEP by STEP (Semantic annotation) VIII
| | monument | cathedral | church | religious building |
| monument | X | PMI -> monument | PMI -> monument | PMI -> monument |
| cathedral | PMI -> monument | X | PMI -> cathedral | Superclass -> cathedral |
| church | PMI -> monument | PMI -> cathedral | X | Superclass -> church |
| religious building | PMI -> monument | Superclass -> cathedral | Superclass -> church | X |

| Subsumer concept | Hits(Sagrada Familia) | Hits(Cand) | Hits(SgF^Cand) | PMI |
| monument | | | | ,50E-08 |
| cathedral | | | | ,43E-08 |
| church | | | | ,57E-08 |
| religious building | | | | ,17E-12 |
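The pairwise rule can be sketched as a small tournament: when one candidate is an ancestor of the other in the ontology, the more specific class wins; otherwise the candidate with the higher PMI wins. The ontology fragment below is inferred from the matrix (cathedral and church as subclasses of religious building), and the PMI values are made-up placeholders that only preserve the ordering implied by the matrix, since the real figures are incomplete in the transcription.

```python
# Hypothetical ontology fragment: child -> parent.
PARENT = {"cathedral": "religious building", "church": "religious building"}

# Made-up PMI scores; only their relative order matters here.
PMI = {"monument": 9e-08, "cathedral": 7e-08, "church": 2e-08,
       "religious building": 1e-12}


def is_ancestor(ancestor: str, cls: str, parent: dict) -> bool:
    """True if `ancestor` is reachable from `cls` by following parent links."""
    while cls in parent:
        cls = parent[cls]
        if cls == ancestor:
            return True
    return False


def pick(a: str, b: str, parent: dict, pmi: dict) -> str:
    """Prefer the child in a parent-child pair, otherwise the higher PMI."""
    if is_ancestor(a, b, parent):
        return b
    if is_ancestor(b, a, parent):
        return a
    return a if pmi[a] >= pmi[b] else b


def best_annotation(candidates: list, parent: dict, pmi: dict) -> str:
    """Run the candidates through successive pairwise comparisons."""
    best = candidates[0]
    for candidate in candidates[1:]:
        best = pick(best, candidate, parent, pmi)
    return best


print(best_annotation(["monument", "cathedral", "church", "religious building"],
                      PARENT, PMI))
```

Reducing the matrix to a sequential tournament is a simplification; the slides only show the pairwise outcomes, not how the overall winner is derived from them.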
0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization
More information* Overview. Ontology-Guided Information Extraction from Pathology Reports The SWPatho Project David Schlangen Universität Potsdam
Overview Background of project The task The system Digression: gently machine aided ontology construction Evaluation Future Work -Guided Information Extraction from Pathology Reports The SWPatho Project
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationCross-Lingual Word Sense Disambiguation
Cross-Lingual Word Sense Disambiguation Priyank Jaini Ankit Agrawal pjaini@iitk.ac.in ankitag@iitk.ac.in Department of Mathematics and Statistics Department of Mathematics and Statistics.. Mentor: Prof.
More informationUsing ART2 Neural Network and Bayesian Network for Automating the Ontology Constructing Process
Available online at www.sciencedirect.com Procedia Engineering 29 (2012) 3914 3923 2012 International Workshop on Information and Electronics Engineering (IWIEE) Using ART2 Neural Network and Bayesian
More informationINTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA. Ernesto William De Luca
INTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA Ernesto William De Luca Overview 2 Motivation EuroWordNet RDF/OWL EuroWordNet RDF/OWL LexiRes Tool Conclusions Overview 3 Motivation EuroWordNet
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationKnowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.
Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European
More informationOntology Matching with CIDER: Evaluation Report for the OAEI 2008
Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task
More informationMultimedia Data Management M
ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA Multimedia Data Management M Second cycle degree programme (LM) in Computer Engineering University of Bologna Semantic Multimedia Data Annotation Home page:
More informationCS 6320 Natural Language Processing
CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic
More informationModule 3: GATE and Social Media. Part 4. Named entities
Module 3: GATE and Social Media Part 4. Named entities The 1995-2018 This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs Licence Named Entity Recognition Texts frequently
More informationOntology-Based Information Extraction
Ontology-Based Information Extraction Daya C. Wimalasuriya Towards Partial Completion of the Comprehensive Area Exam Department of Computer and Information Science University of Oregon Committee: Dr. Dejing
More informationA Semantic Role Repository Linking FrameNet and WordNet
A Semantic Role Repository Linking FrameNet and WordNet Volha Bryl, Irina Sergienya, Sara Tonelli, Claudio Giuliano {bryl,sergienya,satonelli,giuliano}@fbk.eu Fondazione Bruno Kessler, Trento, Italy Abstract
More informationText mining tools for semantically enriching the scientific literature
Text mining tools for semantically enriching the scientific literature Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the
More informationDIT - University of Trento Concept Search: Semantics Enabled Information Retrieval
PhD Dissertation International Doctorate School in Information and Communication Technologies DIT - University of Trento Concept Search: Semantics Enabled Information Retrieval Uladzimir Kharkevich Advisor:
More informationWikiOnto: A System For Semi-automatic Extraction And Modeling Of Ontologies Using Wikipedia XML Corpus
2009 IEEE International Conference on Semantic Computing WikiOnto: A System For Semi-automatic Extraction And Modeling Of Ontologies Using Wikipedia XML Corpus Lalindra De Silva University of Colombo School
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More informationAlthough it s far from fully deployed, Semantic Heterogeneity Issues on the Web. Semantic Web
Semantic Web Semantic Heterogeneity Issues on the Web To operate effectively, the Semantic Web must be able to make explicit the semantics of Web resources via ontologies, which software agents use to
More informationRPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???
@ INSIDE DEEPQA Managing complex unstructured data with UIMA Simon Ellis INTRODUCTION 22 nd November, 2013 WAT SON TECHNOLOGIES AND OPEN ARCHIT ECT URE QUEST ION ANSWERING PROFESSOR JIM HENDLER S IMON
More informationA fully-automatic approach to answer geographic queries: GIRSA-WP at GikiP
A fully-automatic approach to answer geographic queries: at GikiP Johannes Leveling Sven Hartrumpf Intelligent Information and Communication Systems (IICS) University of Hagen (FernUniversität in Hagen)
More informationChapter 6. Queries and Interfaces
Chapter 6 Queries and Interfaces Keyword Queries Simple, natural language queries were designed to enable everyone to search Current search engines do not perform well (in general) with natural language
More informationPapers for comprehensive viva-voce
Papers for comprehensive viva-voce Priya Radhakrishnan Advisor : Dr. Vasudeva Varma Search and Information Extraction Lab, International Institute of Information Technology, Gachibowli, Hyderabad, India
More informationWordNet-based User Profiles for Semantic Personalization
PIA 2005 Workshop on New Technologies for Personalized Information Access WordNet-based User Profiles for Semantic Personalization Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano LACAM
More informationUsing NLP and context for improved search result in specialized search engines
Mälardalen University School of Innovation Design and Engineering Västerås, Sweden Thesis for the Degree of Bachelor of Science in Computer Science DVA331 Using NLP and context for improved search result
More informationUnderstanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22
Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task Junjun Wang 2013/4/22 Outline Introduction Related Word System Overview Subtopic Candidate Mining Subtopic Ranking Results and Discussion
More informationA MODEL-DRIVEN APPROACH OF ONTOLOGICAL COMPONENTS FOR ON- LINE SEMANTIC WEB INFORMATION RETRIEVAL
Journal of Web Engineering, Vol. 6, No.4 (2007) 303-329 Rinton Press A MODEL-DRIVEN APPROACH OF ONTOLOGICAL COMPONENTS FOR ON- LINE SEMANTIC WEB INFORMATION RETRIEVAL HAJER BAAZAOUI ZGHAL 1, MARIE-AUDE
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Relevance Feedback. Query Expansion Instructor: Rada Mihalcea Intelligent Information Retrieval 1. Relevance feedback - Direct feedback - Pseudo feedback 2. Query expansion
More informationOntology Based Search Engine
Ontology Based Search Engine K.Suriya Prakash / P.Saravana kumar Lecturer / HOD / Assistant Professor Hindustan Institute of Engineering Technology Polytechnic College, Padappai, Chennai, TamilNadu, India
More informationCollective Intelligence in Action
Collective Intelligence in Action SATNAM ALAG II MANNING Greenwich (74 w. long.) contents foreword xv preface xvii acknowledgments xix about this book xxi PART 1 GATHERING DATA FOR INTELLIGENCE 1 "1 Understanding
More informationOntology Population and Enrichment: State of the Art
Ontology Population and Enrichment: State of the Art Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Anastasia Krithara, and Elias Zavitsanos Institute of Informatics and Telecommunications,
More informationCS377: Database Systems Text data and information. Li Xiong Department of Mathematics and Computer Science Emory University
CS377: Database Systems Text data and information retrieval Li Xiong Department of Mathematics and Computer Science Emory University Outline Information Retrieval (IR) Concepts Text Preprocessing Inverted
More informationReal-time population of Knowledge Bases: Opportunities and Challenges. Ndapa Nakashole Gerhard Weikum
Real-time population of Knowledge Bases: Opportunities and Challenges Ndapa Nakashole Gerhard Weikum AKBC Workshop at NAACL 2012 Real-time Data Sources In news and social media, the implicit query is:
More informationDomain-Specific. Languages. Martin Fowler. AAddison-Wesley. Sydney Tokyo. With Rebecca Parsons
Domain-Specific Languages Martin Fowler With Rebecca Parsons AAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco New York Toronto Montreal London Munich Paris Madrid Sydney Tokyo Singapore
More informationText Classification and Clustering Using Kernels for Structured Data
Text Mining SVM Conclusion Text Classification and Clustering Using, pgeibel@uos.de DGFS Institut für Kognitionswissenschaft Universität Osnabrück February 2005 Outline Text Mining SVM Conclusion 1 Text
More informationCollaborative editing of knowledge resources for cross-lingual text mining
UNIVERSITÀ DI PISA Scuola di Dottorato in Ingegneria Leonardo da Vinci Corso di Dottorato di Ricerca in INGEGNERIA DELL INFORMAZIONE Tesi di Dottorato di Ricerca Collaborative editing of knowledge resources
More informationWeb Mining Evolution & Comparative Study with Data Mining
Web Mining Evolution & Comparative Study with Data Mining Anu, Assistant Professor (Resource Person) University Institute of Engineering and Technology Mahrishi Dayanand University Rohtak-124001, India
More informationSemantic Web Technologies Trends and Research in Ontology-based Systems
Semantic Web Technologies Trends and Research in Ontology-based Systems John Davies BT, UK Rudi Studer University of Karlsruhe, Germany Paul Warren BT, UK John Wiley & Sons, Ltd Contents Foreword xi 1.
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationIterative Learning of Relation Patterns for Market Analysis with UIMA
UIMA Workshop, GLDV, Tübingen, 09.04.2007 Iterative Learning of Relation Patterns for Market Analysis with UIMA Sebastian Blohm, Jürgen Umbrich, Philipp Cimiano, York Sure Universität Karlsruhe (TH), Institut
More information