Building Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018
1 Building Multilingual Resources and Neural Models for Word Sense Disambiguation Alessandro Raganato March 15th, 2018
2 About me: ERC project MultiJEDI, Sapienza - University of Rome (prof. Roberto Navigli); ERC project FoTran, University of Helsinki (prof. Jörg Tiedemann)
3 Slides from the Luxembourg BabelNet Workshop 2016 (
6 BabelNet: to the best of our knowledge, the largest multilingual encyclopedic dictionary and semantic network (almost 16M entries in 284 languages and 1.3B semantic connections). Initially created as an integration of Wikipedia and WordNet, BabelNet is now a merger of many different resources (Wiktionary, Wikidata, OmegaWiki, VerbNet, ImageNet, ...).
9 EN - Squash: a game played in a walled court with soft rubber balls and bats like tennis rackets. IT - Squash: una partita giocata in un campo recintato con palle di gomma morbida e pipistrelli come racchette da tennis (Italian "pipistrelli" means bats, the animals).
10 Word Sense Disambiguation Language is ambiguous: Dave Grohl played bass in Rock Supergroup Teenage Time Killers. Word Sense Disambiguation (WSD) is the task of computationally determining which sense of a word is used in a particular context.
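To make the task definition concrete, here is a minimal sketch of a gloss-overlap disambiguator (a simplified Lesk-style baseline). It is not a method presented in the talk, and the two-sense inventory for "bass" with its glosses is invented purely for illustration.

```python
import re

# Hypothetical toy sense inventory (not taken from WordNet or BabelNet).
TOY_INVENTORY = {
    "bass": {
        "bass#music": "the lowest part in music played on a bass guitar in a rock band or group",
        "bass#fish": "an edible freshwater fish caught in a lake or river",
    }
}

STOPWORDS = {"a", "an", "the", "in", "on", "of", "or", "and", "to", "is"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def simplified_lesk(word, sentence, inventory=TOY_INVENTORY):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = tokenize(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


sentence = "Dave Grohl played bass in Rock Supergroup Teenage Time Killers."
predicted = simplified_lesk("bass", sentence)
print(predicted)
```

On the slide's example sentence the words "played" and "rock" overlap with the music gloss, so the musical sense is chosen; a sentence about fishing in a lake would select the other sense.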
11 International Workshops on Semantic Evaluation. Many evaluation datasets have been constructed for the task: Senseval 2 (2001), Senseval 3 (2004), SemEval 2007, SemEval 2013, SemEval 2015. Training data: SemCor, a manually sense-annotated corpus; OMSTI (One Million Sense-Tagged Instances), a large, automatically constructed annotated corpus.
12 Building a Unified Evaluation Framework. Our goal: build a unified framework for all-words WSD (training and testing), and use this evaluation framework to perform a fair quantitative and qualitative empirical comparison.
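Since the results in the talk are reported as F1 scores, a scorer for such a framework could look roughly like the sketch below. The instance identifiers and sense labels are hypothetical, and the function is an illustrative assumption rather than the framework's actual scorer.

```python
def score(gold, predictions):
    """Return (precision, recall, F1) for WSD predictions.

    gold: dict instance_id -> set of acceptable sense labels
    predictions: dict instance_id -> predicted sense label
                 (a system may leave some instances unanswered)
    """
    correct = sum(
        1 for inst, sense in predictions.items() if sense in gold.get(inst, set())
    )
    attempted = len(predictions)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical instance ids and sense labels, for illustration only.
gold = {
    "d0.s0.t0": {"bass#music"},
    "d0.s0.t1": {"play#perform"},
    "d0.s1.t0": {"bank#river", "bank#slope"},
    "d0.s1.t1": {"squash#sport"},
}
predictions = {
    "d0.s0.t0": "bass#music",
    "d0.s0.t1": "play#game",
    "d0.s1.t0": "bank#river",
}
precision, recall, f1 = score(gold, predictions)
```

Here two of three attempted instances are correct out of four gold instances, so precision is 2/3, recall is 1/2, and F1 is 4/7.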
15 Evaluation: Results on the concatenation of all datasets Supervised vs. Knowledge-based WSD +0.4 (OMSTI)
16 Issues with supervised WSD. The word expert paradigm, in which each disambiguation target casts its own classification problem: for each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch; disambiguation decisions within a sentence are independent; models are language dependent.
17 Challenges of supervised WSD. Can we model disambiguation at the sequence level using a single all-words model? Can we develop more flexible models that are still able to retain state-of-the-art accuracy (e.g. can work with multiple languages)? No engineered features.
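The word expert paradigm can be pictured with the toy sketch below: one independent model per target lemma, each trained only on its own examples. The count-based "model", the training triples and the sense labels are all illustrative assumptions, not a system from the talk.

```python
from collections import Counter, defaultdict


def train_word_experts(examples):
    """Train one independent model per target lemma.

    examples: iterable of (lemma, context_words, sense) triples.
    Each "model" is just per-sense counts of context words.
    """
    experts = defaultdict(lambda: defaultdict(Counter))
    for lemma, context, sense in examples:
        experts[lemma][sense].update(context)
    return experts


def predict(experts, lemma, context):
    """Use only the expert for `lemma`; return None if none was trained."""
    if lemma not in experts:
        return None
    scores = {
        sense: sum(counts[w] for w in context)
        for sense, counts in experts[lemma].items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data.
examples = [
    ("bass", ["guitar", "band"], "bass#music"),
    ("bass", ["lake", "fishing"], "bass#fish"),
    ("bank", ["money", "loan"], "bank#finance"),
    ("bank", ["river", "water"], "bank#river"),
]
experts = train_word_experts(examples)
```

Two word types yield two separate models, each decision ignores the other words' predictions, and a lemma without training data (for example "squash" here) cannot be disambiguated at all.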
19 WSD as (neural) sequence labeling. Attentive augmentation, with attention weights and a context vector.
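The slide's formulas are not preserved in this transcription, so the following is only a generic attention sketch: each hidden state gets a scalar score, the scores are normalized with a softmax into attention weights, and the context vector is the weighted sum of the hidden states. The scoring function (a dot product between a hypothetical parameter vector `omega` and the tanh of each state) is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_context(hidden_states, omega):
    """Return (attention weights, context vector) for one sentence.

    hidden_states: list of T vectors (one per token), each of size d
    omega: hypothetical parameter vector of size d used to score each state
    """
    scores = [
        sum(w * math.tanh(h_j) for w, h_j in zip(omega, h)) for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return weights, context


# Toy hidden states for a three-token sentence.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attentive_context(states, omega=[0.5, -0.5])
```

With an all-zero `omega` every token receives the same weight and the context vector reduces to the mean of the hidden states.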
20 WSD as (neural) translation from English to English + word senses
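One way to picture the "English to English + word senses" framing is as a parallel corpus whose target side replaces each annotated word with its sense label. The sketch below assumes that untagged words are copied through unchanged; the sense labels are hypothetical placeholders.

```python
def build_target(tokens, senses):
    """Build the target sequence for 'English -> English + word senses'.

    tokens: list of source words
    senses: dict token index -> sense label for the annotated words
    Unannotated words are copied through unchanged (an assumption here).
    """
    return [senses.get(i, tok) for i, tok in enumerate(tokens)]


source = "Dave Grohl played bass".split()
# Hypothetical sense labels, for illustration only.
target = build_target(source, {2: "play#perform", 3: "bass#music"})
```

Source and target have the same length, so the same training pair also works for the sequence labeling view.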
23 WSD via multitask learning. Shared layers feed a main loss and two auxiliary losses: $L_{\mathrm{WSD}}(y_i, y_i^*) + L_{\mathrm{POS}}(\mathrm{POS}_i, \mathrm{POS}_i^*) + L_{\mathrm{LEX}}(\mathrm{LEX}_i, \mathrm{LEX}_i^*)$. Aux. Task #1 (LEX): coarse-grained semantic labels from WordNet lexicographer files. Aux. Task #2 (POS): universal parts of speech.
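The combined objective can be sketched for a single token as the unweighted sum of the three losses shown on the slide. Using cross-entropy for each term is an assumption made for this illustration, and the probability values are invented.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold label under a predicted distribution."""
    return -math.log(probs[gold_index])


def multitask_loss(wsd, pos, lex):
    """Sum the main WSD loss and the two auxiliary losses for one token.

    Each argument is a (predicted_distribution, gold_index) pair.
    """
    return sum(cross_entropy(p, g) for p, g in (wsd, pos, lex))


# Invented distributions: WSD over 2 senses, POS over 2 tags, LEX over 2 labels.
loss = multitask_loss(
    wsd=([0.5, 0.5], 0),
    pos=([1.0, 0.0], 0),
    lex=([0.25, 0.75], 1),
)
```

Because the layers are shared, gradients from the POS and LEX terms also update the representation used by the WSD output.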
24 Experiments (results charts; axis: F1-score (%))
26 Zero-shot multilingual WSD...
27 Experiments: multilingual WSD F1-score (%)
29 Issues with supervised WSD. The word expert paradigm, in which each disambiguation target casts its own classification problem: for each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch; disambiguation decisions within a sentence are independent; models are language dependent. Lack of reliable sense-annotated data to train large-scale models, possibly in multiple languages (manually annotating word senses quickly becomes infeasible).
31 Problem: Wikipedia is designed for humans! Only a fraction of linkable mentions is in fact hyperlinked: of 580M noun tokens*, only 116M are covered (19%). This is partly due to the Wikipedia style guidelines: link each concept at most once within a page; link only when relevant and helpful in the context. * English dump 11/2014
33 potentially linkable mentions!
35 A Semantically Enriched Wikipedia (SEW). Our goal: augment Wikipedia with as much semantic information as possible. How? (1) The existing Wikipedia hyperlink structure: page-to-page direct connections (Wikilinks) and connections between pages and Wikipedia categories. (2) The multilingual sense inventory and semantic network of BabelNet, a merger of many different resources (including, e.g., WordNet and Wikipedia itself) with 14M entries and 380M semantic connections.
37 A Semantically Enriched Wikipedia (SEW). The algorithm: for each Wikipedia page p in W: (1) Preprocessing, (2) Hyperlink Propagation, (3) Refinement.
38 A Semantically Enriched Wikipedia (SEW). Step 1, Preprocessing: tokenization, part-of-speech tagging, lemmatization, filtering of uninformative pages.
39 A Semantically Enriched Wikipedia (SEW). Step 2, Hyperlink Propagation: a cascade of propagation heuristics which, for each page p, collect a list S of hyperlinks to be propagated across p and scan the text of p to match any potential lexicalization (one-sense-per-page assumption).
40 A Semantically Enriched Wikipedia (SEW). Step 3, Refinement: a conservative policy to remove duplicates and overlapping mentions.
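The propagation and refinement steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the SEW implementation: only exact surface matching is shown, the "keep the longer span" rule for overlaps is an assumed refinement policy, and the page text and link targets are made up.

```python
import re


def propagate_links(text, hyperlinks):
    """Surface-mention propagation under a one-sense-per-page assumption.

    text: plain text of page p
    hyperlinks: dict mapping a surface string linked in p to its target page
    Returns (start, end, target) annotations for every match in the text.
    """
    annotations = []
    for surface, target in hyperlinks.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            annotations.append((match.start(), match.end(), target))
    return annotations


def refine(annotations):
    """Drop duplicates and overlapping mentions, preferring longer spans
    (the preference for longer spans is an assumption of this sketch)."""
    kept = []
    ordered = sorted(set(annotations), key=lambda a: (-(a[1] - a[0]), a[0]))
    for start, end, target in ordered:
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, target))
    return sorted(kept)


# Made-up page text and links, for illustration only.
text = "Lorenzo de Medici ruled Florence. Lorenzo was a patron of Florence."
links = {
    "Lorenzo de Medici": "Lorenzo de' Medici",
    "Lorenzo": "Lorenzo de' Medici",
    "Florence": "Florence",
}
refined = refine(propagate_links(text, links))
mentions = [text[s:e] for s, e, _ in refined]
```

Propagation produces five candidate annotations; refinement drops the short "Lorenzo" match nested inside "Lorenzo de Medici", leaving four.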
43 A Semantically Enriched Wikipedia (SEW). Hyperlink Propagation - A bird's-eye view. Intra-page heuristics (propagate links that occur as mentions within p): Surface Mention Propagation, Lemmatized Mention Propagation, Person Mention Propagation. Inter-page heuristics (exploit the connection of p with other pages or categories): Wikipedia Inlink Propagation, BabelNet Inlink Propagation, Category Propagation, Monosemous Content Word.
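As one illustration of these heuristics, the last one can be read as: annotate a content word whenever the sense inventory lists exactly one sense for it. The sketch below follows that reading, which is an assumption about the heuristic's details; the toy inventory and sense ids are hypothetical.

```python
def monosemous_annotations(lemmas, inventory):
    """Annotate lemmas that have exactly one sense in the inventory.

    lemmas: list of (position, lemma) pairs for the content words of a page
    inventory: dict lemma -> list of sense ids (hypothetical toy inventory)
    """
    return [
        (pos, lemma, inventory[lemma][0])
        for pos, lemma in lemmas
        if len(inventory.get(lemma, [])) == 1
    ]


# Hypothetical inventory: "statesman" has one sense, "patron" has two.
inventory = {
    "statesman": ["statesman#1"],
    "patron": ["patron#sponsor", "patron#customer"],
}
page_lemmas = [(3, "statesman"), (7, "patron"), (9, "unknownword")]
tagged = monosemous_annotations(page_lemmas, inventory)
```

Ambiguous and out-of-inventory words are left untouched, which keeps this heuristic high-precision by construction.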
44 Example page: Lorenzo de Medici (figure: page text with highlighted annotations). 30 original links; 21 new intra-page links; 31 new inter-page links.
48 A Semantically Enriched Wikipedia (SEW). Statistics - Another bird's-eye view. Total number of sense annotations (bar chart): WIKI 71M; SEW before refinement 250M; SEW after refinement 206M; Wikilinks [1] 40M; MUN [2] 1.3M. [1] S. Singh, A. Subramanya, F. Pereira, and A. McCallum. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical Report UM-CS [2] K. Taghipour and H. Ng. One million sense-tagged instances for word sense disambiguation and induction. CoNLL,
49 A Semantically Enriched Wikipedia (SEW). Statistics - Another bird's-eye view. Sense annotation by type (chart): Original, Intra-page, Inter-page.
51 A Semantically Enriched Wikipedia (SEW). Experiments. Intrinsic Evaluation - Annotation Quality: we compared our sense annotations against those discovered by 3W (Noraset et al. 2014), a Wikipedia-specific system designed to automatically add high-precision hyperlinks. Extrinsic Evaluation - Entity Linking and Semantic Similarity: we used SEW both as a training set for Entity Linking and as a semantic network to develop Wikipedia-based vector representations for Semantic Similarity.
52 A Semantically Enriched Wikipedia (SEW). Experiments covered here: Intrinsic Evaluation - Annotation Quality (comparison against 3W); Extrinsic Evaluation - Entity Linking (SEW used as a training set).
53 A Semantically Enriched Wikipedia (SEW). Intrinsic Evaluation. Hand-labeled evaluation set of 2000 Wikipedia pages (Noraset et al. 2014); results table comparing SEW and 3W on Precision, Recall and F1. T. Noraset, C. Bhagavatula, and D. Downey. Adding high-precision links to Wikipedia. EMNLP,
54 A Semantically Enriched Wikipedia (SEW). Extrinsic Evaluation #1 - Entity Linking. Benchmark system: IMS (Zhong and Ng 2010), a state-of-the-art supervised system for Word Sense Disambiguation in English based on SVMs. Configurations: IMS + SEW (IMS trained on SEW); IMS + HL (IMS trained only on the original hyperlinks, baseline #1); MFS (Most Frequent Sense provided by BabelNet, baseline #2). Datasets: SemEval 2013, task 12; SemEval 2015, task 13; MSNBC; AIDA-CoNLL.
55 A Semantically Enriched Wikipedia (SEW). Extrinsic Evaluation #1 - Entity Linking (results figure)
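The Most Frequent Sense baseline always predicts a word's most common sense regardless of context. In the talk the MFS is provided by BabelNet; the sketch below instead estimates it from a toy list of annotations, which is a stand-in assumption, and the sense labels are hypothetical.

```python
from collections import Counter, defaultdict


def build_mfs(annotations):
    """Map each lemma to its most frequent sense in (lemma, sense) pairs."""
    counts = defaultdict(Counter)
    for lemma, sense in annotations:
        counts[lemma][sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


# Hypothetical annotated instances.
annotations = [
    ("bass", "bass#music"),
    ("bass", "bass#music"),
    ("bass", "bass#fish"),
    ("squash", "squash#sport"),
]
mfs = build_mfs(annotations)
```

A supervised system is only useful to the extent that it beats this context-free lookup, which is why it serves as a baseline here.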
56 Conclusion. Built a unified evaluation framework for all-words Word Sense Disambiguation, including standardized training and testing data (Raganato et al. EACL 2017). Framed the WSD task as a sequence labelling problem, presenting several neural sequence learning models and showing that, for the first time in WSD, a model trained on a given language is able to seamlessly handle a different language at testing time (Raganato et al. EMNLP 2017). Built one of the largest available collections of sense-annotated corpora with high-quality annotations (Raganato et al. IJCAI 2016, Camacho-Collados et al. LREC 2016, Delli Bovi et al. ACL 2017).
57 Thank you!
More informationA Linguistic Approach for Semantic Web Service Discovery
A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam
More informationTwo graph-based algorithms for state-of-the-art WSD
Two graph-based algorithms for state-of-the-art WSD Eneko Agirre, David Martínez, Oier López de Lacalle and Aitor Soroa IXA NLP Group University of the Basque Country Donostia, Basque Contry a.soroa@si.ehu.es
More informationWatson & WMR2017. (slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself)
Watson & WMR2017 (slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself) R. BASILI A.A. 2016-17 Overview Motivations Watson Jeopardy NLU in Watson
More informationEntity Linking at Web Scale
Entity Linking at Web Scale Thomas Lin, Mausam, Oren Etzioni Computer Science & Engineering University of Washington Seattle, WA 98195, USA {tlin, mausam, etzioni}@cs.washington.edu Abstract This paper
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationLinking Thesauri and Glossaries Case Study 0: linking a fake resource Roberto Navigli
Linking Thesauri and Glossaries Case Study 0: linking a fake resource http://lcl.uniroma1.it The Luxembourg BabelNet Workshop Session 6 Session 6 The Luxembourg BabelNet Workshop [11:00-12:15, 3 March,
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationIdentifying Poorly-Defined Concepts in WordNet with Graph Metrics
Identifying Poorly-Defined Concepts in WordNet with Graph Metrics John P. McCrae and Narumol Prangnawarat Insight Centre for Data Analytics, National University of Ireland, Galway john@mccr.ae, narumol.prangnawarat@insight-centre.org
More informationInfluence of Word Normalization on Text Classification
Influence of Word Normalization on Text Classification Michal Toman a, Roman Tesar a and Karel Jezek a a University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic In this paper we
More informationWebSAIL Wikifier at ERD 2014
WebSAIL Wikifier at ERD 2014 Thanapon Noraset, Chandra Sekhar Bhagavatula, Doug Downey Department of Electrical Engineering & Computer Science, Northwestern University {nor.thanapon, csbhagav}@u.northwestern.edu,ddowney@eecs.northwestern.edu
More informationGhent University-IBCN Participation in TAC-KBP 2015 Cold Start Slot Filling task
Ghent University-IBCN Participation in TAC-KBP 2015 Cold Start Slot Filling task Lucas Sterckx, Thomas Demeester, Johannes Deleu, Chris Develder Ghent University - iminds Gaston Crommenlaan 8 Ghent, Belgium
More informationin the NTU Multilingual Corpus (NTU-MC) January 15, 2016
. Sentiment Annotation in the NTU Multilingual Corpus (NTU-MC). 2 nd Wordnet Bahasa Workshop (WBW2016) Francis Bond, Tomoko Ohkuma, Luis Morgado Da Costa, Yasuhide Miura, Rachel Chen, Takayuki Kuribayashi,
More informationPersonalized Terms Derivative
2016 International Conference on Information Technology Personalized Terms Derivative Semi-Supervised Word Root Finder Nitin Kumar Bangalore, India jhanit@gmail.com Abhishek Pradhan Bangalore, India abhishek.pradhan2008@gmail.com
More informationNew York University 2014 Knowledge Base Population Systems
New York University 2014 Knowledge Base Population Systems Thien Huu Nguyen, Yifan He, Maria Pershina, Xiang Li, Ralph Grishman Computer Science Department New York University {thien, yhe, pershina, xiangli,
More informationLinked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library
Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Nuno Freire Chief data officer The European Library Pacific Neighbourhood Consortium 2014 Annual
More informationTowards Domain Independent Named Entity Recognition
38 Computer Science 5 Towards Domain Independent Named Entity Recognition Fredrick Edward Kitoogo, Venansius Baryamureeba and Guy De Pauw Named entity recognition is a preprocessing tool to many natural
More informationEnhanced retrieval using semantic technologies:
Enhanced retrieval using semantic technologies: Ontology based retrieval as a new search paradigm? - Considerations based on new projects at the Bavarian State Library Dr. Berthold Gillitzer 28. Mai 2008
More informationExploiting Conversation Structure in Unsupervised Topic Segmentation for s
Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng University of British Columbia Vancouver, Canada EMNLP 2010 1
More informationNgram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department
More informationSentiment Analysis using Support Vector Machine based on Feature Selection and Semantic Analysis
Sentiment Analysis using Support Vector Machine based on Feature Selection and Semantic Analysis Bhumika M. Jadav M.E. Scholar, L. D. College of Engineering Ahmedabad, India Vimalkumar B. Vaghela, PhD
More informationAnnotation and Evaluation
Annotation and Evaluation Digging into Data: Jordan Boyd-Graber University of Maryland April 15, 2013 Digging into Data: Jordan Boyd-Graber (UMD) Annotation and Evaluation April 15, 2013 1 / 21 Exam Solutions
More informationOptimized Word Sense Disambiguation in Hindi using Genetic Algorithm
Optimized Word Sense Disambiguation in Hindi using Genetic Algorithm Sabnam Kumari 1, 1 M.Tech Scholar, Department of Computer Science and Engineering, PDM College of Engineering, Bahadurgarh, Haryana
More information