Building Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018


1 Building Multilingual Resources and Neural Models for Word Sense Disambiguation Alessandro Raganato March 15th, 2018

2 About me: Sapienza - University of Rome, ERC project MultiJEDI (prof. Roberto Navigli); University of Helsinki, ERC project FoTran (prof. Jörg Tiedemann)

3 Slides from the Luxembourg BabelNet Workshop 2016 (


6 To the best of our knowledge, BabelNet is the largest multilingual encyclopedic dictionary and semantic network (almost 16M entries in 284 languages and 1.3B semantic connections). Initially created as an integration of Wikipedia and WordNet, BabelNet is now a merger of many different resources (Wiktionary, Wikidata, OmegaWiki, VerbNet, ImageNet, among others).


9 EN - Squash: a game played in a walled court with soft rubber balls and bats like tennis rackets. IT - Squash: una partita giocata in un campo recintato con palle di gomma morbida e pipistrelli come racchette da tennis (here "bats" is rendered as "pipistrelli", the flying mammals, instead of the sports equipment).

10 Word Sense Disambiguation. Language is ambiguous: "Dave Grohl played bass in Rock Supergroup Teenage Time Killers." Word Sense Disambiguation (WSD) is the task of computationally determining which sense of a word is used in a particular context.

11 International Workshops on Semantic Evaluation. Many evaluation datasets have been constructed for the task: Senseval-2 (2001), Senseval-3 (2004), SemEval-2007, SemEval-2013, SemEval-2015. Training data: SemCor, a manually sense-annotated corpus; OMSTI (One Million Sense-Tagged Instances), a large, automatically constructed annotated corpus.

12 Building a Unified Evaluation Framework. Our goal: build a unified framework for all-words WSD (training and testing), and use this evaluation framework to perform a fair quantitative and qualitative empirical comparison.

15 Evaluation: results on the concatenation of all datasets. Supervised vs. knowledge-based WSD. [Figure: results chart; highlighted difference: +0.4 (OMSTI)]

16 Issues with supervised WSD. The word expert paradigm, in which each disambiguation target casts its own classification problem: for each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch; disambiguation decisions within a sentence are independent; models are language dependent.

17 Challenges of supervised WSD. Can we model disambiguation at the sequence level using a single all-words model? Can we develop more flexible models (e.g. ones that can work with multiple languages) that still retain state-of-the-art accuracy? No engineered features.

19 WSD as (neural) sequence labeling. Attentive augmentation: a context vector computed with attention weights.
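
The slide names only "attention weights" and a "context vector"; the equations themselves are not in the transcript. As a rough illustration, the sketch below uses one common formulation, which is an assumption here and not necessarily the one in the talk: a score per position from a learned vector w, a softmax over the scores, and a weighted sum of the hidden states. The names attention_context, hidden_states and w are hypothetical.

```python
import math

def attention_context(hidden_states, w):
    """Return (weights, context) for a list of hidden-state vectors.

    Assumed formulation (not taken from the slides):
      u_t = w . tanh(h_t),  a = softmax(u),  c = sum_t a_t * h_t
    """
    # Unnormalized score for each position in the sentence.
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context
```

In a sequence labeler, a context vector of this kind would typically be combined with each position's hidden state before the sense is predicted.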

20 WSD as (neural) translation from English to English + word senses
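
One way to picture "English to English + word senses" is as a data-preparation step: the target sequence copies the source sentence but replaces annotated words with sense labels. The sketch below is an assumption about how such training pairs might be built, not the method shown in the talk. The function name and the sense labels are hypothetical placeholders, not real inventory identifiers.

```python
def to_target_sequence(tokens, senses):
    """Replace each annotated token with its sense label; copy the rest.

    tokens: list of source words.
    senses: dict mapping token position -> sense label (hypothetical labels).
    """
    return [senses.get(i, tok) for i, tok in enumerate(tokens)]

# Source side: a fragment of the example sentence from the WSD slide.
tokens = ["Dave", "Grohl", "played", "bass"]
# Placeholder labels for illustration only.
senses = {2: "play%SENSE_A", 3: "bass%SENSE_B"}
target = to_target_sequence(tokens, senses)
```

A sequence-to-sequence model would then be trained to map tokens to target, with unannotated words passed through unchanged.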

23 WSD via multitask learning. Shared layers are trained with a main loss plus two auxiliary losses: $\mathcal{L}_{WSD}(y_i, y_i^*) + \mathcal{L}_{POS}(POS_i, POS_i^*) + \mathcal{L}_{LEX}(LEX_i, LEX_i^*)$. Aux. Task #1 (LEX): coarse-grained semantic labels from WordNet lexicographer files. Aux. Task #2 (POS): universal parts of speech.
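
The combined objective on this slide is an unweighted sum of three per-token losses. A minimal sketch follows, assuming each loss is a cross-entropy over a predicted probability distribution; the slide does not name the loss function, so that choice and the names cross_entropy and multitask_loss are hypothetical.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-probability assigned to the gold label."""
    return -math.log(probs[gold_index])

def multitask_loss(wsd, pos, lex):
    """Sum the main (WSD) loss and the two auxiliary losses for one token.

    Each argument is a (predicted_distribution, gold_index) pair.
    The sum is unweighted, mirroring the formula on the slide.
    """
    return sum(cross_entropy(probs, gold) for probs, gold in (wsd, pos, lex))

loss = multitask_loss(
    ([1.0, 0.0], 0),    # WSD: gold sense predicted with probability 1
    ([0.5, 0.5], 1),    # POS: gold tag predicted with probability 0.5
    ([0.25, 0.75], 1),  # LEX: gold label predicted with probability 0.75
)
```

Because all three terms are computed over the same shared layers, gradients from the POS and LEX tasks also update the representation used for sense prediction.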

25 Experiments. [Figure: two results charts, F1-score (%)]

26 Zero-shot multilingual WSD...

27 Experiments: multilingual WSD F1-score (%)


29 Issues with supervised WSD. The word expert paradigm, in which each disambiguation target casts its own classification problem: for each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch; disambiguation decisions within a sentence are independent; models are language dependent. Lack of reliable sense-annotated data to train large-scale models, possibly in multiple languages (manually annotating word senses quickly becomes unfeasible).


31 Problem: Wikipedia is designed for humans! Only a fraction of linkable mentions is in fact hyperlinked: of 580M noun tokens, only 116M are covered (about 19%)*. This is partly due to the Wikipedia style guidelines: link each concept at most once within a page; link only when relevant and helpful in the context. * English dump 11/2014


33 potentially linkable mentions!

36 A Semantically Enriched Wikipedia (SEW). Our goal: augment Wikipedia with as much semantic information as possible. How? (1) The existing Wikipedia hyperlink structure: page-to-page direct connections (Wikilinks) and connections between pages and Wikipedia categories. (2) The multilingual sense inventory and semantic network of BabelNet, a merger of many different resources (including, e.g., WordNet and Wikipedia itself) with 14M entries and 380M semantic connections.

40 A Semantically Enriched Wikipedia (SEW). The algorithm: for each Wikipedia page p in W, apply three steps. Preprocessing: tokenization, part-of-speech tagging, lemmatization, filtering of uninformative pages. Hyperlink propagation: a cascade of propagation heuristics which, for page p, collect a list S of hyperlinks to be propagated across p and scan the text of p to match any potential lexicalization (one-sense-per-page assumption). Refinement: a conservative policy to remove duplicates and overlapping mentions.
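
The propagation and refinement steps can be pictured with a small sketch. It covers only the simplest case (exact surface matches of already-linked mentions) and is an illustration under stated assumptions, not the SEW implementation. The function names, the longest-span-wins policy and the regular-expression matching are all hypothetical choices.

```python
import re

def propagate_links(text, links):
    """Annotate every occurrence of an already-linked surface form.

    links: dict mapping a surface form to its target page, applied under
    the one-sense-per-page assumption (same surface form, same target).
    Returns a sorted list of (start, end, target) spans.
    """
    annotations = []
    for surface, target in links.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            annotations.append((match.start(), match.end(), target))
    return refine(annotations)

def refine(annotations):
    """Drop duplicates and overlapping mentions, preferring longer spans."""
    kept = []
    by_length = sorted(set(annotations), key=lambda a: (-(a[1] - a[0]), a[0]))
    for start, end, target in by_length:
        # Keep a span only if it does not overlap anything already kept.
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, target))
    return sorted(kept)

text = "Lorenzo de Medici ruled Florence. Lorenzo was a patron."
links = {
    "Lorenzo de Medici": "Lorenzo_de_Medici",
    "Lorenzo": "Lorenzo_de_Medici",
    "Florence": "Florence",
}
spans = propagate_links(text, links)
```

Here the short mention "Lorenzo" inside "Lorenzo de Medici" is discarded as overlapping, while its later standalone occurrence is annotated.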

43 A Semantically Enriched Wikipedia (SEW). Hyperlink Propagation - a bird's-eye view. Intra-page heuristics propagate links that occur as mentions within p: Surface Mention Propagation, Lemmatized Mention Propagation, Person Mention Propagation. Inter-page heuristics exploit the connection of p with other pages or categories: Wikipedia Inlink Propagation, BabelNet Inlink Propagation, Category Propagation, Monosemous Content Word.

46 Example: the Wikipedia page "Lorenzo de Medici". [Figure: page text with highlighted mentions such as Lorenzo, Florence, Italian Renaissance, poets, philosophers, patron.] 30 original links, 21 new intra-page links, 31 new inter-page links.

49 A Semantically Enriched Wikipedia (SEW). Statistics - another bird's-eye view. [Figure: bar chart of the total number of sense annotations for WIKI, SEW before refinement, SEW after refinement, Wikilinks [1] and MUN [2]; values shown: 250M, 206M, 71M, 40M, 1.3M; sense annotations broken down by type: original, intra-page, inter-page.] [1] S. Singh, A. Subramanya, F. Pereira, and A. McCallum. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical Report UM-CS. [2] K. Taghipour and H. Ng. One million sense-tagged instances for word sense disambiguation and induction. CoNLL.

52 A Semantically Enriched Wikipedia (SEW). Experiments. Intrinsic evaluation - annotation quality: we compared our sense annotations against those discovered by 3W (Noraset et al. 2014), a Wikipedia-specific system designed to automatically add high-precision hyperlinks. Extrinsic evaluation - Entity Linking and Semantic Similarity: we used SEW both as a training set for Entity Linking and as a semantic network to develop Wikipedia-based vector representations for Semantic Similarity.

53 A Semantically Enriched Wikipedia (SEW). Intrinsic Evaluation. Hand-labeled evaluation set of 2000 Wikipedia pages (Noraset et al. 2014). [Table: Precision, Recall and F1 for SEW and 3W; values not preserved.] T. Noraset, C. Bhagavatula, and D. Downey. Adding high-precision links to Wikipedia. EMNLP, 2014.
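
For reference, the three measures reported in this comparison can be computed from a set of predicted annotations and a set of gold annotations as sketched below. The representation of an annotation as a (position, target) pair is an assumption made for illustration; the function name is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over two collections of annotations.

    An annotation counts as correct only if it appears in both collections.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy data: 4 predicted links, 4 gold links, 2 in common.
predicted = {(0, "A"), (1, "B"), (2, "C"), (3, "D")}
gold = {(0, "A"), (1, "B"), (4, "E"), (5, "F")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

A system that adds many links tends to trade precision for recall, which is why F1 is reported alongside both.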

54 A Semantically Enriched Wikipedia (SEW). Extrinsic Evaluation #1 - Entity Linking. Benchmark system: IMS (Zhong and Ng 2010), a state-of-the-art supervised system for Word Sense Disambiguation in English based on SVMs. Configurations: IMS + SEW, IMS trained on SEW; IMS + HL, IMS trained only on the original hyperlinks (baseline #1); MFS, the Most Frequent Sense provided by BabelNet (baseline #2). Datasets: SemEval 2013, task 12; SemEval 2015, task 13; MSNBC; AIDA-CoNLL.
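
A most-frequent-sense baseline simply ignores context and always returns the dominant sense of a word. In the talk the MFS comes from BabelNet; the sketch below instead derives it by counting senses in an annotated sample, which is a stand-in chosen for illustration. The function name and the sense labels are hypothetical.

```python
from collections import Counter, defaultdict

def train_mfs(annotated):
    """Map each lemma to its most frequent sense in the annotated sample.

    annotated: iterable of (lemma, sense) pairs.
    """
    counts = defaultdict(Counter)
    for lemma, sense in annotated:
        counts[lemma][sense] += 1
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}

sample = [
    ("bass", "bass_music"),
    ("bass", "bass_fish"),
    ("bass", "bass_music"),
    ("squash", "squash_sport"),
]
mfs = train_mfs(sample)
```

At test time every occurrence of "bass" would be tagged with the same sense regardless of the sentence, which is what makes MFS a useful lower bound for context-aware systems.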


56 Conclusion. Built a unified evaluation framework for all-words Word Sense Disambiguation, including standardized training and testing data (Raganato et al. EACL 2017). Framed the WSD task as a sequence labeling problem, presenting several neural sequence learning models and showing that, for the first time in WSD, a model trained on a given language is able to seamlessly handle a different language at testing time (Raganato et al. EMNLP 2017). Built one of the largest available collections of sense-annotated corpora with high-quality annotations (Raganato et al. IJCAI 2016, Camacho-Collados et al. LREC 2016, Delli Bovi et al. ACL 2017).

57 Thank you!

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network

BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network Roberto Navigli, Simone Paolo Ponzetto What is BabelNet a very large, wide-coverage multilingual

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

The Luxembourg BabelNet Workshop

The Luxembourg BabelNet Workshop The Luxembourg BabelNet Workshop 2 March 2016: Session 3 Tech session Disambiguating text with Babelfy. The Babelfy API Claudio Delli Bovi Outline Multilingual disambiguation with Babelfy Using Babelfy

More information

Language Resources and Linked Data (EKAW 2014, Linköping, Sweden)

Language Resources and Linked Data (EKAW 2014, Linköping, Sweden) Language Resources and Linked Data (EKAW 2014, Linköping, Sweden) Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet Roberto Navigli, Tiziano Flati Sapienza 18/11/2014

More information

The Luxembourg BabelNet Workshop

The Luxembourg BabelNet Workshop The Luxembourg BabelNet Workshop 2 March 2016: Session 2 Tech session Downloading and installing BabelNet The BabelNet API Claudio Delli Bovi About me Claudio Delli Bovi dellibovi@di.uniroma1.it bn:17381128n

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses

SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses David Jurgens Dipartimento di Informatica Sapienza Universita di Roma jurgens@di.uniroma1.it Ioannis Klapaftis Search Technology

More information

Query Difficulty Prediction for Contextual Image Retrieval

Query Difficulty Prediction for Contextual Image Retrieval Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.

More information

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation

Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Using the Multilingual Central Repository for Graph-Based Word Sense Disambiguation Eneko Agirre, Aitor Soroa IXA NLP Group University of Basque Country Donostia, Basque Contry a.soroa@ehu.es Abstract

More information

Knowledge-based Word Sense Disambiguation using Topic Models

Knowledge-based Word Sense Disambiguation using Topic Models Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot, Ruslan Salakhutdinov {chaplot,rsalakhu}@cs.cmu.edu Machine Learning Department School of Computer Science Carnegie Mellon

More information

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent

More information

Papers for comprehensive viva-voce

Papers for comprehensive viva-voce Papers for comprehensive viva-voce Priya Radhakrishnan Advisor : Dr. Vasudeva Varma Search and Information Extraction Lab, International Institute of Information Technology, Gachibowli, Hyderabad, India

More information

A Quick Tour of BabelNet 1.1

A Quick Tour of BabelNet 1.1 A Quick Tour of BabelNet 1.1 Roberto Navigli Dipartimento di Informatica Sapienza Università diroma Viale Regina Elena, 295 Roma, Italy navigli@di.uniroma1.it http://lcl.uniroma1.it Abstract. In this paper

More information

Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot

Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot Knowledge-based Word Sense Disambiguation using Topic Models Devendra Singh Chaplot Ruslan Salakhutdinov Word Sense Disambiguation Word sense disambiguation (WSD) is defined as the problem of computationally

More information

BabelDomains: Large-Scale Domain Labeling of Lexical Resources

BabelDomains: Large-Scale Domain Labeling of Lexical Resources BabelDomains: Large-Scale Domain Labeling of Lexical Resources Jose Camacho-Collados and Roberto Navigli Department of Computer Science Sapienza University of Rome {collados,navigli}@di.uniroma1.it Abstract

More information

The Multilingual Language Library

The Multilingual Language Library The Multilingual Language Library @ LREC 2012 Let s build it together! Nicoletta Calzolari with Riccardo Del Gratta, Francesca Frontini, Francesco Rubino, Irene Russo Istituto di Linguistica Computazionale

More information

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li

Random Walks for Knowledge-Based Word Sense Disambiguation. Qiuyu Li Random Walks for Knowledge-Based Word Sense Disambiguation Qiuyu Li Word Sense Disambiguation 1 Supervised - using labeled training sets (features and proper sense label) 2 Unsupervised - only use unlabeled

More information

Automatically Annotating Text with Linked Open Data

Automatically Annotating Text with Linked Open Data Automatically Annotating Text with Linked Open Data Delia Rusu, Blaž Fortuna, Dunja Mladenić Jožef Stefan Institute Motivation: Annotating Text with LOD Open Cyc DBpedia WordNet Overview Related work Algorithms

More information

Cross-Lingual Word Sense Disambiguation

Cross-Lingual Word Sense Disambiguation Cross-Lingual Word Sense Disambiguation Priyank Jaini Ankit Agrawal pjaini@iitk.ac.in ankitag@iitk.ac.in Department of Mathematics and Statistics Department of Mathematics and Statistics.. Mentor: Prof.

More information

Babelplagiarism: what can BabelNet do for crosslanguage plagiarism detection? Roberto Navigli

Babelplagiarism: what can BabelNet do for crosslanguage plagiarism detection? Roberto Navigli Babelplagiarism: what can BabelNet do for crosslanguage plagiarism detection? Joint work with Simone Ponzetto Mirella Lapata Andrea Moro 2 Outline Motivation: the knowledge acquisition bottleneck BabelNet:

More information

MRD-based Word Sense Disambiguation: Extensions and Applications

MRD-based Word Sense Disambiguation: Extensions and Applications MRD-based Word Sense Disambiguation: Extensions and Applications Timothy Baldwin Joint Work with F. Bond, S. Fujita, T. Tanaka, Willy and S.N. Kim 1 MRD-based Word Sense Disambiguation: Extensions and

More information

BabelNet and! Word Sense Disambiguation

BabelNet and! Word Sense Disambiguation BabelNet and! Word Sense Disambiguation Overview: Original BabelNet BabelNet 2.5 ( today ) Extrinsic Evaluations (SemEval-2007 T#16, SemEval-2007 T#7) SemEval-2010 T#3, 2013 Next episode (preview): Babelfy

More information

WordNet-based User Profiles for Semantic Personalization

WordNet-based User Profiles for Semantic Personalization PIA 2005 Workshop on New Technologies for Personalized Information Access WordNet-based User Profiles for Semantic Personalization Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Ignazio Palmisano LACAM

More information

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto Data and Web

More information

DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation

DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation Steve L. Manion University of Canterbury Christchurch, New Zealand steve.manion @pg.canterbury.ac.nz Raazesh Sainudiin University

More information

A CASE STUDY: Structure learning for Part-of-Speech Tagging. Danilo Croce WMR 2011/2012

A CASE STUDY: Structure learning for Part-of-Speech Tagging. Danilo Croce WMR 2011/2012 A CAS STUDY: Structure learning for Part-of-Speech Tagging Danilo Croce WM 2011/2012 27 gennaio 2012 TASK definition One of the tasks of VALITA 2009 VALITA is an initiative devoted to the evaluation of

More information

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation

The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation The Dictionary Parsing Project: Steps Toward a Lexicographer s Workstation Ken Litkowski ken@clres.com http://www.clres.com http://www.clres.com/dppdemo/index.html Dictionary Parsing Project Purpose: to

More information

DBpedia Spotlight at the MSM2013 Challenge

DBpedia Spotlight at the MSM2013 Challenge DBpedia Spotlight at the MSM2013 Challenge Pablo N. Mendes 1, Dirk Weissenborn 2, and Chris Hokamp 3 1 Kno.e.sis Center, CSE Dept., Wright State University 2 Dept. of Comp. Sci., Dresden Univ. of Tech.

More information

NLP Final Project Fall 2015, Due Friday, December 18

NLP Final Project Fall 2015, Due Friday, December 18 NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,

More information

Annotating Spatio-Temporal Information in Documents

Annotating Spatio-Temporal Information in Documents Annotating Spatio-Temporal Information in Documents Jannik Strötgen University of Heidelberg Institute of Computer Science Database Systems Research Group http://dbs.ifi.uni-heidelberg.de stroetgen@uni-hd.de

More information

Multi-Modal Word Synset Induction. Jesse Thomason and Raymond Mooney University of Texas at Austin

Multi-Modal Word Synset Induction. Jesse Thomason and Raymond Mooney University of Texas at Austin Multi-Modal Word Synset Induction Jesse Thomason and Raymond Mooney University of Texas at Austin Word Synset Induction kiwi Word Synset Induction chinese grapefruit kiwi kiwi vine Word Synset Induction

More information

Sentiment Analysis using Weighted Emoticons and SentiWordNet for Indonesian Language

Sentiment Analysis using Weighted Emoticons and SentiWordNet for Indonesian Language Sentiment Analysis using Weighted Emoticons and SentiWordNet for Indonesian Language Nur Maulidiah Elfajr, Riyananto Sarno Department of Informatics, Faculty of Information and Communication Technology

More information

Automatic Word Sense Disambiguation Using Wikipedia

Automatic Word Sense Disambiguation Using Wikipedia Automatic Word Sense Disambiguation Using Wikipedia Sivakumar J *, Anthoniraj A ** School of Computing Science and Engineering, VIT University Vellore-632014, TamilNadu, India * jpsivas@gmail.com ** aanthoniraja@gmail.com

More information

A Multilingual Social Media Linguistic Corpus

A Multilingual Social Media Linguistic Corpus A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th

More information

Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities

Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities Dirk Weissenborn, Leonhard Hennig, Feiyu Xu and Hans Uszkoreit Language Technology Lab, DFKI Alt-Moabit 91c Berlin,

More information

Multilinguality at Your Fingertips: BabelNet, Babelfy and Beyond! Roberto Navigli

Multilinguality at Your Fingertips: BabelNet, Babelfy and Beyond! Roberto Navigli Multilinguality at Your Fingertips: http://lcl.uniroma1.it ERC Starting Grant n. 259234 LIDER CSA n. 610782 Moscow, 28 th May 2015 Tiziano Flati 23/06/2015 Daniele Vannella Andrea Moro Taher Pilehvar Francesco

More information

Antonio Fernández Orquín, Andrés Montoyo, Rafael Muñoz

Antonio Fernández Orquín, Andrés Montoyo, Rafael Muñoz UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation Yoan Gutiérrez, Yenier Castañeda, Andy González,

More information

NLP in practice, an example: Semantic Role Labeling

NLP in practice, an example: Semantic Role Labeling NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:

More information

Techreport for GERBIL V1

Techreport for GERBIL V1 Techreport for GERBIL 1.2.2 - V1 Michael Röder, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo February 21, 2016 Current Development of GERBIL Recently, we released the latest version 1.2.2 of GERBIL [16] 1.

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

NUS-I2R: Learning a Combined System for Entity Linking

NUS-I2R: Learning a Combined System for Entity Linking NUS-I2R: Learning a Combined System for Entity Linking Wei Zhang Yan Chuan Sim Jian Su Chew Lim Tan School of Computing National University of Singapore {z-wei, tancl} @comp.nus.edu.sg Institute for Infocomm

More information

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation

Improvement of Web Search Results using Genetic Algorithm on Word Sense Disambiguation Volume 3, No.5, May 24 International Journal of Advances in Computer Science and Technology Pooja Bassin et al., International Journal of Advances in Computer Science and Technology, 3(5), May 24, 33-336

More information

Swinburne Research Bank

Swinburne Research Bank Swinburne Research Bank http://researchbank.swinburne.edu.au Hu, S., & Liu, C. (2011). Incorporating coreference resolution into word sense disambiguation. Originally published A. Gelbukh (eds.). Proceedings

More information

A Korean Knowledge Extraction System for Enriching a KBox

A Korean Knowledge Extraction System for Enriching a KBox A Korean Knowledge Extraction System for Enriching a KBox Sangha Nam, Eun-kyung Kim, Jiho Kim, Yoosung Jung, Kijong Han, Key-Sun Choi KAIST / The Republic of Korea {nam.sangha, kekeeo, hogajiho, wjd1004109,

More information

Supervised Models for Coreference Resolution [Rahman & Ng, EMNLP09] Running Example. Mention Pair Model. Mention Pair Example

Supervised Models for Coreference Resolution [Rahman & Ng, EMNLP09] Running Example. Mention Pair Model. Mention Pair Example Supervised Models for Coreference Resolution [Rahman & Ng, EMNLP09] Many machine learning models for coreference resolution have been created, using not only different feature sets but also fundamentally

More information

Tools for Annotating and Searching Corpora Practical Session 1: Annotating

Tools for Annotating and Searching Corpora Practical Session 1: Annotating Tools for Annotating and Searching Corpora Practical Session 1: Annotating Stefanie Dipper Institute of Linguistics Ruhr-University Bochum Corpus Linguistics Fest (CLiF) June 6-10, 2016 Indiana University,

More information

Managing a Multilingual Treebank Project

Managing a Multilingual Treebank Project Managing a Multilingual Treebank Project Milan Souček Timo Järvinen Adam LaMontagne Lionbridge Finland {milan.soucek,timo.jarvinen,adam.lamontagne}@lionbridge.com Abstract This paper describes the work

More information

Learning to Rank Aggregated Answers for Crossword Puzzles

Learning to Rank Aggregated Answers for Crossword Puzzles Learning to Rank Aggregated Answers for Crossword Puzzles Massimo Nicosia 1,2, Gianni Barlacchi 2 and Alessandro Moschitti 1,2 1 Qatar Computing Research Institute 2 University of Trento m.nicosia@gmail.com,

More information

Building Instance Knowledge Network for Word Sense Disambiguation

Building Instance Knowledge Network for Word Sense Disambiguation Building Instance Knowledge Network for Word Sense Disambiguation Shangfeng Hu, Chengfei Liu Faculty of Information and Communication Technologies Swinburne University of Technology Hawthorn 3122, Victoria,

More information

Putting ontologies to work in NLP

Putting ontologies to work in NLP Putting ontologies to work in NLP The lemon model and its future John P. McCrae National University of Ireland, Galway Introduction In natural language processing we are doing three main things Understanding

More information

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Jianyong Wang Department of Computer Science and Technology Tsinghua University Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP) Outline Introduction to entity

More information

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,

More information

Lecture 4: Unsupervised Word-sense Disambiguation

Lecture 4: Unsupervised Word-sense Disambiguation ootstrapping Lecture 4: Unsupervised Word-sense Disambiguation Lexical Semantics and Discourse Processing MPhil in dvanced Computer Science Simone Teufel Natural Language and Information Processing (NLIP)

More information

Sense-based Information Retrieval System by using Jaccard Coefficient Based WSD Algorithm

Sense-based Information Retrieval System by using Jaccard Coefficient Based WSD Algorithm ISBN 978-93-84468-0-0 Proceedings of 015 International Conference on Future Computational Technologies (ICFCT'015 Singapore, March 9-30, 015, pp. 197-03 Sense-based Information Retrieval System by using

More information

Meaning Banking and Beyond

Meaning Banking and Beyond Meaning Banking and Beyond Valerio Basile Wimmics, Inria November 18, 2015 Semantics is a well-kept secret in texts, accessible only to humans. Anonymous I BEG TO DIFFER Surface Meaning Step by step analysis

More information

Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title Entity Linking with Multiple Knowledge Bases: An Ontology Modularization

More information

COMP90042 LECTURE 3 LEXICAL SEMANTICS COPYRIGHT 2018, THE UNIVERSITY OF MELBOURNE

COMP90042 LECTURE 3 LEXICAL SEMANTICS COPYRIGHT 2018, THE UNIVERSITY OF MELBOURNE COMP90042 LECTURE 3 LEXICAL SEMANTICS SENTIMENT ANALYSIS REVISITED 2 Bag of words, knn classifier. Training data: This is a good movie.! This is a great movie.! This is a terrible film. " This is a wonderful

A Hybrid Neural Model for Type Classification of Entity Mentions

Motivation: Types group entities to categories. Entity types are important for various NLP tasks. Our task: predict an entity mention's type

Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids. Marek Lipczak Arash Koushkestani Evangelos Milios

Problem definition: The goal of Entity Recognition and Disambiguation

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

Ana Zelaia, Olatz Arregi and Basilio Sierra. Computer Science Faculty, University of the Basque Country. ana.zelaia@ehu.es

Context Sensitive Search Engine

Remzi Düzağaç and Olcay Taner Yıldız. Abstract: In this paper, we use context information extracted from the documents in the collection to improve the performance of the

Improving Retrieval Experience Exploiting Semantic Representation of Documents

Pierpaolo Basile, Annalina Caputo, Anna Lisa Gentile, Marco de Gemmis, Pasquale Lops and Giovanni Semeraro

CMU System for Entity Discovery and Linking at TAC-KBP 2015

Nicolas Fauceglia, Yiu-Chang Lin, Xuezhe Ma, and Eduard Hovy. Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh,

NERD workshop. Luca Foppiano @ ALMAnaCH - Inria Paris. Berlin, 18/09/2017

Agenda: Introducing the (N)ERD service; NERD REST API; Usages and use cases. Entities: Rigid textual expressions corresponding to certain

Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping

Journal of Artificial Intelligence Research 55 (2016) 165-208. Submitted 03/15; published 01/16. Mamoun Abu Helou, Department of

Enlargement of the Croatian Wordnet using the WN-Toolkit (and preliminary results for Slovene)

Antoni Oliver (aoliverg@uoc.edu), Universitat Oberta de Catalunya. Overview: The WN-Toolkit; The Expand Model

Textual Emigration Analysis

Andre Blessing and Jonas Kuhn. IMS - Universität Stuttgart, Germany. clarin@ims.uni-stuttgart.de. Abstract: We present a web-based application which is called TEA (Textual Emigration

WebAnno: a flexible, web-based annotation tool for CLARIN

Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam. #WebAnno. This work is licensed under an Attribution-NonCommercial-ShareAlike

Building the Multilingual Web of Data. Integrating NLP with Linked Data and RDF using the NLP Interchange Format

Presenter name. Outline: 1. Introduction 2. NIF Basics 3. NIF corpora 4. NIF tools & services

UBC Entity Discovery and Linking & Diagnostic Entity Linking at TAC-KBP 2014

Ander Barrena, Eneko Agirre, Aitor Soroa. IXA NLP Group / University of the Basque Country, Donostia, Basque Country. ander.barrena@ehu.es,

SEMANTIC INDEXING (ENTITY LINKING)

Text Analysis and Information Extraction. Jelena Jovanović. Email: jeljov@gmail.com Web: http://jelenajovanovic.net. OVERVIEW: Main concepts; Named Entity Recognition; Semantic

Topics in Opinion Mining. Dr. Paul Buitelaar Data Science Institute, NUI Galway

Opinion: Sentiment, Emotion, Subjectivity. OBJECTIVITY SUBJECTIVITY SPECULATION FACTS BELIEFS EMOTION SENTIMENT UNCERTAINTY

Mapping WordNet Instances to Wikipedia

John P. McCrae, Insight Centre for Data Analytics, National University of Ireland Galway. Lexical vs. Encyclopedic: Yellow (in a dictionary) is a verb, noun and adjective

A graph-based method to improve WordNet Domains

Aitor González, German Rigau. IXA group UPV/EHU, Donostia, Spain. agonzalez278@ikasle.ehu.com german.rigau@ehu.com. Mauro Castillo, UTEM, Santiago de Chile,

NATURAL LANGUAGE PROCESSING

LESSON 9: SEMANTIC SIMILARITY. OUTLINE: Semantic Relations; Semantic Similarity Levels; Sense Level; Word Level; Text Level; WordNet-based Similarity Methods; Hybrid Methods; Similarity

INTERCONNECTING AND MANAGING MULTILINGUAL LEXICAL LINKED DATA. Ernesto William De Luca

Overview: Motivation; EuroWordNet; RDF/OWL; EuroWordNet RDF/OWL; LexiRes Tool; Conclusions

It's time for a semantic engine!

Ido Dagan, Bar-Ilan University, Israel. Semantic Knowledge is not the goal; it's a primary means to achieve semantic inference! Knowledge design should be derived from its

Bc. Pavel Taufer. Named Entity Recognition and Linking

MASTER THESIS. Institute of Formal and Applied Linguistics. Supervisor of the master thesis: RNDr. Milan Straka, Ph.D. Study programme: Study branch:

A Linguistic Approach for Semantic Web Service Discovery

Jordy Sangers, 307370js, jordysangers@hotmail.com. Bachelor Thesis Economics and Informatics, Erasmus School of Economics, Erasmus University Rotterdam

Two graph-based algorithms for state-of-the-art WSD

Eneko Agirre, David Martínez, Oier López de Lacalle and Aitor Soroa. IXA NLP Group, University of the Basque Country, Donostia, Basque Country. a.soroa@si.ehu.es

Watson & WMR2017. (slides mostly derived from Jim Hendler and Simon Ellis, Rensselaer Polytechnic Institute, or from IBM itself)

R. BASILI, A.A. 2016-17. Overview: Motivations Watson Jeopardy NLU in Watson

Entity Linking at Web Scale

Thomas Lin, Mausam, Oren Etzioni. Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA. {tlin, mausam, etzioni}@cs.washington.edu. Abstract: This paper

Semantics Isn't Easy: Thoughts on the Way Forward

NANCY IDE, VASSAR COLLEGE; REBECCA PASSONNEAU, COLUMBIA UNIVERSITY; COLLIN BAKER, ICSI/UC BERKELEY; CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY. New York University

Linking Thesauri and Glossaries Case Study 0: linking a fake resource Roberto Navigli

http://lcl.uniroma1.it The Luxembourg BabelNet Workshop, Session 6 [11:00-12:15, 3 March,

Identifying Poorly-Defined Concepts in WordNet with Graph Metrics

John P. McCrae and Narumol Prangnawarat. Insight Centre for Data Analytics, National University of Ireland, Galway. john@mccr.ae, narumol.prangnawarat@insight-centre.org

Influence of Word Normalization on Text Classification

Michal Toman, Roman Tesar and Karel Jezek. University of West Bohemia, Faculty of Applied Sciences, Plzen, Czech Republic. In this paper we

WebSAIL Wikifier at ERD 2014

Thanapon Noraset, Chandra Sekhar Bhagavatula, Doug Downey. Department of Electrical Engineering & Computer Science, Northwestern University. {nor.thanapon, csbhagav}@u.northwestern.edu, ddowney@eecs.northwestern.edu

Ghent University-IBCN Participation in TAC-KBP 2015 Cold Start Slot Filling task

Lucas Sterckx, Thomas Demeester, Johannes Deleu, Chris Develder. Ghent University - iMinds, Gaston Crommenlaan 8, Ghent, Belgium

Sentiment Annotation in the NTU Multilingual Corpus (NTU-MC). January 15, 2016

2nd Wordnet Bahasa Workshop (WBW2016). Francis Bond, Tomoko Ohkuma, Luis Morgado Da Costa, Yasuhide Miura, Rachel Chen, Takayuki Kuribayashi,

Personalized Terms Derivative

2016 International Conference on Information Technology. Personalized Terms Derivative Semi-Supervised Word Root Finder. Nitin Kumar, Bangalore, India, jhanit@gmail.com. Abhishek Pradhan, Bangalore, India, abhishek.pradhan2008@gmail.com

New York University 2014 Knowledge Base Population Systems

Thien Huu Nguyen, Yifan He, Maria Pershina, Xiang Li, Ralph Grishman. Computer Science Department, New York University. {thien, yhe, pershina, xiangli,

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Nuno Freire, Chief data officer, The European Library. Pacific Neighbourhood Consortium 2014 Annual

Towards Domain Independent Named Entity Recognition

Fredrick Edward Kitoogo, Venansius Baryamureeba and Guy De Pauw. Named entity recognition is a preprocessing tool to many natural

Enhanced retrieval using semantic technologies:

Ontology based retrieval as a new search paradigm? Considerations based on new projects at the Bavarian State Library. Dr. Berthold Gillitzer, 28 May 2008

Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng. University of British Columbia, Vancouver, Canada. EMNLP 2010

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Satoshi Sekine, Computer Science Department, New York University, sekine@cs.nyu.edu. Kapil Dalwani, Computer Science Department

Sentiment Analysis using Support Vector Machine based on Feature Selection and Semantic Analysis

Bhumika M. Jadav, M.E. Scholar, L. D. College of Engineering, Ahmedabad, India. Vimalkumar B. Vaghela, PhD

Annotation and Evaluation

Digging into Data: Jordan Boyd-Graber, University of Maryland. April 15, 2013. Exam Solutions

Optimized Word Sense Disambiguation in Hindi using Genetic Algorithm

Sabnam Kumari, M.Tech Scholar, Department of Computer Science and Engineering, PDM College of Engineering, Bahadurgarh, Haryana
