Building Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018


1 Building Multilingual Resources and Neural Models for Word Sense Disambiguation Alessandro Raganato March 15th, 2018

2 About me
it/~raganato
Sapienza - University of Rome, prof. Roberto Navigli (ERC project Multijedi)
University of Helsinki, prof. Jörg Tiedemann (ERC project Fotran)

3 Slides from the Luxembourg BabelNet Workshop 2016 (


6 BabelNet
To the best of our knowledge, the largest multilingual encyclopedic dictionary and semantic network (almost 16M entries in 284 languages and 1.3B semantic connections)
Initially created as an integration of Wikipedia and WordNet, BabelNet is now a merger of many different resources (Wiktionary, Wikidata, OmegaWiki, VerbNet, ImageNet, ...)


9 EN - Squash: a game played in a walled court with soft rubber balls and bats like tennis rackets
IT - Squash: una partita giocata in un campo recintato con palle di gomma morbida e pipistrelli come racchette da tennis

10 Word Sense Disambiguation
Language is ambiguous: "Dave Grohl played bass in Rock Supergroup Teenage Time Killers."
Word Sense Disambiguation (WSD) is the task of computationally determining which sense of a word is used in a particular context.
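To make the task concrete, here is a minimal sketch of a gloss-overlap disambiguator (a simplified Lesk baseline, not a method from this talk). The sense identifiers and glosses in `TOY_INVENTORY` are invented for illustration and do not come from WordNet or BabelNet.

```python
# Toy sense inventory: sense ids and glosses are made up for illustration.
TOY_INVENTORY = {
    "bass": {
        "bass#instrument": "the lowest part in music or a guitar with low pitch played in a band",
        "bass#fish": "an edible freshwater or marine fish caught by anglers",
    }
}

STOPWORDS = {"a", "an", "the", "in", "of", "or", "with", "by", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def simplified_lesk(target, context, inventory=TOY_INVENTORY):
    """Pick the sense of `target` whose gloss shares the most words with the context."""
    context_tokens = tokenize(context) - {target}
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[target].items():
        overlap = len(context_tokens & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


sentence = "Dave Grohl played bass in Rock Supergroup Teenage Time Killers."
print(simplified_lesk("bass", sentence))
```

In this toy setting the only shared word is "played", which is enough to prefer the musical sense; real systems rely on far richer context and inventories.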

11 International Workshops on Semantic Evaluation
Many evaluation datasets have been constructed for the task:
  Senseval 2 (2001)
  Senseval 3 (2004)
  SemEval 2007
  SemEval 2013
  SemEval 2015
Training data:
  SemCor, a manually sense-annotated corpus
  OMSTI (One Million Sense-Tagged Instances), a large, automatically constructed annotated corpus

12 Building a Unified Evaluation Framework
Our goal:
  build a unified framework for all-words WSD (training and testing)
  use this evaluation framework to perform a fair quantitative and qualitative empirical comparison


15 Evaluation: Results on the concatenation of all datasets
Supervised vs. knowledge-based WSD
Chart annotation: +0.4 (OMSTI)

16 Issues with supervised WSD
The word expert paradigm, in which each disambiguation target casts its own classification problem:
  For each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch
  Disambiguation decisions within a sentence are independent
  Language dependent

17 Challenges of supervised WSD
Can we model disambiguation at the sequence level using a single all-words model?
Can we develop more flexible models (e.g. models that can work with multiple languages) that still retain state-of-the-art accuracy?
No engineered features


19 WSD as (neural) sequence labeling
Attentive augmentation: a context vector computed with attention weights
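The slide only names the two ingredients (attention weights and a context vector), so the following is a rough sketch of one common formulation rather than the exact model of the talk. It assumes scores u_t = omega . tanh(h_t), weights a = softmax(u) and a context vector c = sum_t a_t h_t; the parameter name `omega` and the scoring function are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(hidden_states, omega):
    """Compute attention weights and the context vector.

    hidden_states: list of T hidden vectors, each a list of d floats.
    omega: list of d floats (a hypothetical learned parameter vector).
    """
    # One scalar score per time step: u_t = omega . tanh(h_t)
    scores = [sum(w * math.tanh(x) for w, x in zip(omega, h)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states
    context = [sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(dim)]
    return weights, context


weights, context = attention_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(weights, context)
```

With `omega` set to zero every position gets the same score, so the context vector reduces to the plain average of the hidden states.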

20 WSD as (neural) translation from English to English + word senses
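As an illustration of this framing, the sketch below builds one source/target training pair from a sense-annotated sentence. It assumes that unannotated words are simply copied to the target side; the function name and the sense label format are hypothetical.

```python
def to_seq2seq_pair(tokens, senses):
    """Build a (source, target) pair for the translation view of WSD.

    tokens: list of words in the sentence.
    senses: dict mapping a token index to its sense label (hypothetical format).
    The source is the plain sentence; the target replaces each annotated
    word with its sense label and copies every other word unchanged.
    """
    source = list(tokens)
    target = [senses.get(i, tok) for i, tok in enumerate(tokens)]
    return source, target


source, target = to_seq2seq_pair(["he", "played", "bass"], {2: "bass#instrument"})
print(source)
print(target)
```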


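23 WSD via multitask learning
Shared layers
Main loss: $\mathcal{L}_{\mathrm{WSD}}(y_i, y_i^*)$
Auxiliary losses: $\mathcal{L}_{\mathrm{POS}}(\mathrm{POS}_i, \mathrm{POS}_i^*) + \mathcal{L}_{\mathrm{LEX}}(\mathrm{LEX}_i, \mathrm{LEX}_i^*)$
Total: $\mathcal{L}_{\mathrm{WSD}}(y_i, y_i^*) + \mathcal{L}_{\mathrm{POS}}(\mathrm{POS}_i, \mathrm{POS}_i^*) + \mathcal{L}_{\mathrm{LEX}}(\mathrm{LEX}_i, \mathrm{LEX}_i^*)$
Aux. Task #1 (LEX): Coarse-grained semantic labels from WordNet lexicographer files
Aux. Task #2 (POS): Universal parts of speech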
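A small sketch of how the three per-token losses could be combined. It assumes each loss is a cross-entropy and that the terms are summed without weights, as the formula on the slide suggests; the dictionary-based interface and the label names are purely illustrative.

```python
import math


def cross_entropy(probs, gold):
    """Negative log-probability that the model assigns to the gold label."""
    return -math.log(probs[gold])


def multitask_loss(wsd, pos, lex):
    """Sum the main WSD loss and the two auxiliary losses for one token.

    Each argument is a pair (predicted distribution as a dict, gold label).
    """
    return sum(cross_entropy(probs, gold) for probs, gold in (wsd, pos, lex))


loss = multitask_loss(
    ({"bass#instrument": 0.5, "bass#fish": 0.5}, "bass#instrument"),
    ({"NOUN": 0.5, "VERB": 0.5}, "NOUN"),
    ({"noun.artifact": 0.5, "noun.animal": 0.5}, "noun.artifact"),
)
print(loss)
```

Because the layers are shared, gradients from the POS and LEX terms also update the representation used by the WSD classifier.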

24 Experiments
[Charts: F1-score (%)]

26 Zero-shot multilingual WSD...

27 Experiments: multilingual WSD
[Chart: F1-score (%)]


29 Issues with supervised WSD
The word expert paradigm, in which each disambiguation target casts its own classification problem:
  For each ambiguous word type in the lexicon we have to learn a new dedicated model from scratch
  Disambiguation decisions within a sentence are independent
  Language dependent
Lack of reliable sense-annotated data to train large-scale models, possibly in multiple languages (manually annotating word senses quickly becomes unfeasible)


31 Problem: Wikipedia is designed for humans!
Only a fraction of linkable mentions is in fact hyperlinked: of 580M noun tokens*, only 116M are covered (about 19%)
Partly due to the Wikipedia style guidelines:
  Link each concept at most once within a page
  Link only when relevant and helpful in the context
* English dump 11/2014

33 potentially linkable mentions!


36 A Semantically Enriched Wikipedia (SEW)
Our goal: augment Wikipedia with as much semantic information as possible
How?
  The existing Wikipedia hyperlink structure:
    Page-to-page direct connections (Wikilinks)
    Connections between pages and Wikipedia categories
  The multilingual sense inventory of BabelNet:
    A merger of many different resources (including WordNet and Wikipedia itself)
    14M entries and 380M semantic connections

37 A Semantically Enriched Wikipedia (SEW)
The algorithm: for each Wikipedia page p in W
  Preprocessing
  Hyperlink Propagation
  Refinement


38 A Semantically Enriched Wikipedia (SEW)
The algorithm: for each Wikipedia page p in W
  Preprocessing:
    Tokenization
    Part-of-speech tagging
    Lemmatization
    Filtering of uninformative pages
  Hyperlink Propagation
  Refinement


39 A Semantically Enriched Wikipedia (SEW)
The algorithm: for each Wikipedia page p in W
  Preprocessing
  Hyperlink Propagation: a cascade of propagation heuristics which, for each page p:
    collect a list S of hyperlinks to be propagated across p
    scan the text of p to match any potential lexicalization (one-sense-per-page assumption)
  Refinement

40 A Semantically Enriched Wikipedia (SEW)
The algorithm: for each Wikipedia page p in W
  Preprocessing
  Hyperlink Propagation
  Refinement: a conservative policy to remove duplicates and overlapping mentions
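The propagation and refinement steps can be sketched roughly as follows. This is a simplified illustration, not the SEW implementation: it assumes the list S is already available as a dictionary from lowercased lexicalizations to link targets (one sense per page), and it uses greedy longest-match scanning as a stand-in for the conservative overlap policy.

```python
def propagate_links(tokens, seed_links):
    """Propagate a page's hyperlinks to unlinked mentions in its own text.

    tokens: list of words of page p (already tokenized).
    seed_links: dict mapping a lexicalization (tuple of lowercased words)
        to a link target; this plays the role of the list S.
    Returns non-overlapping (start, end, target) spans, preferring the
    longest lexicalization at each position.
    """
    max_len = max((len(key) for key in seed_links), default=0)
    lowered = [tok.lower() for tok in tokens]
    annotations = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so that overlapping mentions are avoided.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in seed_links:
                match = (i, i + n, seed_links[key])
                break
        if match:
            annotations.append(match)
            i = match[1]  # jump past the matched span
        else:
            i += 1
    return annotations


page = "Lorenzo de Medici was born in Florence . Lorenzo ruled Florence".split()
seeds = {
    ("lorenzo", "de", "medici"): "Lorenzo de' Medici",
    ("lorenzo",): "Lorenzo de' Medici",
    ("florence",): "Florence",
}
for span in propagate_links(page, seeds):
    print(span)
```

Because every lexicalization maps to a single target within the page, repeated mentions such as "Lorenzo" all receive the same link.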


43 A Semantically Enriched Wikipedia (SEW)
Hyperlink Propagation - A bird's-eye view
Intra-page heuristics: propagate links that occur as mentions within p
  Surface Mention Propagation
  Lemmatized Mention Propagation
  Person Mention Propagation
Inter-page heuristics: exploit the connection of p with other pages or categories
  Wikipedia Inlink Propagation
  BabelNet Inlink Propagation
  Category Propagation
  Monosemous Content Word

44-46 Example: the Wikipedia page "Lorenzo de' Medici"
[Annotated page excerpt: 30 original links, 21 new intra-page links, 31 new inter-page links]

49 A Semantically Enriched Wikipedia (SEW)
Statistics - Another bird's-eye view
[Bar chart: total number of sense annotations for WIKI, SEW before refinement, SEW after refinement, Wikilinks [1] and MUN [2]; axis up to 300M; values shown: 250M, 206M, 71M, 40M, 1.3M]
[Chart: sense annotations by type: Original, Intra-page, Inter-page]
[1] S. Singh, A. Subramanya, F. Pereira, and A. McCallum. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical Report UM-CS
[2] K. Taghipour and H. Ng. One million sense-tagged instances for word sense disambiguation and induction. CoNLL,


51 A Semantically Enriched Wikipedia (SEW)
Experiments
Intrinsic Evaluation - Annotation Quality
  We compared our sense annotations against those discovered by 3W (Noraset et al. 2014), a Wikipedia-specific system designed to automatically add high-precision hyperlinks
Extrinsic Evaluation - Entity Linking and Semantic Similarity
  We used SEW both as a training set for Entity Linking and as a semantic network to develop Wikipedia-based vector representations for Semantic Similarity


53 A Semantically Enriched Wikipedia (SEW)
Intrinsic Evaluation
Hand-labeled evaluation set of 2000 Wikipedia pages (Noraset et al. 2014)
[Bar chart: Precision, Recall and F1 for SEW and 3W; legible value labels: 0.62, 0.47, 0.47, 0.31]
T. Noraset, C. Bhagavatula, and D. Downey. Adding high-precision links to Wikipedia. EMNLP 2014.
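For reference, the three measures reported in this evaluation can be computed as in the sketch below. It assumes annotations are compared as exact (position, target) pairs, which may differ from the matching criterion actually used; the example pairs are invented.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted annotations against a hand-labeled gold set.

    Both arguments are collections of hashable annotations,
    e.g. (token position, link target) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


predicted = {(0, "A"), (1, "B")}
gold = {(0, "A"), (1, "C"), (2, "D")}
print(precision_recall_f1(predicted, gold))
```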

54 A Semantically Enriched Wikipedia (SEW)
Extrinsic Evaluation #1 - Entity Linking
Benchmark system: IMS (Zhong and Ng 2010), a state-of-the-art supervised system for Word Sense Disambiguation in English based on SVMs
  IMS + SEW: IMS trained on SEW
  IMS + HL: IMS trained only on the original hyperlinks (baseline #1)
  MFS: Most Frequent Sense provided by BabelNet (baseline #2)
Datasets:
  SemEval 2013, task 12
  SemEval 2015, task 13
  MSNBC
  AIDA-CoNLL
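To show what a most-frequent-sense baseline does, here is a minimal sketch. Note the assumption: on the slide the MFS is provided by BabelNet, whereas this example estimates it by counting senses in a small annotated sample, and the lemma and sense labels are invented.

```python
from collections import Counter, defaultdict


def train_mfs(annotated):
    """Map each lemma to the sense it is most often annotated with.

    annotated: iterable of (lemma, sense) pairs.
    """
    counts = defaultdict(Counter)
    for lemma, sense in annotated:
        counts[lemma][sense] += 1
    return {lemma: senses.most_common(1)[0][0] for lemma, senses in counts.items()}


def predict_mfs(model, lemma):
    """Return the most frequent sense, or None for an unseen lemma."""
    return model.get(lemma)


sample = [
    ("bass", "bass#instrument"),
    ("bass", "bass#instrument"),
    ("bass", "bass#fish"),
]
model = train_mfs(sample)
print(predict_mfs(model, "bass"))
```

The baseline ignores context entirely, which is why it is a useful lower bound when comparing trained systems.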

55 A Semantically Enriched Wikipedia (SEW)
Extrinsic Evaluation #1 - Entity Linking

56 Conclusion
Built a unified evaluation framework for all-words Word Sense Disambiguation, including standardized training and testing data. (Raganato et al. EACL 2017)
Framed the WSD task as a sequence labelling problem, presenting several neural sequence learning models and showing that, for the first time in WSD, a model trained on a given language is able to seamlessly handle a different language at testing time. (Raganato et al. EMNLP 2017)
Built one of the largest available collections of sense-annotated corpora with high-quality annotations. (Raganato et al. IJCAI 2016, Camacho-Collados et al. LREC 2016, Delli Bovi et al. ACL 2017)

57 Thank you!
