Lecture 4: Unsupervised Word-sense Disambiguation
1 Lecture 4: Unsupervised Word-sense Disambiguation
Lexical Semantics and Discourse Processing
MPhil in Advanced Computer Science
Simone Teufel, Natural Language and Information Processing (NLIP) Group
Slides after Frank Keller
February 2, 2011
2 Outline

1. Bootstrapping: Heuristics, Seed Set, Classification, Generalization
2. Graph-based WSD

Reading: Yarowsky (1995), Navigli and Lapata (2010).
3 Heuristics

Yarowsky's (1995) algorithm uses two powerful heuristics for WSD:
- One sense per collocation: nearby words provide clues to the sense of the target word, conditional on distance, order, and syntactic relationship.
- One sense per discourse: the sense of a target word is consistent within a given document.
The Yarowsky algorithm is a bootstrapping algorithm, i.e., it requires only a small amount of annotated data.
Figures and tables in this section from Yarowsky (1995).
4 Seed Set

Step 1: Extract all instances of a polysemous or homonymous word.
Step 2: Generate a seed set of labeled examples, either by manually labeling them or by using a reliable heuristic.
Example: target word plant: as seed set, take all instances of plant life (sense A) and manufacturing plant (sense B).
5 Seed Set

[Figure: the sample set after seeding, with regions of instances labeled Life and Manufacturing.]
6 Classification

Step 3a: Train a classifier on the seed set.
Step 3b: Apply the classifier to the entire sample set. Add those examples that are classified reliably (probability above a threshold) to the seed set.
Yarowsky uses a decision list classifier:
- rules of the form: collocation → sense;
- rules are ordered by log-likelihood ratio: log( P(sense A | collocation_i) / P(sense B | collocation_i) );
- classification is based on the first rule that applies.
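As an illustration, the decision-list training and classification sketched above can be written in a few lines of Python (a simplified sketch, not Yarowsky's implementation: the smoothing constant alpha and the sets-of-collocations input format are assumptions):

```python
import math
from collections import defaultdict

def build_decision_list(labeled, alpha=0.1):
    """labeled: iterable of (collocations, sense) pairs, sense in {"A", "B"}.
    Returns rules (score, collocation, sense), most reliable first."""
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for collocations, sense in labeled:
        for c in collocations:
            counts[c][sense] += 1
    rules = []
    for c, n in counts.items():
        # log-likelihood ratio, smoothed to avoid division by zero
        llr = math.log((n["A"] + alpha) / (n["B"] + alpha))
        rules.append((abs(llr), c, "A" if llr > 0 else "B"))
    return sorted(rules, reverse=True)

def classify(collocations, rules):
    """Classification is based on the first rule that applies."""
    for _, c, sense in rules:
        if c in collocations:
            return sense
    return None
```

With a seed set that contains "plant life" labeled A and "manufacturing plant" labeled B, an unseen instance whose context contains "plant life" is classified as sense A by the first matching rule.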
7 Classification

LogL  Collocation                        Sense
8.10  plant life                         A
7.58  manufacturing plant                B
7.39  life (within ±10 words)            A
7.20  manufacturing (within ±10 words)   B
6.27  animal (within ±10 words)          A
4.70  equipment (within ±10 words)       B
4.39  employee (within ±10 words)        B
4.30  assembly plant                     B
4.10  plant closure                      B
3.52  plant species                      A
3.48  automate (within ±10 words)        B
3.45  microscopic plant                  A
...
8 Classification

Step 3c: Use the one-sense-per-discourse constraint to filter the newly classified examples: if several examples in a discourse have already been annotated as sense A, then extend this label to all examples of the word in the discourse. This can form a bridge to new collocations, and correct erroneously labeled examples.
Step 3d: Repeat Steps 3a-c.
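Step 3c could be sketched like this (an illustrative simplification: the slide does not say how conflicting labels within a discourse are resolved, so a majority vote is assumed here):

```python
from collections import Counter

def one_sense_per_discourse(examples):
    """examples: dict id -> (discourse_id, sense or None).
    Extends the majority sense of each discourse to all of its examples,
    labeling unlabeled instances and relabeling conflicting ones."""
    senses_by_doc = {}
    for doc, sense in examples.values():
        if sense is not None:
            senses_by_doc.setdefault(doc, []).append(sense)
    majority = {doc: Counter(s).most_common(1)[0][0]
                for doc, s in senses_by_doc.items()}
    return {ex: (doc, majority.get(doc, sense))
            for ex, (doc, sense) in examples.items()}
```

A discourse with two examples labeled A, one labeled B, and one unlabeled comes out with all four labeled A; a discourse with no labeled examples is left untouched.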
9 Classification

[Figure: the sample set after several iterations; the Life and Manufacturing regions have grown to cover new collocations (Microscopic, Species, Animal; Automate, Equipment, Employee).]
10 Generalization

Step 4: The algorithm converges on a stable residual set (the remaining unlabeled instances):
- most training examples will now exhibit multiple collocations indicative of the same sense;
- the decision list procedure uses only the single most reliable rule, not a combination of rules.
Step 5: The final classifier can now be applied to unseen data.
11 Discussion

Strengths:
- simple algorithm that uses only minimal features (words in the context of the target word);
- minimal effort required to create the seed set;
- does not rely on a dictionary or other external knowledge.
Weaknesses:
- uses a very simple classifier (but it could be replaced with a more state-of-the-art one);
- not fully unsupervised: requires seed data;
- does not make use of the structure of the sense inventory.
Alternative: graph-based algorithms exploit the structure of the sense inventory for WSD.
12 Navigli and Lapata's (2010) algorithm is an example of graph-based WSD. It exploits the fact that sense inventories have internal structure.
Example: synsets (senses) of drink in Wordnet:
(1) a. {drink_v^1, imbibe_v^3}
    b. {drink_v^2, booze_v^1, fuddle_v^2}
    c. {toast_v^2, drink_v^3, pledge_v^2, salute_v^1, wassail_v^2}
    d. {drink in_v^1, drink_v^4}
    e. {drink_v^5, tope_v^1}
Figures and tables in this section from Navigli and Lapata (2010).
13 WN as a graph

We can represent Wordnet as a graph whose nodes are synsets and whose edges are relations between synsets. Note that the edges are not labeled, i.e., the type of relation between the nodes is ignored.
14 Example: graph for the first sense of drink.

[Figure: Wordnet subgraph around drink_v^1, with nodes including helping_n^1, food_n^1, liquid_n^1, drink_n^1, beverage_n^1, milk_n^1, sup_v^1, toast_n^4, consume_v^2, sip_v^1, nip_n^4, consumer_n^1, drinker_n^1, drinking_n^1, consumption_n^1, and potation_n^1.]
15 Disambiguation algorithm:

1. Use the Wordnet graph to construct a graph that incorporates each content word in the sentence to be disambiguated;
2. rank each node in the sentence graph according to its importance, using graph connectivity measures;
3. for each content word, pick the highest-ranked sense as the correct sense of the word.
16 Given a word sequence σ = (w_1, w_2, ..., w_n), the graph G = (V, E) is constructed as follows:

1. Let V_σ := ∪_{i=1}^{n} Senses(w_i) denote all possible word senses in σ. We set V := V_σ and E := ∅.
2. For each node v ∈ V_σ, we perform a depth-first search (DFS) of the Wordnet graph: every time we encounter a node v' ∈ V_σ (v' ≠ v) along a path v → v_1 → ... → v_k → v' of length ≤ L, we add all intermediate nodes and edges on the path from v to v': V := V ∪ {v_1, ..., v_k} and E := E ∪ {{v, v_1}, ..., {v_k, v'}}.
For tractability, we fix the maximum path length L at 6.
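Assuming the Wordnet graph is given as an adjacency dictionary (synset id → set of neighbouring synset ids), the construction above can be sketched as follows (an illustrative sketch; the function names and the exact path-length bookkeeping are simplifying assumptions):

```python
def build_sentence_graph(wordnet, senses_per_word, max_len=6):
    """wordnet: dict node -> set of neighbour nodes (the sense inventory graph).
    senses_per_word: one set of candidate senses per content word.
    Returns (V, E) for the sentence graph; E holds undirected edges."""
    v_sigma = set().union(*senses_per_word)
    V, E = set(v_sigma), set()

    def dfs(path):
        if len(path) > max_len:      # cap the path length for tractability
            return
        for nxt in wordnet.get(path[-1], ()):
            if nxt in path:          # avoid cycles; also rules out v' == v
                continue
            if nxt in v_sigma:       # found a path v -> ... -> v'
                V.update(path[1:])   # add the intermediate nodes ...
                for a, b in zip(path, path[1:] + [nxt]):
                    E.add(frozenset((a, b)))  # ... and all edges on the path
            else:
                dfs(path + [nxt])

    for v in v_sigma:
        dfs([v])
    return V, E
```

On a toy inventory where drink_v^1 and milk_n^1 are connected through beverage_n^1, the intermediate node and both edges end up in the sentence graph, while unconnected candidate senses remain isolated nodes.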
17 Example: graph for drink milk.

[Figure, built up over several animation steps: candidate sense nodes drink_v^1 ... drink_v^5 and milk_n^1 ... milk_n^4, progressively connected through intermediate nodes such as drink_n^1, beverage_n^1, food_n^1, nutriment_n^1, drinker_n^2, and boozing_n^1.]
We get 3 × 2 = 6 interpretations, i.e., subgraphs obtained when considering only one connected sense of drink and of milk.
18 Once we have the graph, we pick the most connected node for each word as the correct sense. There are two types of connectivity measures:
- Local measures: give a connectivity score to an individual node in the graph; use this directly to pick a sense.
- Global measures: assign a connectivity score to the graph as a whole; apply the measure to each interpretation and select the highest-scoring one.
Navigli and Lapata (2010) discuss a large number of graph connectivity measures; we will focus on the most important ones.
19 Degree Centrality

Assume a graph with nodes V and edges E. Then the degree of v ∈ V is the number of edges terminating in it:

    deg(v) = |{ {u, v} ∈ E : u ∈ V }|    (1)

Degree centrality is the degree of a node normalized by the maximum possible degree:

    C_D(v) = deg(v) / (|V| − 1)    (2)

For the previous example, C_D(drink_v^1) = 3/14, C_D(drink_v^2) = C_D(drink_v^5) = 2/14, and C_D(milk_n^1) = C_D(milk_n^2). So we pick drink_v^1, while milk_n is tied.
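Equations (1) and (2) are easy to verify in code; a minimal sketch on a toy graph (not the drink milk graph from the slides):

```python
def degree_centrality(V, E):
    """C_D(v) = deg(v) / (|V| - 1); E is a set of frozenset edges {u, v}."""
    deg = {v: sum(1 for e in E if v in e) for v in V}
    return {v: deg[v] / (len(V) - 1) for v in V}

# Toy graph: a square a-b-c-d-a with one diagonal a-c.
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"),
                                ("c", "d"), ("d", "a"), ("a", "c")]}
scores = degree_centrality({"a", "b", "c", "d"}, edges)
# a and c are adjacent to all 3 other nodes, b and d only to 2.
```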
20 Edge Density

The edge density of a graph is its number of edges relative to a complete graph with |V| nodes, which has C(|V|, 2) = |V|(|V| − 1)/2 edges:

    ED(G) = |E(G)| / C(|V|, 2)    (3)

The first interpretation of drink milk has ED(G) = 6 / C(5, 2) = 6/10 = 0.60, the second one ED(G) = 5 / C(5, 2) = 5/10 = 0.50.
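The two edge-density values above can be reproduced directly from equation (3):

```python
from math import comb

def edge_density(num_nodes, num_edges):
    """ED(G) = |E(G)| / C(|V|, 2): density relative to a complete graph."""
    return num_edges / comb(num_nodes, 2)

# The two interpretations of "drink milk" from the slide:
first = edge_density(5, 6)   # 6 edges out of C(5,2) = 10 possible
second = edge_density(5, 5)  # 5 edges out of 10 possible
```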
21 Results on SemCor

[Table: scores on SemCor, on WordNet and EnWordNet, for all words and for polysemous words only. Rows: Random and ExtLesk baselines; the local measures Degree, PageRank, HITS, KPP, and Betweenness; the global measures Compactness, Graph Entropy, and Edge Density; and the First Sense baseline.]
22 Results on Semeval All-words Data

System                            F
Best Unsupervised (Sussex)        45.8
ExtLesk                           43.1
Degree Unsupervised               52.9
Best Semi-supervised (IRST-DDD)   56.7
Degree Semi-supervised            60.7
First Sense                       62.4
Best Supervised (GAMBL)           65.2
23 Discussion

Strengths:
- exploits the structure of the sense inventory/dictionary;
- conceptually simple; doesn't require any training data, not even a seed set;
- achieves good performance for an unsupervised system.
Weaknesses:
- performance not good enough for real applications (F-score of 53 on Semeval);
- sense inventories take a lot of effort to create (Wordnet has been under development for more than 15 years).
24 Summary

The Yarowsky algorithm uses two key heuristics: one sense per collocation and one sense per discourse. It starts with a small seed set, trains a classifier on it, and then applies the classifier to the whole data set (bootstrapping); reliable examples are kept, and the classifier is re-trained.
Unsupervised graph-based WSD is an alternative that exploits the connectivity of the sense inventory: a graph is constructed that represents the possible interpretations of a sentence, and the nodes with the highest connectivity are picked as the correct senses. A range of connectivity measures exists; simple degree is best.
25 References

Yarowsky, D. (1995). Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the ACL.
Navigli, R. and Lapata, M. (2010). An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(4). IEEE Press.
CPSC340 State-of-the-art Neural Networks Nando de Freitas November, 2012 University of British Columbia Outline of the lecture This lecture provides an overview of two state-of-the-art neural networks:
More informationExploiting Symmetry in Relational Similarity for Ranking Relational Search Results
Exploiting Symmetry in Relational Similarity for Ranking Relational Search Results Tomokazu Goto, Nguyen Tuan Duc, Danushka Bollegala, and Mitsuru Ishizuka The University of Tokyo, Japan {goto,duc}@mi.ci.i.u-tokyo.ac.jp,
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Wiltrud Kessler & Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 0-- / 83
More informationA graph-based method to improve WordNet Domains
A graph-based method to improve WordNet Domains Aitor González, German Rigau IXA group UPV/EHU, Donostia, Spain agonzalez278@ikasle.ehu.com german.rigau@ehu.com Mauro Castillo UTEM, Santiago de Chile,
More informationAT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands
AT&T: The Tag&Parse Approach to Semantic Parsing of Robot Spatial Commands Svetlana Stoyanchev, Hyuckchul Jung, John Chen, Srinivas Bangalore AT&T Labs Research 1 AT&T Way Bedminster NJ 07921 {sveta,hjung,jchen,srini}@research.att.com
More informationA hybrid method to categorize HTML documents
Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper
More informationMarkov Chains for Robust Graph-based Commonsense Information Extraction
Markov Chains for Robust Graph-based Commonsense Information Extraction N iket Tandon 1,4 Dheera j Ra jagopal 2,4 Gerard de M elo 3 (1) Max Planck Institute for Informatics, Germany (2) NUS, Singapore
More informationMachine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016
Machine Learning 10-701, Fall 2016 Nonparametric methods for Classification Eric Xing Lecture 2, September 12, 2016 Reading: 1 Classification Representing data: Hypothesis (classifier) 2 Clustering 3 Supervised
More informationSoft Word Sense Disambiguation
Soft Word Sense Disambiguation Abstract: Word sense disambiguation is a core problem in many tasks related to language processing. In this paper, we introduce the notion of soft word sense disambiguation
More informationOutline. Morning program Preliminaries Semantic matching Learning to rank Entities
112 Outline Morning program Preliminaries Semantic matching Learning to rank Afternoon program Modeling user behavior Generating responses Recommender systems Industry insights Q&A 113 are polysemic Finding
More informationCHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS
82 CHAPTER 5 SEARCH ENGINE USING SEMANTIC CONCEPTS In recent years, everybody is in thirst of getting information from the internet. Search engines are used to fulfill the need of them. Even though the
More informationCombining Qualitative and Quantitative Keyword Extraction Methods with Document Layout Analysis
Combining Qualitative and Quantitative Keyword Extraction Methods with Document Layout Analysis Stefano Ferilli 1, Marenglen Biba 1, Teresa M.A. Basile 1, Floriana Esposito 1 1 Università di Bari, Dipartimento
More informationTwo graph-based algorithms for state-of-the-art WSD
Two graph-based algorithms for state-of-the-art WSD Eneko Agirre, David Martínez, Oier López de Lacalle and Aitor Soroa IXA NLP Group University of the Basque Country Donostia, Basque Contry a.soroa@si.ehu.es
More informationNLP in practice, an example: Semantic Role Labeling
NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:
More informationData-Mining Algorithms with Semantic Knowledge
Data-Mining Algorithms with Semantic Knowledge Ontology-based information extraction Carlos Vicient Monllaó Universitat Rovira i Virgili December, 14th 2010. Poznan A Project funded by the Ministerio de
More informationEffect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching
Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationNoida institute of engineering and technology,greater noida
Impact Of Word Sense Ambiguity For English Language In Web IR Prachi Gupta 1, Dr.AnuragAwasthi 2, RiteshRastogi 3 1,2,3 Department of computer Science and engineering, Noida institute of engineering and
More informationPV211: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv211
PV: Introduction to Information Retrieval https://www.fi.muni.cz/~sojka/pv IIR 6: Flat Clustering Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University, Brno Center
More informationA Semantic Role Repository Linking FrameNet and WordNet
A Semantic Role Repository Linking FrameNet and WordNet Volha Bryl, Irina Sergienya, Sara Tonelli, Claudio Giuliano {bryl,sergienya,satonelli,giuliano}@fbk.eu Fondazione Bruno Kessler, Trento, Italy Abstract
More informationLimitations of XPath & XQuery in an Environment with Diverse Schemes
Exploiting Structure, Annotation, and Ontological Knowledge for Automatic Classification of XML-Data Martin Theobald, Ralf Schenkel, and Gerhard Weikum Saarland University Saarbrücken, Germany 23.06.2003
More informationUlas Bagci
CAP5415-Computer Vision Lecture 14-Decision Forests for Computer Vision Ulas Bagci bagci@ucf.edu 1 Readings Slide Credits: Criminisi and Shotton Z. Tu R.Cipolla 2 Common Terminologies Randomized Decision
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationLecture 8 May 7, Prabhakar Raghavan
Lecture 8 May 7, 2001 Prabhakar Raghavan Clustering documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics Given the set of docs from the results of
More informationTowards Semantic Data Mining
Towards Semantic Data Mining Haishan Liu Department of Computer and Information Science, University of Oregon, Eugene, OR, 97401, USA ahoyleo@cs.uoregon.edu Abstract. Incorporating domain knowledge is
More information