Splitting Words, a.k.a. Word Sense Disambiguation
600.465 - Intro to NLP - J. Eisner (slides courtesy of D. Yarowsky)


Representing a Word as a Vector

Could average over many occurrences of the word... but there are three choices:
Each word type has a different vector.
Each word token has a different vector.
Each word sense has a different vector (for this one, we need sense-tagged training data; is this more like a type vector or a token vector?).
What is each of these good for?

Each word type has a different vector
We saw this yesterday. It's good for grouping words (similar semantics, similar syntax; which you get depends on how you build the vector).

Each word token has a different vector
Good for splitting words - unsupervised WSD. Cluster the tokens: each cluster is a sense!

    have turned it into the hot dinner-party topic. The comedy is
    the selection for the World Cup party, which will be announced on May
    the by-pass there will be a street party. "Then," he says, "we are going
    in the 1983 general election for a party which, when it could not bear to
    to attack the Scottish National Party, who look set to seize Perth and
    number-crunchers within the Labour party, there now seems little doubt
    that had been passed to a second party who made a financial decision
    A future obliges each party to the contract to fulfil it by

Each word sense has a different vector
Represent each new word token as a vector, too, and assign each token the closest sense. (Could lump together all tokens of the word in the same document: assume they all have the same sense.)

    have turned it into the hot dinner-party topic. The comedy is
    the selection for the World Cup party, which will be announced on May
    the by-pass there will be a street party. "Then," he says, "we are going
    let you know that there's a party at my house tonight. Directions: Drive
    in the 1983 general election for a party which, when it could not bear to
    to attack the Scottish National Party, who look set to seize Perth and
    number-crunchers within the Labour party, there now seems little doubt
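The token-clustering idea above can be sketched in a few lines: build a bag-of-words context vector for each token of "party", then run 2-means over those vectors so that each cluster plays the role of a sense. Everything here (the toy sentences, the helper names, the hand-picked seed tokens) is my own illustration, not from the slides.

```python
from collections import Counter
import math

def context_vector(tokens, i, window=3):
    """Bag of words within +/- window of position i, excluding the target."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return Counter(w for j, w in enumerate(tokens[lo:hi], start=lo) if j != i)

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def two_means(vectors, seeds=(0, 1), iters=10):
    """2-means on count vectors; centroids seeded by hand for determinism."""
    centroids = [Counter(vectors[seeds[0]]), Counter(vectors[seeds[1]])]
    assign = []
    for _ in range(iters):
        # Assign each token vector to the nearer (more cosine-similar) centroid.
        assign = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
                  for v in vectors]
        # Recompute each centroid as the summed counts of its cluster.
        for k in (0, 1):
            total = Counter()
            for v, a in zip(vectors, assign):
                if a == k:
                    total.update(v)
            if total:
                centroids[k] = total
    return assign

# Toy "party" tokens: two political contexts, two celebration contexts.
sentences = [
    "the labour party won the general election vote".split(),
    "the national party lost the election vote again".split(),
    "a street party with music and dancing tonight".split(),
    "a birthday party with music cake and dancing".split(),
]
vectors = [context_vector(s, s.index("party")) for s in sentences]
clusters = two_means(vectors, seeds=(0, 2))
```

On this toy data the two political tokens land in one cluster and the two celebration tokens in the other, which is exactly the "each cluster is a sense" picture.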

Where can we get sense-labeled training data?

To do supervised WSD, we need many examples of each sense in context (e.g., the labeled "party" concordance lines on the previous slide).

Sources of sense-labeled training text:
Human-annotated text - expensive.
Bilingual text (Rosetta stone) - we can figure out which sense of "plant" is meant by how it translates.
The dictionary definition of a sense is one sample context.
The Roget's thesaurus entry of a sense is one sample context.
(The last two give hardly any data per sense - but we'll use them later to get unsupervised training started.)

A problem with the vector model

It is a bad idea to treat all context positions equally: just one cue is sometimes enough. Possible solutions:
Faraway words don't count as strongly.
Words in different positions relative to "plant" are different elements of the vector (i.e., (pesticide, -) and (pesticide, +) are different features).
Words in different syntactic relationships to "plant" are different elements of the vector.

An assortment of possible cues...

Any one of these may be only a weak cue, but we'll trust it if there's nothing better. This generates a whole bunch of potential cues; use data to find out which ones work best, producing a merged ranking of all cues of all these types.
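The positional-feature fix can be sketched like this: each context word becomes a (word, side) feature, with "-" for words before the target and "+" for words after, down-weighted by distance so faraway words count less. The function name and the 1/distance weighting are illustrative choices, not from the slides.

```python
def positional_features(tokens, i, window=3):
    """Features like ('pesticide', '-') for words before position i and
    ('pesticide', '+') for words after it, weighted by 1/distance."""
    feats = {}
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j == i:
            continue  # skip the target word itself
        side = '-' if j < i else '+'
        feats[(tokens[j], side)] = feats.get((tokens[j], side), 0.0) + 1.0 / abs(j - i)
    return feats

sent = "spray the plant with pesticide every week".split()
feats = positional_features(sent, sent.index("plant"))
```

With this scheme, "pesticide" two words after "plant" and "pesticide" two words before it are distinct features, as the slide proposes.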

Final decision list for "lead" (abbreviated)

To disambiguate a token of "lead": scan down the sorted list; the first cue that is found gets to make the decision all by itself. Not as subtle as combining cues, but it works well for WSD. A cue's score is its log-likelihood ratio:

    log [ p(cue | sense A) [smoothed] / p(cue | sense B) ]

Yarowsky's bootstrapping algorithm

Very readable paper at http://cs.jhu.edu/~yarowsky/acl95.ps, sketched on the following slides. Make that minimally supervised: first, find a pair of seed words that correlate well with the senses. If "plant" really has 2 senses, it should appear in:
2 dictionary entries: pick content words from those.
2 thesaurus entries: pick synonyms from those.
2 different clusters of documents: pick representative words from those.
2 translations in parallel text: use the translations as seed words.
Or just have a human name the seed words (maybe from a list of words that occur unusually often near "plant").

(table taken from Yarowsky (1995)) Target word: "plant", with seed collocations "life" (sense A) and "manufacturing" (sense B); 98% of the tokens start out unlabeled.

(figure taken from Yarowsky (1995)) Learn a classifier that distinguishes A from B. It will notice features like "animal" => A, "automate" => B. No surprise what the top cues are, but other cues are also good for discriminating these seed examples.
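A minimal decision-list sketch, assuming binary senses A/B: score each cue by the smoothed log-likelihood ratio above, sort cues by the magnitude of that score, and let the first matching cue decide by itself. The add-alpha smoothing constant and the toy training cues are invented for illustration.

```python
import math

def train_decision_list(examples, alpha=0.1):
    """examples: list of (set_of_cues, sense) with sense in {'A', 'B'}.
    Returns cues sorted by |log-likelihood ratio|, strongest first."""
    counts = {}
    for cues, sense in examples:
        for cue in cues:
            a, b = counts.get(cue, (0, 0))
            counts[cue] = (a + (sense == 'A'), b + (sense == 'B'))
    scored = []
    for cue, (a, b) in counts.items():
        # Smoothed log-likelihood ratio: log p(cue|A) / p(cue|B).
        llr = math.log((a + alpha) / (b + alpha))
        scored.append((abs(llr), 'A' if llr > 0 else 'B', cue))
    scored.sort(reverse=True)
    return [(cue, sense) for _, sense, cue in scored]

def classify(decision_list, cues, default='A'):
    """Scan down the list; the first cue present makes the decision alone."""
    for cue, sense in decision_list:
        if cue in cues:
            return sense
    return default

train = [
    ({"animal", "grows"}, 'A'),      # plant = living thing
    ({"animal", "species"}, 'A'),
    ({"automate", "workers"}, 'B'),  # plant = factory
    ({"automate", "car"}, 'B'),
]
dlist = train_decision_list(train)
```

Strongly one-sided cues like "animal" float to the top of the list, so they get first say; weaker cues only fire when nothing stronger is present, matching the slide's "first cue found decides" rule.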

(figure taken from Yarowsky (1995)) The strongest of the new cues help us classify more examples... from which we can extract and rank even more cues that discriminate them...

The result should be a good classifier, unless we accidentally learned some bad cues along the way that polluted the original sense distinction.

One sense per discourse

A final trick: all tokens of "plant" in the same document probably have the same sense. Now use the final decision list to classify test examples: the top-ranked cue appearing in each test example decides. "life" and "manufacturing" are no longer even in the top cues! Many unexpected cues were extracted, without supervised training. And if 3 tokens in the same document agree, they gang up on the 4th.

A Note on Combining Cues

(These stats come from term papers of known authorship - i.e., supervised training.)
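The "one sense per discourse" trick is easy to sketch: classify each token independently, then relabel every token in the document with the document's majority sense. A simple majority vote is one way to make 3 tokens gang up on the 4th; the slide does not commit to a particular voting rule, so this is an assumption.

```python
from collections import Counter

def one_sense_per_discourse(token_senses):
    """token_senses: the independently-assigned sense labels for all tokens
    of the target word in one document. Relabel all with the majority sense."""
    majority = Counter(token_senses).most_common(1)[0][0]
    return [majority] * len(token_senses)

# 3 tokens in the same document gang up on the 4th:
doc = ['A', 'A', 'B', 'A']
relabeled = one_sense_per_discourse(doc)
```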

A Note on Combining Cues

Naive Bayes model for classifying text. (Note the naive independence assumptions!) We'll look at it again in a later lecture. Would this kind of sentence be more typical of a student-A paper or a student-B paper?

Used here for word sense disambiguation: would this kind of sentence be more typical of a plant-A context or a plant-B context? (The slide shows the same model with each context token labeled plant-A or plant-B.)
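A minimal Naive Bayes sketch for this use: score each sense by its prior plus the sum of per-word log-likelihoods (the naive independence assumption), with add-alpha smoothing, and pick the argmax. The toy counts, names, and smoothing choice are invented for illustration.

```python
import math

def naive_bayes_sense(context, word_counts, priors, alpha=1.0):
    """Pick argmax_s [ log P(s) + sum_w log P(w|s) ], add-alpha smoothed."""
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, -math.inf
    for sense, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(priors[sense])
        for w in context:
            # Naive assumption: context words are independent given the sense.
            score += math.log((counts.get(w, 0) + alpha) /
                              (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best

word_counts = {
    'A': {'leaf': 3, 'grow': 2, 'water': 2},    # plant-A: living thing
    'B': {'worker': 3, 'steel': 2, 'power': 2}, # plant-B: factory
}
priors = {'A': 0.5, 'B': 0.5}
```

Unlike the decision list, every context word contributes to the score here, which is exactly the contrast the slide is drawing.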