Supervised Models for Coreference Resolution [Rahman & Ng, EMNLP09]


Many machine learning models for coreference resolution have been created, using not only different feature sets but also fundamentally different designs. Rahman & Ng compare four designs and discuss their strengths and weaknesses:
- Mention pair model
- Mention ranking model
- Entity mention model
- Cluster ranking model

Running Example

[Barack Obama]^1_1 nominated [Hillary Rodham Clinton]^2_2 as [his]^1_3 [secretary of state]^3_4 on [Monday]^4_5. [He]^1_6 ...

Each mention appears in [brackets]. A mention is annotated as [m]^cid_mid, where:
- mid is the mention id
- cid is the cluster id

This example corresponds to the following clusters:
1: { Barack Obama, his, He }
2: { Hillary Rodham Clinton }
3: { secretary of state }
4: { Monday }

Mention Pair Model

Each training instance is a pair of mentions: (m_j, m_k). An instance is labeled as positive if m_j and m_k are coreferent; otherwise it is labeled as negative. If all possible pairs were used, the negative instances would substantially outnumber the positive ones, so the following approach has been adopted:
- a positive instance is created for each anaphoric mention m_k and its closest antecedent m_j
- a negative instance is created for m_k paired with each of the intervening mentions m_{j+1}, m_{j+2}, ..., m_{k-1}

Mention Pair Example

[Barack Obama]^1_1 nominated [Hillary Rodham Clinton]^2_2 as [his]^1_3 [secretary of state]^3_4 on [Monday]^4_5. [He]^1_6 ...

The instances for the mention pair model would be:
Positive: (He, his), (his, Barack Obama)
Negative: (He, Monday), (He, secretary of state), (his, Hillary Rodham Clinton)
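This instance-creation heuristic is easy to make concrete. Below is a minimal Python sketch that reproduces the instances of the example; the Mention class and its field names are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    cluster: int  # gold cluster id (cid in the running example)

def mention_pair_instances(mentions):
    """Create training pairs from mentions listed in textual order."""
    instances = []  # (antecedent, anaphor, label)
    for k, m_k in enumerate(mentions):
        # closest preceding mention in the same cluster = closest antecedent
        j = next((i for i in range(k - 1, -1, -1)
                  if mentions[i].cluster == m_k.cluster), None)
        if j is None:
            continue  # m_k is not anaphoric: no instances are created
        instances.append((mentions[j], m_k, 1))      # positive instance
        for i in range(j + 1, k):                    # intervening mentions
            instances.append((mentions[i], m_k, 0))  # negative instances
    return instances

doc = [Mention("Barack Obama", 1), Mention("Hillary Rodham Clinton", 2),
       Mention("his", 1), Mention("secretary of state", 3),
       Mention("Monday", 4), Mention("He", 1)]
for antecedent, anaphor, label in mention_pair_instances(doc):
    print("+" if label else "-", (anaphor.text, antecedent.text))
```

Running this prints the same positive and negative pairs as the example above.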

Post-classification Clustering

The output of a mention pair model then needs to be clustered, to coordinate the independent coreference decisions. Why?
- the coreference relation should be transitive, but transitivity may be violated by independent pairwise decisions
- many candidates may be classified as coreferent with a mention

Common clustering algorithms include:
- transitive closure ("single link"): groups together all pairs that are connected by a path of links
- best first: groups a mention with the antecedent that has the highest confidence value
- most recent: groups a mention with its most recent antecedent

Problems with the Mention Pair Model

Mention pair models are the traditional approach to supervised learning for coreference resolution. They are simple, but they have several drawbacks:
- Each mention pair is considered independently of the others, so candidate antecedents cannot be compared to each other.
- Features can only be extracted from the two mentions; cluster-level information is not available.
- A post-classification clustering step is needed.
- Computationally, the approach can be expensive: for long documents, the number of mention pairs can explode.

Entity Mention Model

An entity mention model decides whether a mention m_k is coreferent with a (partial) cluster c_j preceding m_k. A cluster is viewed as representing an entity. A training instance is a mention and cluster pair: (m_k, c_j). Two types of features are used:
1. features that describe m_k
2. cluster-level features that characterize the relationship between m_k and c_j

Four values were used for the cluster-level features (see the sketch below):
- NONE: the feature is false between m_k and all mentions in c_j
- MOST-FALSE: the feature is true between m_k and less than half (but at least one) of the mentions in c_j
- MOST-TRUE: the feature is true between m_k and at least half (but not all) of the mentions in c_j
- ALL: the feature is true between m_k and all mentions in c_j

A positive instance is created for each mention m_k and the preceding cluster to which it belongs. A negative instance is created for m_k paired with each partial cluster whose last mention appears between m_k and its closest antecedent.

When applying the classifier, mentions are processed left to right. For each m_k, an instance is created between m_k and each preceding cluster, and the closest cluster classified as coreferent is chosen. Partial clusters are built incrementally, based on the predictions of the classifier on the first k-1 mentions!
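The four-valued cluster-level features can be read as binning the fraction of mentions in c_j for which a binary pairwise feature fires against m_k. A minimal sketch, assuming the pairwise feature is passed in as a boolean function (the interface is an assumption for illustration):

```python
def cluster_feature_value(pair_feature, m_k, cluster):
    """Bin a binary pairwise feature into NONE / MOST-FALSE / MOST-TRUE / ALL.

    pair_feature(m_k, m) -> bool is evaluated between m_k and every
    mention m in the (partial) cluster.
    """
    hits = sum(bool(pair_feature(m_k, m)) for m in cluster)
    n = len(cluster)
    if hits == 0:
        return "NONE"
    if hits == n:
        return "ALL"
    return "MOST-TRUE" if hits >= n / 2 else "MOST-FALSE"

# Example: bin gender agreement between m_k and the mentions of c_j,
# given some hypothetical gender() predicate:
#   cluster_feature_value(lambda a, b: gender(a) == gender(b), m_k, c_j)
```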

Mention Ranking Model

The mention ranking model reformulates the problem in terms of ranking rather than classification: which candidate antecedent is the most probable?
- all candidate antecedents are considered simultaneously, and a ranking is imposed among them
- an SVM ranker-learning algorithm is used

The features and training instances are identical to those of the mention pair model, except for the values of the training instances:
- the pair with the closest antecedent gets a value of 2
- all other (m_j, m_k) pairs get a value of 1

When applying the model, the candidate antecedent with the largest value produced by the ranker is chosen.

Cluster Ranking Model

Cluster ranking combines the benefits of the entity mention model and the mention ranking model: the set of preceding (partial) clusters is ranked. A training instance is a mention and cluster pair: (m_k, c_j). An instance is created between m_k and each of its preceding clusters. The values of the training instances are:
- if m_k belongs to c_j, the pair's value is 2
- otherwise, the pair's value is 1

Both mention-level and cluster-level features are used. When applying the model, m_k is paired with each of the preceding clusters, and the cluster with the highest rank value is chosen (see the inference sketch below).

Features for Individual Mentions: feature values are Yes or No.
Features between Pairs of Mentions, and More Features between Pairs: feature values are Compatible, Incompatible, or Not Applicable.
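Returning to cluster-ranking inference: at test time the model runs left to right over the document, growing clusters incrementally. A minimal sketch, assuming a trained scorer score(m, cluster) -> float and an optional score for the new-cluster option (both interfaces are assumptions, anticipating the joint anaphoricity variant discussed next):

```python
def cluster_ranking_inference(mentions, score, new_cluster_score=None):
    """Left-to-right inference for the cluster ranking model.

    score(m, cluster) -> float: rank value from the trained ranker.
    new_cluster_score(m) -> float: optional rank value for the option
    of m starting a new cluster (joint anaphoricity determination).
    """
    clusters = []
    for m in mentions:
        candidates = [(score(m, c), c) for c in clusters]
        if new_cluster_score is not None:
            candidates.append((new_cluster_score(m), None))
        _, best = max(candidates, key=lambda t: t[0], default=(0.0, None))
        if best is None:
            clusters.append([m])  # m is discourse-new: start a new entity
        else:
            best.append(m)        # m joins the highest-ranked cluster
    return clusters
```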

Anaphoricity Detection

Two approaches were tried to explicitly detect non-anaphoric mentions:
1. An independent anaphoricity classifier was trained. This classifier is applied first; if m_k is labeled as non-anaphoric, it is not resolved.
2. The ranking models were trained to jointly learn to identify discourse-new mentions and to find resolutions. Training is done with both anaphoric and non-anaphoric mentions: for each m_k, an additional instance is created in which m_k starts a new cluster.

Extracting Mentions

To extract system mentions, a mention detector was trained with supervised learning. Mention extraction was cast as a sequence labeling task using IOB tags, and a CRF model was trained. 29 features of the following types were used (a feature-extraction sketch follows this list):
- Lexical (7): the target word w_i and a window of +/-3 words around it
- Capitalization (4): IsAllCap, IsInitCap, IsCapPeriod, IsAllLower
- Morphological (8): prefixes and suffixes up to length 4
- Grammatical (1): the POS tag of w_i
- Semantic (1): the named entity tag of w_i
- Gazetteers (8): dictionaries of pronouns, common words, person names and titles, vehicles, locations, companies, and hyponyms of PERSON from WordNet
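A minimal sketch of per-token feature extraction along these lines; the feature keys and the gazetteer interface are illustrative assumptions, and a real system would feed such dictionaries into a CRF toolkit:

```python
def token_features(tokens, pos_tags, ne_tags, i, gazetteers):
    """Features for token i, following the feature groups listed above.

    gazetteers: dict mapping a dictionary name to a set of lowercased
    entries (pronouns, person names, locations, ...).
    """
    w = tokens[i]
    feats = {}
    # Lexical (7): w_i and a +/-3 window around it
    for off in range(-3, 4):
        j = i + off
        feats[f"word[{off:+d}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    # Capitalization (4)
    feats["IsAllCap"] = w.isupper()
    feats["IsInitCap"] = w[:1].isupper()
    feats["IsCapPeriod"] = len(w) == 2 and w[0].isupper() and w[1] == "."
    feats["IsAllLower"] = w.islower()
    # Morphological (8): prefixes and suffixes up to length 4
    for n in range(1, 5):
        feats[f"prefix{n}"] = w[:n]
        feats[f"suffix{n}"] = w[-n:]
    # Grammatical (1) and Semantic (1)
    feats["pos"] = pos_tags[i]
    feats["ne"] = ne_tags[i]
    # Gazetteers (8): one membership feature per dictionary
    for name, entries in gazetteers.items():
        feats[f"in_{name}"] = w.lower() in entries
    return feats
```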

Results with Gold Mentions

The first set of experiments uses gold mentions. Conclusions:
- The ranking models improve precision.
- Joint anaphoricity detection improves both ranking models.
- Cluster ranking outperforms mention ranking.

Results with System Mentions

The second set of experiments uses system-generated mentions. Precision is lower with system mentions, but the same general trends hold. Cluster ranking seems to be the best overall model.