Collins' and Eisner's algorithms

Collins' and Eisner's algorithms. Syntactic analysis (5LN455), 2015-12-14. Sara Stymne, Department of Linguistics and Philology. Based on slides from Marco Kuhlmann.

Recap: Dependency trees. Example: "I booked a flight from LA", with arcs labelled subj, dobj, det and pmod. In an arc h → d, the word h is called the head, and the word d is called the dependent. The arcs form a rooted tree.

Recap: Scoring models and parsing algorithms Distinguish two aspects: Scoring model: How do we want to score dependency trees? Parsing algorithm: How do we compute a highest-scoring dependency tree under the given scoring model?

Recap: The arc-factored model. To score a dependency tree, score the individual arcs and combine the scores into a simple sum: score(t) = score(a1) + ... + score(an). Define the score of an arc h → d as the weighted sum of all features of that arc: score(h → d) = f1·w1 + ... + fn·wn.
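To make the arc-factored model concrete, here is a minimal Python sketch (the function names and the dictionary-based feature/weight representation are illustrative choices, not from the slides):

    # Arc-factored scoring: a tree's score is the sum of its arc scores,
    # and an arc's score is the weighted sum of its (binary) features.
    def score_arc(arc_feats, weights):
        return sum(weights.get(f, 0.0) for f in arc_feats)

    def score_tree(arcs, feature_fn, weights):
        # arcs is a list of (head, dependent) pairs; feature_fn(h, d) returns
        # the list of feature names that fire for the arc h -> d
        return sum(score_arc(feature_fn(h, d), weights) for (h, d) in arcs)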

Recap: Example features The head is a verb. The dependent is a noun. The head is a verb and the dependent is a noun. The head is a verb and the predecessor of the head is a pronoun. The arc goes from left to right. The arc has length 2.
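A feature function producing features of this kind might be sketched as follows (a hypothetical illustration; sent is assumed to be a list of (word, POS) pairs and h, d are token positions):

    def arc_features(sent, h, d):
        head_pos, dep_pos = sent[h][1], sent[d][1]
        feats = [
            "head_pos=" + head_pos,                        # e.g. the head is a verb
            "dep_pos=" + dep_pos,                          # e.g. the dependent is a noun
            "head_pos+dep_pos=" + head_pos + "+" + dep_pos,
            "direction=" + ("L2R" if h < d else "R2L"),    # the arc goes from left to right
            "length=" + str(abs(h - d)),                   # e.g. the arc has length 2
        ]
        if h > 0:                                          # predecessor of the head, if any
            feats.append("head_pos+prev_pos=" + head_pos + "+" + sent[h - 1][1])
        return feats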

Arc-factored dependency parsing Recap: Training using structured prediction Take a sentence w and a gold-standard dependency tree g for w. Compute the highest-scoring dependency tree under the current weights; call it p. Increase the weights of all features that are in g but not in p. Decrease the weights of all features that are in p but not in g.
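One pass of this update rule could look like the following sketch (parse and feature_fn are assumed helpers; real implementations typically also use weight averaging):

    def perceptron_update(weights, sent, gold_arcs, parse, feature_fn):
        pred_arcs = parse(sent, weights)                   # highest-scoring tree p
        for (h, d) in set(gold_arcs) - set(pred_arcs):     # arcs (and their features) in g but not in p
            for f in feature_fn(sent, h, d):
                weights[f] = weights.get(f, 0.0) + 1.0
        for (h, d) in set(pred_arcs) - set(gold_arcs):     # arcs (and their features) in p but not in g
            for f in feature_fn(sent, h, d):
                weights[f] = weights.get(f, 0.0) - 1.0
        return weights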

Recap: Collins' algorithm. Collins' algorithm is a simple algorithm for computing the highest-scoring dependency tree under an arc-factored scoring model. It can be understood as an extension of the CKY algorithm to dependency parsing. Like the CKY algorithm, it can be characterized as a bottom-up algorithm based on dynamic programming.

Collins' algorithm. Recap: Signatures. A subtree spanning the positions min to max with root (head word) C is represented by the signature [min, max, C], i.e. [min, max, root].

Collins' algorithm. Recap: Initialization. For the sentence "I booked a flight from LA" (positions 0-5), every word starts out as a singleton subtree: [0, 1, I], [1, 2, booked], [2, 3, a], [3, 4, flight], [4, 5, from LA].

Collins' algorithm. Recap: Adding a left-to-right arc. Example: the subtrees [3, 4, flight] and [4, 5, from LA] are combined by adding a pmod arc from flight to from LA, yielding the larger subtree [3, 5, flight].

Collins' algorithm. Recap: Adding a left-to-right arc. In general, a subtree t1 with signature [min, mid, l] and a subtree t2 with signature [mid, max, r] are combined by adding the arc l → r, yielding a subtree t with signature [min, max, l]. Its score is score(t) = score(t1) + score(t2) + score(l → r).

Collins' algorithm. Recap: Complexity analysis. Space requirement: O(|w|³). Runtime requirement: O(|w|⁵).
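The whole recap can be summarized in a short sketch of the unlabeled parser (illustrative code, not from the slides; score(h, d) is an assumed arc-scoring function, and only the best score is returned, since recovering the tree itself would require backpointers):

    def collins_parse(n, score):
        NEG = float("-inf")
        # chart[(lo, hi)][head] = best score of a subtree with signature [lo, hi, head]
        chart = {(i, i + 1): {i: 0.0} for i in range(n)}          # initialization
        for length in range(2, n + 1):
            for lo in range(0, n - length + 1):
                hi = lo + length
                cell = {}
                for mid in range(lo + 1, hi):                     # split point
                    for l, ls in chart[(lo, mid)].items():        # head of the left subtree
                        for r, rs in chart[(mid, hi)].items():    # head of the right subtree
                            s = ls + rs + score(l, r)             # left-to-right arc l -> r
                            if s > cell.get(l, NEG):
                                cell[l] = s
                            s = ls + rs + score(r, l)             # right-to-left arc r -> l
                            if s > cell.get(r, NEG):
                                cell[r] = s
                chart[(lo, hi)] = cell
        return max(chart[(0, n)].values())

The five nested loops (lo, hi via length, mid, l, r) are exactly where the O(|w|⁵) runtime comes from.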

Collins' algorithm. Extension to the labeled case. It is important to distinguish dependencies of different types between the same two words (example: subj vs dobj). For this reason, practical systems typically deal with labeled arcs. The question then arises how to extend Collins' algorithm to the labeled case.

Collins' algorithm. Naive approach. Add an innermost loop that iterates over all arc labels in order to find the combination that maximizes the overall score. Each step of the original algorithm then requires |L| steps, where L is the set of all labels.

Collins' algorithm. Smart approach. Before parsing, compute a table that lists, for each head-dependent pair (h, d), the label that maximizes the score of the arc h → d. During parsing, simply look up the best label in the precomputed table. This adds (rather than multiplies) a term of O(|L|·|w|²) to the overall runtime of the algorithm.
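A sketch of this precomputation (labeled_score and the dictionary layout are assumptions for illustration):

    def precompute_labels(n, labels, labeled_score):
        # For every pair (h, d), store the best label and its score; the parser then
        # scores unlabeled arcs with best_score and reads labels off best_label afterwards.
        best_label, best_score = {}, {}
        for h in range(n):
            for d in range(n):
                if h == d:
                    continue
                lbl = max(labels, key=lambda lab: labeled_score(h, d, lab))
                best_label[(h, d)] = lbl
                best_score[(h, d)] = labeled_score(h, d, lbl)
        return best_label, best_score

This loop touches every label once for every word pair, hence the additive O(|L|·|w|²) term.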

Collins' algorithm. Summary. Collins' algorithm is a CKY-style algorithm for computing the highest-scoring dependency tree under an arc-factored scoring model. It runs in time O(|w|⁵). This may not be practical for long sentences.

Eisner's algorithm

Eisner's algorithm. With its runtime of O(|w|⁵), Collins' algorithm may not be of much use in practice. With Eisner's algorithm we will be able to solve the same problem in O(|w|³). Intuition: collect left and right dependents independently.

Eisner's algorithm. Basic idea. In Collins' algorithm, adding a left-to-right arc is done in one single step, specified by 5 positions (min, mid, max and the two heads l and r). In Eisner's algorithm, the same thing is done in three steps, each one specified by only 3 positions.

Eisner's algorithm. Comparison.

Eisner's algorithm. Dynamic programming tables. Collins': [min, max, head]. Eisner's: [min, max, head-side, complete]. head-side, binary: is the head to the left or to the right? complete, binary: is the non-head side still looking for dependents?

Eisner's algorithm. Pseudo code.

for each i from 0 to n and all d, c do
    C[i][i][d][c] = 0.0
for each m from 1 to n do
    for each i from 0 to n - m do
        j = i + m
        C[i][j][←][1] = max over i ≤ q < j of ( C[i][q][→][0] + C[q+1][j][←][0] + score(wj, wi) )
        C[i][j][→][1] = max over i ≤ q < j of ( C[i][q][→][0] + C[q+1][j][←][0] + score(wi, wj) )
        C[i][j][←][0] = max over i ≤ q < j of ( C[i][q][←][0] + C[q][j][←][1] )
        C[i][j][→][0] = max over i < q ≤ j of ( C[i][q][→][1] + C[q][j][→][0] )
return C[0][n][→][0]
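The same algorithm as a runnable Python sketch, following the conventions of the pseudo code above (index 0 is the artificial root, score(h, d) is an assumed arc-scoring function, and only the best score is returned; recovering the tree would require backpointers):

    def eisner(n, score):
        NEG = float("-inf")
        size = n + 1                       # positions 0..n, 0 = artificial root
        # C[i][j][d][c]: d = 0 means head at i (->), d = 1 means head at j (<-);
        # c = 1 means the item was just built by adding an arc and its non-head
        # side is still looking for dependents; c = 0 means that side is complete.
        C = [[[[NEG, NEG], [NEG, NEG]] for _ in range(size)] for _ in range(size)]
        for i in range(size):
            for d in range(2):
                for c in range(2):
                    C[i][i][d][c] = 0.0
        for m in range(1, size):
            for i in range(0, size - m):
                j = i + m
                # add an arc between i and j (open items, c = 1)
                best = max(C[i][q][0][0] + C[q + 1][j][1][0] for q in range(i, j))
                C[i][j][1][1] = best + score(j, i)    # arc j -> i
                C[i][j][0][1] = best + score(i, j)    # arc i -> j
                # close an open item by attaching an already-complete item (c = 0)
                C[i][j][1][0] = max(C[i][q][1][0] + C[q][j][1][1] for q in range(i, j))
                C[i][j][0][0] = max(C[i][q][0][1] + C[q][j][0][0] for q in range(i + 1, j + 1))
        return C[0][n][0][0]               # best tree headed at the root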

Eisner's algorithm. Summary. Eisner's algorithm is an improvement over Collins' algorithm that runs in time O(|w|³). The same scoring model can be used. The same technique for extending the parser to labeled parsing can be used, adding O(|L|·|w|²) to the runtime. Eisner's algorithm is the basis of current arc-factored dependency parsers.

Evaluation of dependency parsing

Evaluation of dependency parsers. Labelled attachment score (LAS): percentage of correct arcs, relative to the gold standard. Labelled exact match (LEM): percentage of correct dependency trees, relative to the gold standard. Unlabelled attachment score / exact match (UAS/UEM): the same, but ignoring arc labels.
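A sketch of how LAS and UAS are typically computed (the per-token (head, label) representation here is an assumption, not from the slides):

    def attachment_scores(gold_sents, pred_sents):
        # each sentence is a list of (head, label) pairs, one per word
        correct_lab = correct_unlab = total = 0
        for gold, pred in zip(gold_sents, pred_sents):
            for (gh, gl), (ph, pl) in zip(gold, pred):
                total += 1
                if gh == ph:
                    correct_unlab += 1
                    if gl == pl:
                        correct_lab += 1
        return correct_lab / total, correct_unlab / total    # (LAS, UAS)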

Word- vs sentence-level attachment score. Example: a 2-sentence corpus, where sentence 1 has 9/10 arcs correct and sentence 2 has 15/45 arcs correct. Word-level (micro-average): (9+15)/(10+45) = 24/55 = 0.436. Sentence-level (macro-average): (9/10 + 15/45)/2 = (0.9 + 0.33)/2 = 0.617. Word-level attachment score is normally used.
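The two averages from the example, as a quick Python check:

    correct, total = [9, 15], [10, 45]
    micro = sum(correct) / sum(total)                                  # 24/55 = 0.436 (word level)
    macro = sum(c / t for c, t in zip(correct, total)) / len(total)    # (0.9 + 0.33)/2 = 0.617 (sentence level)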