Transition-based dependency parsing


Transition-based dependency parsing
Syntactic analysis (5LN455), 2014-12-18
Sara Stymne, Department of Linguistics and Philology
Based on slides from Marco Kuhlmann

Overview
- Arc-factored dependency parsing: Collins' algorithm, Eisner's algorithm
- Transition-based dependency parsing: the arc-standard algorithm
- Evaluation of dependency parsers
- Projectivity

Transition-based dependency parsing

Eisner's algorithm runs in time O(|w|³). This may be too slow when a lot of data is involved. Idea: design a dumber but really fast algorithm and let the machine learning do the rest. Eisner's algorithm searches over many different dependency trees at the same time; a transition-based dependency parser builds only one tree, in one left-to-right sweep over the input.

Transition-based dependency parsing
The parser starts in an initial configuration. At each step, it asks a guide to choose one of several transitions (actions) into a new configuration. Parsing stops when the parser reaches a terminal configuration. The parser returns the dependency tree associated with the terminal configuration.

Generic parsing algorithm

    Configuration c = parser.getInitialConfiguration(sentence)
    while c is not a terminal configuration do
        Transition t = guide.getNextTransition(c)
        c = c.makeTransition(t)
    return c.getGraph()
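The control flow of this loop can be sketched as runnable Python. The function names and the callable-based interface here are illustrative, not taken from any particular parser implementation:

```python
# Minimal sketch of the generic transition-based parsing loop.
# The caller supplies the configuration type, the terminal test,
# the guide, and the transition function.

def parse(initial_config, is_terminal, guide, apply_transition):
    """Run the guide-driven loop until a terminal configuration is reached."""
    c = initial_config
    while not is_terminal(c):
        t = guide(c)                  # predict the next transition
        c = apply_transition(c, t)    # move to the new configuration
    return c                          # the caller extracts the graph from c
```

Any concrete transition system (such as arc-standard, below) plugs into this skeleton by supplying the four components.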

Variation
Transition-based dependency parsers differ with respect to the configurations and the transitions that they use.

Guides
We need a guide that tells us what the next transition should be. The task of the guide can be understood as classification: predict the next transition (class), given the current configuration.

Training a guide
We let the parser run on gold-standard trees. Every time there is a choice to make, we simply look into the tree and do the right thing. We collect all (configuration, transition) pairs and train a classifier on them. When parsing unseen sentences, we use the trained classifier as a guide.

Training a guide
The number of possible (configuration, transition) pairs is far too large to enumerate. Instead, we define a set of features of configurations that we consider relevant for predicting the next transition. Example: the word forms of the topmost two words on the stack and the next two words in the buffer. We can then describe every configuration in terms of a feature vector.
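As a sketch, the example feature set (top two stack words, next two buffer words) could be extracted like this; the feature-name scheme (`s0.form`, `b0.form`, ...) is a hypothetical convention, not prescribed by the slides:

```python
# Hypothetical feature extraction for a configuration whose stack and
# buffer hold word forms. Missing positions simply yield no feature.

def extract_features(stack, buffer):
    feats = []
    if len(stack) >= 1:
        feats.append("s0.form=" + stack[-1])   # topmost stack word
    if len(stack) >= 2:
        feats.append("s1.form=" + stack[-2])   # second-topmost stack word
    if len(buffer) >= 1:
        feats.append("b0.form=" + buffer[0])   # next buffer word
    if len(buffer) >= 2:
        feats.append("b1.form=" + buffer[1])   # second buffer word
    return feats
```

Such feature strings can then be fed to any standard classifier as a sparse binary vector.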

Training a guide
[Figure: configurations plotted by their scores for feature 1 and feature 2; configurations in which we want to do la cluster apart from configurations in which we want to do ra.]

Training a guide
[Figure: the same plot with the classification function learned by the classifier, separating the la configurations from the ra configurations.]

Training a guide
In practical systems, we have thousands of features and hundreds of transitions. Several machine-learning paradigms can be used to train a guide for such a task, for example perceptrons, decision trees, support-vector machines, and memory-based learning.

Example features

    Address           FORM  LEMMA  POS  FEATS  DEPREL
    Stack[0]           X      X     X     X
    Stack[1]                        X
    Ldep(Stack[0])                               X
    Rdep(Stack[0])                               X
    Buffer[0]          X      X     X     X
    Buffer[1]                       X
    Ldep(Buffer[0])                              X
    Rdep(Buffer[0])                              X
    ...

Features are combinations of addresses and attributes (e.g. those marked in the table), plus other features such as distances, number of children, ...

The arc-standard algorithm

The arc-standard algorithm is a simple algorithm for transition-based dependency parsing. It is very similar to shift-reduce parsing as it is known for context-free grammars. It is implemented in most practical transition-based dependency parsers, including MaltParser.

Configurations
A configuration for a sentence w = w1 ... wn consists of three components:
- a buffer containing words of w
- a stack containing words of w
- the dependency graph constructed so far

Configurations
Initial configuration: all words are in the buffer, the stack is empty, and the dependency graph is empty.
Terminal configuration: the buffer is empty and the stack contains a single word.

Possible transitions
- shift (sh): push the next word in the buffer onto the stack
- left-arc (la): add an arc from the topmost word on the stack, s1, to the second-topmost word, s2, and pop s2
- right-arc (ra): add an arc from the second-topmost word on the stack, s2, to the topmost word, s1, and pop s1
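The three transitions can be written down compactly. The following is a minimal sketch, assuming a configuration represented as a tuple (stack, buffer, arcs) over word indices, with arcs stored as (head, label, dependent) triples; labels such as "subj" ride along with la/ra as in "la-subj":

```python
# Arc-standard transitions on a configuration (stack, buffer, arcs).
# Indices refer to word positions; the representation is illustrative.

def step(config, transition):
    stack, buffer, arcs = config
    if transition == "sh":                        # shift: buffer front -> stack
        return stack + [buffer[0]], buffer[1:], arcs
    op, _, label = transition.partition("-")      # e.g. "la-subj" -> ("la", "subj")
    s1, s2 = stack[-1], stack[-2]                 # topmost and second-topmost
    if op == "la":                                # left-arc: s1 -> s2, pop s2
        return stack[:-2] + [s1], buffer, arcs + [(s1, label, s2)]
    if op == "ra":                                # right-arc: s2 -> s1, pop s1
        return stack[:-1], buffer, arcs + [(s2, label, s1)]
    raise ValueError("unknown transition: " + transition)
```

For example, on a three-word sentence (indices 0, 1, 2), the sequence sh, sh, la-subj, sh, ra-dobj ends with word 1 alone on the stack and the arcs (1, subj, 0) and (1, dobj, 2).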

Example run
Sentence: "I booked a flight from LA" (in this run, "from LA" behaves as a single unit).

    Transition  Stack                      Buffer                      New arc
                []                         I booked a flight from LA
    sh          [I]                        booked a flight from LA
    sh          [I booked]                 a flight from LA
    la-subj     [booked]                   a flight from LA            booked -subj-> I
    sh          [booked a]                 flight from LA
    sh          [booked a flight]          from LA
    la-det      [booked flight]            from LA                     flight -det-> a
    sh          [booked flight from LA]
    ra-pmod     [booked flight]                                        flight -pmod-> from LA
    ra-dobj     [booked]                                               booked -dobj-> flight
    done!

Evaluation of dependency parsing

Evaluation of dependency parsers
- labelled attachment score: percentage of correct arcs, relative to the gold standard
- labelled exact match: percentage of correct dependency trees, relative to the gold standard
- unlabelled attachment score / exact match: the same, but ignoring arc labels

Word- vs sentence-level evaluation
Example: a 2-sentence corpus
- sentence 1: 9/10 arcs correct
- sentence 2: 15/45 arcs correct
Word-level (micro-average): (9+15)/(10+45) = 24/55 ≈ 0.436
Sentence-level (macro-average): (9/10 + 15/45)/2 = (0.9 + 0.33)/2 ≈ 0.617
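The two averages are easy to compute with small helpers; the function names here are illustrative:

```python
# Micro- vs macro-averaged attachment scores.
# `results` is a list of (correct_arcs, total_arcs) per sentence.

def micro_average(results):
    """Word-level score: pool all arcs, then divide."""
    correct = sum(c for c, _ in results)
    total = sum(t for _, t in results)
    return correct / total

def macro_average(results):
    """Sentence-level score: average the per-sentence scores."""
    return sum(c / t for c, t in results) / len(results)
```

On the corpus above, micro_average([(9, 10), (15, 45)]) gives 24/55 ≈ 0.436, while macro_average gives ≈ 0.617, illustrating how the long sentence dominates the micro-average but counts the same as the short one in the macro-average.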

Projectivity

Projectivity
A dependency tree is projective if, for every arc in the tree, there is a directed path from the head of the arc to every word occurring between the head and the dependent. That is, an arc (i, l, j) implies that i →* k for every k such that min(i, j) < k < max(i, j).
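This definition can be checked directly on a tree given as a head array; the encoding below (heads[i] is the head of word i, with -1 marking the root's head) is a hypothetical convention for illustration:

```python
# Projectivity check: every arc (h, d) must dominate all words
# strictly between h and d.

def is_projective(heads):
    def dominates(h, d):
        # Follow head pointers from d upward; True if we reach h.
        while d != -1:
            if d == h:
                return True
            d = heads[d]
        return False

    for d, h in enumerate(heads):
        if h == -1:                       # skip the root word
            continue
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):       # words between head and dependent
            if not dominates(h, k):
                return False
    return True
```

For instance, heads = [1, -1, 3, 1] (word 1 heading words 0 and 3, word 3 heading word 2) is projective, whereas heads = [2, -1, 1, 0] contains a crossing arc and is not.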

Projective and non-projective trees

Projectivity and dependency parsing
Many dependency parsing algorithms can only handle projective trees, including all algorithms we have discussed. Non-projective trees do occur in natural language; how often depends on the language (and the treebank).

Non-projective dependency parsing
- Variants of transition-based parsing: using a swap transition, pseudo-projective parsing
- Graph-based parsing: minimum spanning tree algorithms
- ...

Summary
- In transition-based dependency parsing, one does not score graphs but computations: sequences of (configuration, transition) pairs.
- In its simplest form, transition-based dependency parsing uses classification.
- One specific instance of transition-based dependency parsing is the arc-standard algorithm.