Introduction to Data-Driven Dependency Parsing


Introduction to Data-Driven Dependency Parsing. Introductory Course, ESSLLI 2007. Ryan McDonald (Google Inc., New York, USA, e-mail: ryanmcd@google.com) and Joakim Nivre (Uppsala University and Växjö University, Sweden, e-mail: nivre@msi.vxu.se). Introduction to Data-Driven Dependency Parsing 1(29)

Introduction Overview of the Course: Dependency parsing (Joakim); Machine learning methods (Ryan); Transition-based models (Joakim); Graph-based models (Ryan); Loose ends (Joakim, Ryan): other approaches, empirical results, available software. Introduction to Data-Driven Dependency Parsing 2(29)

Introduction Notation Reminder. Sentence x = w_0, w_1, ..., w_n, with w_0 = root. L = {l_1, ..., l_|L|} is the set of permissible arc labels. Let G = (V, A) be a dependency graph for sentence x, where: V = {0, 1, ..., n} is the vertex set; A is the arc set, i.e., (i, j, k) ∈ A represents a dependency from w_i to w_j with label l_k ∈ L. By the usual definition, G is a tree. Introduction to Data-Driven Dependency Parsing 3(29)

Introduction Data-Driven Parsing. Goal: learn a good predictor of dependency graphs. Input: x. Output: dependency graph/tree G. This lecture: parameterize parsing by transitions; learn to predict transitions given the input and a history; predict new graphs using a deterministic parsing algorithm. Next lecture: parameterize parsing by dependency arcs; learn to predict entire graphs given the input; predict new graphs using spanning tree algorithms. Introduction to Data-Driven Dependency Parsing 4(29)

Introduction Lecture 3: Outline. Transition systems; Deterministic classifier-based models: Parsing algorithm, Stack-based and list-based transition systems, Classifier-based parsing, Pseudo-projective parsing. Introduction to Data-Driven Dependency Parsing 5(29)

Transition Systems Transition Systems. A transition system for dependency parsing is a quadruple S = (C, T, c_s, C_t), where 1. C is a set of configurations, each of which contains a buffer β of (remaining) nodes and a set A of dependency arcs, 2. T is a set of transitions, each of which is a (partial) function t : C → C, 3. c_s is an initialization function, mapping a sentence x = w_0, w_1, ..., w_n to a configuration with β = [1, ..., n], 4. C_t ⊆ C is a set of terminal configurations. Note: a configuration represents a parser state; a transition represents a parsing action (parser state update). Introduction to Data-Driven Dependency Parsing 6(29)

Transition Systems Transition Sequences. Let S = (C, T, c_s, C_t) be a transition system. A transition sequence for a sentence x = w_0, w_1, ..., w_n in S is a sequence C_{0,m} = (c_0, c_1, ..., c_m) of configurations, such that 1. c_0 = c_s(x), 2. c_m ∈ C_t, 3. for every i (1 ≤ i ≤ m), c_i = t(c_{i-1}) for some t ∈ T. The parse assigned to x by C_{0,m} is the dependency graph G_{c_m} = ({0, 1, ..., n}, A_{c_m}), where A_{c_m} is the set of dependency arcs in c_m. Introduction to Data-Driven Dependency Parsing 7(29)

Parsing Algorithm Deterministic Parsing. An oracle for a transition system S = (C, T, c_s, C_t) is a function o : C → T. Given a transition system S = (C, T, c_s, C_t) and an oracle o, deterministic parsing can be achieved by the following simple algorithm: Parse(x = (w_0, w_1, ..., w_n)): c ← c_s(x); while c ∉ C_t: c ← [o(c)](c); return G_c. NB: Oracles can be approximated by classifiers (cf. lecture 2). Introduction to Data-Driven Dependency Parsing 8(29)
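To make the deterministic algorithm concrete, here is a minimal Python sketch of the parse loop. The names (parse, init, is_terminal, oracle) are illustrative assumptions for this example, not part of any particular parser implementation; the oracle argument is whatever approximates o : C → T, typically a trained classifier (cf. lecture 2).

def parse(words, init, is_terminal, oracle):
    """Generic deterministic parsing loop.

    init(words)    -> initial configuration c_s(x)
    is_terminal(c) -> True iff c is a terminal configuration (c in C_t)
    oracle(c)      -> a transition t : C -> C
    """
    c = init(words)              # c <- c_s(x)
    while not is_terminal(c):    # while c is not in C_t
        t = oracle(c)            # ask the oracle (or classifier) for a transition
        c = t(c)                 # c <- [o(c)](c)
    return c.arcs                # the arc set of c defines the parse G_c

With the stack-based system defined on the following slides, init would build ([0], [1, ..., n], ∅) and is_terminal would test for an empty buffer.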

Stack-Based Transition Systems. A stack-based configuration for a sentence x = w_0, w_1, ..., w_n is a triple c = (σ, β, A), where 1. σ is a stack of tokens i ≤ m (for some m ≤ n), 2. β is a buffer of tokens j > m, 3. A is a set of dependency arcs such that G = ({0, 1, ..., n}, A) is a dependency graph for x. A stack-based transition system is a quadruple S = (C, T, c_s, C_t), where 1. C is the set of all stack-based configurations, 2. c_s(x = w_0, w_1, ..., w_n) = ([0], [1, ..., n], ∅), 3. T is a set of transitions, each of which is a function t : C → C, 4. C_t = {c ∈ C | c = (σ, [ ], A)}. Notation: σ|i = stack with top i (| left-associative); i|β = buffer with next token i (| right-associative). Introduction to Data-Driven Dependency Parsing 9(29)
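As a data structure, the triple (σ, β, A) can be represented directly. The sketch below uses illustrative names (and assumes the artificial root token sits at position 0 of the word list); it is reused in the later code examples.

from dataclasses import dataclass, field

@dataclass
class Configuration:
    stack: list                              # sigma: token indices, top = last element
    buffer: list                             # beta: remaining token indices
    arcs: set = field(default_factory=set)   # A: (head, dependent, label) triples

def initial(words):
    # c_s(x): stack [0] (the root), buffer [1, ..., n], empty arc set
    return Configuration([0], list(range(1, len(words))), set())

def is_terminal(c):
    # C_t: any configuration with an empty buffer
    return len(c.buffer) == 0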

Shift-Reduce Dependency Parsing. Transitions: Left-Arc_k: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, i, k)}). Right-Arc_k: (σ|i, j|β, A) ⇒ (σ, i|β, A ∪ {(i, j, k)}). Shift: (σ, i|β, A) ⇒ (σ|i, β, A). Preconditions: Left-Arc_k: ¬[i = 0], ¬∃i′∃k′ [(i′, i, k′) ∈ A]. Right-Arc_k: ¬∃i′∃k′ [(i′, j, k′) ∈ A]. Introduction to Data-Driven Dependency Parsing 10(29)
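These transitions and their single-head preconditions translate almost line by line into code. A hedged sketch using the Configuration class above (has_head checks whether a token already has an incoming arc in A):

def has_head(c, i):
    # True iff some arc (i', i, k') is already in A
    return any(dep == i for (_, dep, _) in c.arcs)

def left_arc(c, k):
    # (sigma|i, j|beta, A) => (sigma, j|beta, A + {(j, i, k)}); i is not the root and has no head yet
    i, j = c.stack[-1], c.buffer[0]
    assert i != 0 and not has_head(c, i)
    return Configuration(c.stack[:-1], c.buffer, c.arcs | {(j, i, k)})

def right_arc(c, k):
    # (sigma|i, j|beta, A) => (sigma, i|beta, A + {(i, j, k)}); j has no head yet
    i, j = c.stack[-1], c.buffer[0]
    assert not has_head(c, j)
    return Configuration(c.stack[:-1], [i] + c.buffer[1:], c.arcs | {(i, j, k)})

def shift(c):
    # (sigma, i|beta, A) => (sigma|i, beta, A)
    return Configuration(c.stack + [c.buffer[0]], c.buffer[1:], c.arcs)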

Example: Shift-Reduce Parsing [root 0 ] σ [Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing [root 0 Economic 1 ] σ [news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing [root 0 ] σ Economic 1 [news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing [root 0 Economic 1 news 2 ] σ [had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 ] σ Economic 1 news 2 [had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc sbj Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 ] σ [little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 little 4 ] σ [effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 ] σ little 4 [effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 ] σ [on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 ] σ [financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 ] σ [markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 ] σ financial 7 [markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing pc sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 ] σ [on 6 financial 7 markets 8. 9 ] β Right-Arc pc Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing pc sbj [root 0 Economic 1 news 2 had 3 ] σ little 4 [effect 5 on 6 financial 7 markets 8. 9 ] β Right-Arc Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing obj pc sbj [root 0 ] σ Economic 1 news 2 [had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Right-Arc obj Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing pred obj pc sbj [root 0 ] σ Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8 [. 9 ] β Right-Arc pred Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing p pred obj pc sbj [] σ [root 0 ] β Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 Right-Arc p Introduction to Data-Driven Dependency Parsing 11(29)

Example: Shift-Reduce Parsing p pred obj pc sbj [root 0 ] σ [ ] β Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 Shift Introduction to Data-Driven Dependency Parsing 11(29)

Theoretical Results. Complexity: deterministic shift-reduce parsing has time and space complexity O(n), where n is the length of the input sentence. Correctness: for every transition sequence C_{0,m}, G_{c_m} is a projective dependency forest (soundness); for every projective dependency forest G, there is a transition sequence C_{0,m} such that G_{c_m} = G (completeness). Note: a dependency forest is (here) a dependency graph satisfying Root, Single-Head, and Acyclicity (but not Connectedness). A dependency forest G = (V, A) can be transformed into a dependency tree by adding arcs of the form (0, i, k) (for some l_k ∈ L) for every root i ∈ V (i ≠ 0). Introduction to Data-Driven Dependency Parsing 12(29)

Variations on Shift-Reduce Parsing. Empty stack initialization: if we can assume that there is only one node i such that (0, i, k) ∈ A, then we can reduce ambiguity by starting with an empty stack (and adding the arc (0, i, k) after termination). Iterative parsing [Yamada and Matsumoto 2003]: same transition system (with empty stack initialization; NB: Left-Arc ≈ Right, Right-Arc ≈ Left); given a terminal configuration, restart with (σ, [ ], A) ⇒ ([ ], σ, A); terminate when A has not been modified in the last iteration. Modified transition systems: arc-eager parsing [Nivre 2003]; non-projective parsing [Attardi 2006]. Introduction to Data-Driven Dependency Parsing 13(29)

Arc-Eager Parsing. Transitions: Left-Arc_k: (σ|i, j|β, A) ⇒ (σ, j|β, A ∪ {(j, i, k)}). Right-Arc_k: (σ|i, j|β, A) ⇒ (σ|i|j, β, A ∪ {(i, j, k)}). Reduce: (σ|i, β, A) ⇒ (σ, β, A). Shift: (σ, i|β, A) ⇒ (σ|i, β, A). Preconditions: Left-Arc_k: ¬[i = 0], ¬∃i′∃k′ [(i′, i, k′) ∈ A]. Right-Arc_k: ¬∃i′∃k′ [(i′, j, k′) ∈ A]. Reduce: ∃i′∃k′ [(i′, i, k′) ∈ A]. Introduction to Data-Driven Dependency Parsing 14(29)
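In code, the arc-eager system differs from the arc-standard sketch only in Right-Arc (the dependent is pushed onto the stack instead of being returned to the buffer) and in the additional Reduce transition; Shift and has_head are as before. Again an illustrative sketch:

def left_arc_eager(c, k):
    # (sigma|i, j|beta, A) => (sigma, j|beta, A + {(j, i, k)}); i is not the root and has no head yet
    i, j = c.stack[-1], c.buffer[0]
    assert i != 0 and not has_head(c, i)
    return Configuration(c.stack[:-1], c.buffer, c.arcs | {(j, i, k)})

def right_arc_eager(c, k):
    # (sigma|i, j|beta, A) => (sigma|i|j, beta, A + {(i, j, k)}); j has no head yet
    i, j = c.stack[-1], c.buffer[0]
    assert not has_head(c, j)
    return Configuration(c.stack + [j], c.buffer[1:], c.arcs | {(i, j, k)})

def reduce_(c):
    # (sigma|i, beta, A) => (sigma, beta, A); only allowed if i already has a head
    assert has_head(c, c.stack[-1])
    return Configuration(c.stack[:-1], c.buffer, c.arcs)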

Example: Arc-Eager Parsing [root 0 ] σ [Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing [root 0 Economic 1 ] σ [news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing [root 0 ] σ Economic 1 [news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing [root 0 Economic 1 news 2 ] σ [had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing sbj [root 0 ] σ Economic 1 news 2 [had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc sbj Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj [root 0 Economic 1 news 2 had 3 ] σ [little 4 effect 5 on 6 financial 7 markets 8. 9 ] β Right-Arc pred Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj [root 0 Economic 1 news 2 had 3 little 4 ] σ [effect 5 on 6 financial 7 markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj [root 0 Economic 1 news 2 had 3 ] σ little 4 [effect 5 on 6 financial 7 markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj obj [root 0 Economic 1 news 2 had 3 little 4 effect 5 ] σ [on 6 financial 7 markets 8. 9 ] β Right-Arc obj Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj obj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 ] σ [financial 7 markets 8. 9 ] β Right-Arc Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred sbj obj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 ] σ [markets 8. 9 ] β Shift Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 ] σ financial 7 [markets 8. 9 ] β Left-Arc Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj pc sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8 ] σ [. 9 ] β Right-Arc pc Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj pc sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 ] σ financial 7 markets 8 [. 9 ] β Reduce Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj sbj pc [root 0 Economic 1 news 2 had 3 little 4 effect 5 ] σ on 6 financial 7 markets 8 [. 9 ] β Reduce Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj pc sbj [root 0 Economic 1 news 2 had 3 ] σ little 4 effect 5 on 6 financial 7 markets 8 [. 9 ] β Reduce Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing pred obj pc sbj [root 0 ] σ Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8 [. 9 ] β Reduce Introduction to Data-Driven Dependency Parsing 15(29)

Example: Arc-Eager Parsing p pred obj pc sbj [root 0 Economic 1 news 2 had 3 little 4 effect 5 on 6 financial 7 markets 8. 9 ] σ [] β Right-Arc p Introduction to Data-Driven Dependency Parsing 15(29)

Non-Projective Parsing. New transitions: NP-Left-Arc_k: (σ|i|i′, j|β, A) ⇒ (σ|i′, j|β, A ∪ {(j, i, k)}). NP-Right-Arc_k: (σ|i|i′, j|β, A) ⇒ (σ|i, i′|β, A ∪ {(i, j, k)}). Handles most naturally occurring non-projective dependency relations (94% in the Prague Dependency Treebank). [Figure: non-projective dependency tree for the Czech sentence 'Z nich je jen jedna na kvalitu.' (Out-of them is only one-fem-sg to quality; 'Only one of them concerns quality.'), with arc labels AuxP, AuxK, Pred, Sb, Atr, AuxZ, Adv.] More expressive extensions are possible [Attardi 2006]. Introduction to Data-Driven Dependency Parsing 16(29)

Comparing Algorithms. Expressivity: arc-standard and arc-eager shift-reduce parsing is limited to projective dependency graphs; simple extensions can handle a subset of non-projective dependency graphs. Complexity: space complexity is O(n) for all deterministic parsers (even with simple extensions); time complexity is O(n) for single-pass parsers, O(n²) for iterative parsers; more complex extensions to handle non-projective dependency graphs will affect time complexity. Introduction to Data-Driven Dependency Parsing 17(29)

List-Based Transition Systems List-Based Transition Systems. A list-based configuration for a sentence x = w_0, w_1, ..., w_n is a quadruple c = (λ_1, λ_2, β, A), where 1. λ_1 is a list of tokens i_1 ≤ m_1 (for some m_1 ≤ n), 2. λ_2 is a list of tokens i_2 ≤ m_2 (for some m_2, m_1 < m_2 ≤ n), 3. β is a buffer of tokens j > m_2, 4. A is a set of dependency arcs such that G = ({0, 1, ..., n}, A) is a dependency graph for x. A list-based transition system is a quadruple S = (C, T, c_s, C_t), where 1. C is the set of all list-based configurations, 2. c_s(x = w_0, w_1, ..., w_n) = ([0], [ ], [1, ..., n], ∅), 3. T is a set of transitions, each of which is a function t : C → C, 4. C_t = {c ∈ C | c = (λ_1, λ_2, [ ], A)}. Notation: λ_1|i = list with head i and tail λ_1 (| left-associative); i|λ_2 = list with head i and tail λ_2 (| right-associative). Introduction to Data-Driven Dependency Parsing 18(29)

List-Based Transition Systems Non-Projective Parsing. Transitions: Left-Arc_k: (λ_1|i, λ_2, j|β, A) ⇒ (λ_1, i|λ_2, j|β, A ∪ {(j, i, k)}). Right-Arc_k: (λ_1|i, λ_2, j|β, A) ⇒ (λ_1, i|λ_2, j|β, A ∪ {(i, j, k)}). No-Arc: (λ_1|i, λ_2, β, A) ⇒ (λ_1, i|λ_2, β, A). Shift: (λ_1, λ_2, i|β, A) ⇒ (λ_1.λ_2|i, [ ], β, A). Preconditions: Left-Arc_k: ¬[i = 0], ¬∃i′∃k′ [(i′, i, k′) ∈ A], ¬[i →∗ j] (no path from i to j in A). Right-Arc_k: ¬∃i′∃k′ [(i′, j, k′) ∈ A], ¬[j →∗ i] (no path from j to i in A). Introduction to Data-Driven Dependency Parsing 19(29)
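A sketch of the non-projective list-based system in the same style (a quadruple of two lists, a buffer, and the arc set; the element written λ_1|i is taken here to be the last element of the Python list l1). The reachable helper implements the path check required by the preconditions. Illustrative code only.

def has_head_in(arcs, i):
    # True iff some arc (i', i, k') is already in A
    return any(dep == i for (_, dep, _) in arcs)

def reachable(arcs, src, dst):
    # True iff there is a directed path src ->* dst in the arc set A
    frontier, seen = [src], {src}
    while frontier:
        h = frontier.pop()
        for (head, dep, _) in arcs:
            if head == h and dep not in seen:
                if dep == dst:
                    return True
                seen.add(dep)
                frontier.append(dep)
    return False

def lb_left_arc(l1, l2, beta, arcs, k):
    # (lambda1|i, lambda2, j|beta, A) => (lambda1, i|lambda2, j|beta, A + {(j, i, k)})
    i, j = l1[-1], beta[0]
    assert i != 0 and not has_head_in(arcs, i) and not reachable(arcs, i, j)
    return l1[:-1], [i] + l2, beta, arcs | {(j, i, k)}

def lb_right_arc(l1, l2, beta, arcs, k):
    # (lambda1|i, lambda2, j|beta, A) => (lambda1, i|lambda2, j|beta, A + {(i, j, k)})
    i, j = l1[-1], beta[0]
    assert not has_head_in(arcs, j) and not reachable(arcs, j, i)
    return l1[:-1], [i] + l2, beta, arcs | {(i, j, k)}

def lb_no_arc(l1, l2, beta, arcs):
    # (lambda1|i, lambda2, beta, A) => (lambda1, i|lambda2, beta, A)
    return l1[:-1], [l1[-1]] + l2, beta, arcs

def lb_shift(l1, l2, beta, arcs):
    # (lambda1, lambda2, i|beta, A) => (lambda1.lambda2|i, [], beta, A)
    return l1 + l2 + [beta[0]], [], beta[1:], arcs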

List-Based Transition Systems Projective Parsing. Transitions: Left-Arc_k: (λ_1|i, λ_2, j|β, A) ⇒ (λ_1, λ_2, j|β, A ∪ {(j, i, k)}). Right-Arc_k: (λ_1|i, λ_2, j|β, A) ⇒ (λ_1|i|j, [ ], β, A ∪ {(i, j, k)}). No-Arc: (λ_1|i, λ_2, β, A) ⇒ (λ_1, i|λ_2, β, A). Shift: (λ_1, λ_2, i|β, A) ⇒ (λ_1.λ_2|i, [ ], β, A). Preconditions: Left-Arc_k: ¬[i = 0], ¬∃i′∃k′ [(i′, i, k′) ∈ A]. Right-Arc_k: ¬∃i′∃k′ [(i′, j, k′) ∈ A]. No-Arc: ∃i′∃k′ [(i′, i, k′) ∈ A]. Introduction to Data-Driven Dependency Parsing 20(29)

List-Based Transition Systems Theoretical Results. Complexity: deterministic list-based parsing has time complexity O(n²) and space complexity O(n), where n is the length of the input sentence. Correctness: for every transition sequence C_{0,m}, G_{c_m} is a (projective) dependency forest (soundness); for every (projective) dependency forest G, there is a transition sequence C_{0,m} such that G_{c_m} = G (completeness). Introduction to Data-Driven Dependency Parsing 21(29)

Classifier-Based Parsing Classifier-Based Parsing Data-driven deterministic parsing: Deterministic parsing requires an oracle. An oracle can be approximated by a classifier. A classifier can be trained using treebank data. Learning problem: Approximate a function from configurations (represented by feature vectors) to transitions, given a training set of (gold standard) transition sequences. Three issues: How do we represent configurations by feature vectors? How do we derive training data from treebanks? How do we learn classifiers? Introduction to Data-Driven Dependency Parsing 22(29)

Classifier-Based Parsing Feature Representations. A feature representation f(c) of a configuration c is a vector of simple features f_i(c). Typical features are defined in terms of attributes of nodes in the dependency graph. Nodes: target nodes (top of σ, head of λ_1, λ_2, β); linear context (neighbors in σ, λ_1, λ_2, or β); structural context (parents, children, siblings given A). Attributes: word form (and/or lemma); part-of-speech (and morpho-syntactic features); dependency type (if labeled); distance (between target tokens). Introduction to Data-Driven Dependency Parsing 23(29)
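As an illustration, a feature function over the stack-based configurations sketched earlier might look as follows. The template names and the choice of attributes are assumptions for this example (loosely in the spirit of the model on the next slide), not a fixed standard.

def features(c, forms, postags, deprel_of):
    # forms, postags: token attributes indexed by position (0 = artificial root)
    # deprel_of: dict mapping a token to the label of its incoming arc in c.arcs
    f = {}
    s0 = c.stack[-1] if c.stack else None         # top of the stack
    b0 = c.buffer[0] if c.buffer else None        # next token in the buffer
    b1 = c.buffer[1] if len(c.buffer) > 1 else None
    if s0 is not None:
        f["s0.form"] = forms[s0]
        f["s0.pos"] = postags[s0]
        f["s0.deprel"] = deprel_of.get(s0, "NONE")
    if b0 is not None:
        f["b0.form"] = forms[b0]
        f["b0.pos"] = postags[b0]
    if b1 is not None:
        f["b1.pos"] = postags[b1]
    if s0 is not None and b0 is not None:
        f["dist"] = str(b0 - s0)                  # distance between the target tokens
    return f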

Classifier-Based Parsing A Typical Model [Nivre et al. 2006]. Target nodes: the stack [... w_i w_j]_σ (w_j = top), the buffer [w_k w_{k+1} w_{k+2} w_{k+3} ...]_β, the head h(w_j) of w_j, the leftmost and rightmost dependents l(w_j), r(w_j) of w_j, and the leftmost dependent l(w_k) of w_k. Features: FORM of w_j, w_k, w_{k+1}, h(w_j); LEMMA of w_j, w_k; CPOS of w_j, w_k; POS of w_i, w_j, w_k, w_{k+1}, w_{k+2}, w_{k+3}; FEATS of w_j, w_k; DEPREL of w_j, l(w_j), r(w_j), l(w_k). Introduction to Data-Driven Dependency Parsing 24(29)

Classifier-Based Parsing Training Data. Training instances have the form (f(c), t), where 1. f(c) is a feature representation of a configuration c, 2. t is the correct transition out of c (i.e., o(c) = t). Given a dependency treebank, we can sample the oracle function o as follows: for each sentence x with (gold standard) dependency graph G_x, we construct a transition sequence C_{0,m} = (c_0, c_1, ..., c_m) such that 1. c_0 = c_s(x), 2. G_{c_m} = G_x. For each configuration c_i (i < m), we construct a training instance (f(c_i), t_i), where t_i(c_i) = c_{i+1}. Introduction to Data-Driven Dependency Parsing 25(29)
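For the arc-eager system, one common way to construct such a transition sequence from a gold-standard projective tree is a rule-based (static) oracle that inspects the gold arcs. The sketch below builds on the configuration, transition, and feature functions above and emits one (feature dict, transition string) pair per configuration; names and the label encoding ("LEFT-ARC:sbj" etc.) are illustrative.

def gold_sequence(forms, postags, gold_heads, gold_labels):
    # gold_heads, gold_labels: lists indexed by token position,
    # with gold_heads[0] = gold_labels[0] = None for the artificial root
    c = initial(forms)
    deprel_of = {}
    instances = []
    while not is_terminal(c):
        s0, b0 = c.stack[-1], c.buffer[0]
        feats = features(c, forms, postags, deprel_of)
        if s0 != 0 and gold_heads[s0] == b0:
            t, next_c = "LEFT-ARC:" + gold_labels[s0], left_arc_eager(c, gold_labels[s0])
            deprel_of[s0] = gold_labels[s0]
        elif gold_heads[b0] == s0:
            t, next_c = "RIGHT-ARC:" + gold_labels[b0], right_arc_eager(c, gold_labels[b0])
            deprel_of[b0] = gold_labels[b0]
        elif any(gold_heads[b0] == k or gold_heads[k] == b0 for k in c.stack[:-1]):
            t, next_c = "REDUCE", reduce_(c)
        else:
            t, next_c = "SHIFT", shift(c)
        instances.append((feats, t))
        c = next_c
    return instances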

Classifier-Based Parsing Learning Classifiers. Learning methods: Support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006]: polynomial kernel (d = 2); different techniques for multiclass classification; training efficiency problematic for large data sets. Memory-based learning (MBL) [Nivre et al. 2004, Nivre and Scholz 2004, Attardi 2006]: k-NN classification; different distance functions; parsing efficiency problematic for large data sets. Maximum entropy modeling (MaxEnt) [Cheng et al. 2005, Attardi 2006]: extremely efficient parsing; slightly less accurate. Introduction to Data-Driven Dependency Parsing 26(29)
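Any multiclass learner over the feature representations can approximate the oracle. Purely as an illustration (the systems cited above used dedicated SVM, MBL, and MaxEnt toolkits), here is a sketch with scikit-learn, using a degree-2 polynomial-kernel SVM over the (feature dict, transition) pairs produced by gold_sequence above.

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_transition_classifier(instances):
    # instances: list of (feature_dict, transition_string) pairs
    X = [f for f, _ in instances]
    y = [t for _, t in instances]
    model = make_pipeline(
        DictVectorizer(sparse=True),
        SVC(kernel="poly", degree=2, C=1.0),  # polynomial kernel, one-vs-one multiclass
    )
    model.fit(X, y)
    return model

# At parsing time, model.predict([features(c, forms, postags, deprel_of)])[0]
# yields the transition string, i.e., an approximation of the oracle o(c).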

Pseudo-Projective Parsing Pseudo-Projective Parsing Technique for non-projective dependency parsing with a data-driven projective parser [Nivre and Nilsson 2005]. Four steps: 1. Projectivize dependency graphs in training data, encoding information about transformations in augmented arc labels. 2. Train projective parser (as usual). 3. Parse new sentences using projective parser (as usual). 4. Deprojectivize output dependency graphs by heuristic transformations guided by augmented arc labels. Introduction to Data-Driven Dependency Parsing 27(29)
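Step 1 can be sketched as follows: repeatedly reattach the dependent of a non-projective arc to its head's head, and record the label of the original head in an augmented arc label (roughly in the spirit of the Head encoding of Nivre and Nilsson 2005; this is a simplified illustration, not their exact algorithm, and the deprojectivization search of step 4 is omitted).

def dominates(heads, anc, tok):
    # True iff anc is an ancestor of tok (following head pointers; 0 = root)
    while tok != 0 and heads[tok] is not None:
        tok = heads[tok]
        if tok == anc:
            return True
    return False

def arc_is_projective(heads, dep):
    # the arc (heads[dep], dep) is projective iff every token strictly between
    # head and dependent is dominated by the head
    head = heads[dep]
    lo, hi = sorted((head, dep))
    return all(dominates(heads, head, k) for k in range(lo + 1, hi))

def projectivize(heads, labels):
    # heads, labels: lists indexed by token position (index 0 = artificial root, heads[0] = None)
    heads, labels = list(heads), list(labels)
    lifted = True
    while lifted:
        lifted = False
        for dep in range(1, len(heads)):
            if heads[dep] not in (None, 0) and not arc_is_projective(heads, dep):
                old_head = heads[dep]
                heads[dep] = heads[old_head]      # lift the dependent to the head's head
                if "|" not in labels[dep]:        # remember the deprel of the original head
                    labels[dep] = labels[dep] + "|" + labels[old_head]
                lifted = True
    return heads, labels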

Pseudo-Projective Parsing Pseudo-Projective Parsing. Projectivize training data: projective head = nearest permissible ancestor of real head; arc label extended with dependency type of real head. [Figure: original non-projective tree for 'Z nich je jen jedna na kvalitu.' (out-of / them / is / only / one / to / quality), with the non-projective arc labeled AuxP and further labels AuxK, Pred, Sb, Atr, AuxZ, Adv.] Introduction to Data-Driven Dependency Parsing 28(29)

Pseudo-Projective Parsing Pseudo-Projective Parsing. Projectivize training data: projective head = nearest permissible ancestor of real head; arc label extended with dependency type of real head. [Figure: the same tree after projectivization; the lifted AuxP arc now carries the label AuxP extended with Sb.] Introduction to Data-Driven Dependency Parsing 28(29)

Pseudo-Projective Parsing Pseudo-Projective Parsing. Deprojectivize parser output: top-down, breadth-first search for real head; search constrained by extended arc label. [Figure: parser output for the same sentence before deprojectivization, with the extended AuxP/Sb label on the lifted arc.] Introduction to Data-Driven Dependency Parsing 28(29)

Pseudo-Projective Parsing Pseudo-Projective Parsing. Deprojectivize parser output: top-down, breadth-first search for real head; search constrained by extended arc label. [Figure: the same tree after deprojectivization, with the arc reattached to its real head and the label restored to AuxP.] Introduction to Data-Driven Dependency Parsing 28(29)

Summary Summary. Transition-based methods: transition systems; deterministic classifier-based parsing: parsing algorithm, stack-based and list-based transition systems, classifier-based parsing, pseudo-projective parsing. Introduction to Data-Driven Dependency Parsing 29(29)

References and Further Reading References and Further Reading Giuseppe Attardi. 2006. Experiments with a multilanguage non-projective dependency parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 166–170. Yuchang Cheng, Masayuki Asahara, and Yuji Matsumoto. 2004. Deterministic dependency structure analyzer for Chinese. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP), pages 500–508. Yuchang Cheng, Masayuki Asahara, and Yuji Matsumoto. 2005. Machine learning-based dependency analyzer for Chinese. In Proceedings of the International Conference on Chinese Computing (ICCC), pages 66–73. Hideki Isozaki, Hideto Kazawa, and Tsutomu Hirao. 2004. A deterministic word dependency analyzer enhanced with preference learning. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 275–281. Taku Kudo and Yuji Matsumoto. 2002. Japanese dependency analysis using cascaded chunking. In Proceedings of the Sixth Workshop on Computational Language Learning (CoNLL), pages 63–69. Joakim Nivre and Jens Nilsson. 2005. Introduction to Data-Driven Dependency Parsing 29(29)

References and Further Reading Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106. Joakim Nivre and Mario Scholz. 2004. Deterministic dependency parsing of English text. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 64–70. Joakim Nivre, Johan Hall, and Jens Nilsson. 2004. Memory-based dependency parsing. In Hwee Tou Ng and Ellen Riloff, editors, Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL), pages 49–56. Joakim Nivre, Johan Hall, Jens Nilsson, Gülşen Eryiğit, and Svetoslav Marinov. 2006. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 221–225. Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Gertjan van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160. Hiroyasu Yamada and Yuji Matsumoto. 2003. Introduction to Data-Driven Dependency Parsing 29(29)

References and Further Reading Statistical dependency analysis with support vector machines. In Gertjan van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 195–206. Introduction to Data-Driven Dependency Parsing 29(29)