Dependency grammar and dependency parsing

Dependency grammar and dependency parsing
Syntactic analysis (5LN455), 2015-12-09
Sara Stymne, Department of Linguistics and Philology
Based on slides from Marco Kuhlmann

Activities (dependency parsing):
- 3 lectures (December)
- 1 literature seminar (January 13)
- 2 assignments (deadline: January 17): a written assignment, and trying out and evaluating a state-of-the-art system, MaltParser
- Supervision on demand, by email or by booking a meeting

Overview:
- Dependency parsing in general
- Arc-factored dependency parsing: Collins' algorithm, Eisner's algorithm
- Transition-based dependency parsing: the arc-standard algorithm
- Evaluation of dependency parsers

Dependency grammar

Dependency grammar The term dependency grammar does not refer to a specific grammar formalism. Rather, it refers to a specific way to describe the syntactic structure of a sentence.

Dependency grammar The notion of dependency The basic observation behind constituency is that groups of words may act as one unit. Example: noun phrase, prepositional phrase The basic observation behind dependency is that words have grammatical functions with respect to other words in the sentence. Example: subject, modifier

Dependency grammar Phrase structure trees Example phrase structure tree for "I booked a flight from LA": (S (NP (Pro I)) (VP (Verb booked) (NP (Det a) (Nom (Nom (Noun flight)) (PP from LA)))))

Dependency grammar Dependency trees Example dependency tree for "I booked a flight from LA", with arcs subj(booked → I), dobj(booked → flight), det(flight → a), pmod(flight → from LA). In an arc h → d, the word h is called the head, and the word d is called the dependent. The arcs form a rooted tree. Each arc has a label, l, so an arc can be described as a triple (h, d, l).
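As a concrete illustration (my own sketch, not part of the slides), arcs and trees can be represented directly as (h, d, l) triples; the word positions below are just for the running example:

```python
from typing import NamedTuple

class Arc(NamedTuple):
    head: int   # position of the head word h
    dep: int    # position of the dependent word d
    label: str  # arc label l, e.g. "subj" or "dobj"

# "I booked a flight from LA", words numbered from 1 (0 is an artificial root):
words = ["<root>", "I", "booked", "a", "flight", "from LA"]
tree = [
    Arc(2, 1, "subj"),  # booked -> I
    Arc(2, 4, "dobj"),  # booked -> flight
    Arc(4, 3, "det"),   # flight -> a
    Arc(4, 5, "pmod"),  # flight -> from LA
]
```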

Dependency grammar Heads in phrase structure grammar In phrase structure grammar, ideas from dependency grammar can be found in the notion of heads. Roughly speaking, the head of a phrase is the most important word of the phrase: the word that determines the phrase function. Examples: noun in a noun phrase, preposition in a prepositional phrase

Dependency grammar Heads in phrase structure grammar The same phrase structure tree for "I booked a flight from LA" as before, with the head word of each phrase marked.

Dependency grammar The history of dependency grammar The notion of dependency can be found in some of the earliest formal grammars. Modern dependency grammar is attributed to Lucien Tesnière (1893–1954). Recent years have seen a revived interest in dependency-based description of natural language syntax.

Dependency grammar Linguistic resources Descriptive dependency grammars exist for some natural languages. Dependency treebanks exist for a wide range of natural languages. These treebanks can be used to train accurate and efficient dependency parsers.

Projectivity An important characteristic of dependency trees is projectivity. A dependency tree is projective if, for every arc in the tree, there is a directed path from the head of the arc to all words occurring between the head and the dependent (that is, an arc (i, l, j) implies i →* k for every k such that min(i, j) < k < max(i, j)).
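A small sketch of a projectivity check along these lines (my own code; it represents a tree by a head array, where heads[d] is the position of the head of word d and position 0 is an artificial root):

```python
def is_projective(heads: list[int]) -> bool:
    """Return True if every arc satisfies the projectivity condition."""
    for d in range(1, len(heads)):
        h = heads[d]
        for k in range(min(h, d) + 1, max(h, d)):
            # The arc h -> d is projective only if h dominates k,
            # i.e. walking from k towards the root passes through h.
            a = k
            while a != 0 and a != h:
                a = heads[a]
            if a != h:
                return False
    return True

# "I booked a flight from LA": I <- booked -> flight -> {a, from LA}
print(is_projective([0, 2, 0, 4, 2, 4]))  # True
```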

Projective and non-projective trees

Projectivity and dependency parsing Many dependency parsing algorithms can only handle projective trees. Non-projective trees do occur in natural language; how often depends on the language (and treebank).

Projectivity in the course The algorithms we will discuss in detail during the lectures only concern projective parsing. For non-projective parsing: pseudo-projective parsing is the topic of Seminar 2, other variants are mentioned briefly during the lectures, and you can read more about it in the course book!

Arc-factored dependency parsing

Ambiguity Just like phrase structure parsing, dependency parsing has to deal with ambiguity. One analysis of "I booked a flight from LA": subj(booked → I), dobj(booked → flight), det(flight → a), pmod(flight → from LA), where from LA attaches to flight.

Ambiguity An alternative analysis of the same sentence: subj(booked → I), dobj(booked → flight), det(flight → a), pmod(booked → from LA), where from LA attaches to booked.

Disambiguation We need to disambiguate between alternative analyses. We develop mechanisms for scoring dependency trees, and disambiguate by choosing a dependency tree with the highest score.

Scoring models and parsing algorithms Distinguish two aspects: Scoring model: How do we want to score dependency trees? Parsing algorithm: How do we compute a highest-scoring dependency tree under the given scoring model?

The arc-factored model Split the dependency tree t into parts p1, ..., pn, score each of the parts individually, and combine the scores into a simple sum: score(t) = score(p1) + ... + score(pn). The simplest scoring model is the arc-factored model, where the scored parts are the arcs of the tree.
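In code, the arc-factored model is just a sum over the arcs; this is a minimal sketch with a made-up scoring function for illustration:

```python
def tree_score(tree, arc_score):
    # Arc-factored model: score(t) = sum of the scores of its arcs.
    return sum(arc_score(h, d, l) for (h, d, l) in tree)

# Hypothetical arc scorer that simply prefers left-to-right arcs:
toy_score = lambda h, d, l: 1.0 if h < d else 0.5
tree = [(2, 1, "subj"), (2, 4, "dobj"), (4, 3, "det"), (4, 5, "pmod")]
print(tree_score(tree, toy_score))  # 0.5 + 1.0 + 0.5 + 1.0 = 3.0
```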

Arc-factored dependency parsing Features To score an arc (such as the arcs of the tree for "I booked a flight from LA" above), we define features that are likely to be relevant in the context of parsing. We represent an arc by its feature vector.

Arc-factored dependency parsing Examples of features:
- The head is a verb.
- The dependent is a noun.
- The head is a verb and the dependent is a noun.
- The head is a verb and the predecessor of the head is a pronoun.
- The arc goes from left to right.
- The arc has length 2.

Arc-factored dependency parsing Feature vectors Each arc is represented by a vector of feature values, with value 1 if the feature applies to the arc and 0 otherwise. (The slide illustrates this with the features "The dependent is a noun." and "The head is a verb." for several arcs of the example tree, such as booked → flight and flight → a.)

Arc-factored dependency parsing Implementation of feature vectors We assign each feature a unique number. For each arc, we collect the numbers of those features that apply to that arc. The feature vector of the arc is the list of those numbers. Example: [1, 2, 42, 313, 1977, 2008, 2010]
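A sketch of this sparse representation (the feature strings and the numbering scheme are illustrative, not from the course):

```python
feature_index: dict[str, int] = {}  # feature -> unique number

def feature_vector(active_features: list[str]) -> list[int]:
    """Return the numbers of the features that apply to an arc."""
    for f in active_features:
        if f not in feature_index:
            feature_index[f] = len(feature_index)
    return sorted(feature_index[f] for f in active_features)

vec = feature_vector(["head is a verb",
                      "dependent is a noun",
                      "arc goes from left to right"])
print(vec)  # e.g. [0, 1, 2]
```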

Arc-factored dependency parsing Feature weights Arc-factored dependency parsers require a training phase. During training, our goal is to assign, to each feature fi, a feature weight wi. Intuitively, the weight wi quantifies the effect of the feature fi on the likelihood of the arc. How likely is it that we will see an arc with this feature in a useful dependency tree?

Arc-factored dependency parsing Feature weights We define the score of an arc h → d as the weighted sum of all features of that arc: score(h → d) = f1·w1 + ... + fn·wn
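With the sparse representation above, the weighted sum reduces to summing the weights of the active features; a minimal sketch:

```python
def arc_score(active: list[int], weights: list[float]) -> float:
    # For binary features, f1*w1 + ... + fn*wn is simply the sum of
    # the weights of the features whose value is 1.
    return sum(weights[i] for i in active)

weights = [0.8, 0.3, -0.2]
print(arc_score([0, 2], weights))  # 0.8 + (-0.2) = 0.6
```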

Arc-factored dependency parsing Training using structured prediction Take a sentence w and a gold-standard dependency tree g for w. Compute the highest-scoring dependency tree under the current weights; call it p. Increase the weights of all features that are in g but not in p. Decrease the weights of all features that are in p but not in g.

Arc-factored dependency parsing Training using structured prediction Training involves repeatedly parsing (treebank) sentences and refining the weights. Hence, training presupposes an efficient parsing algorithm.
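A minimal sketch of this training scheme (a structured perceptron); `parse` and `tree_features` are placeholders for the parsing algorithm and the feature extraction described above, not names from the course:

```python
def perceptron_update(weights, gold, predicted, tree_features, lr=1.0):
    """Move the weights towards the gold tree and away from the
    prediction; features occurring in both trees cancel out."""
    for i in tree_features(gold):
        weights[i] += lr
    for i in tree_features(predicted):
        weights[i] -= lr

def train(sentences, gold_trees, weights, parse, tree_features, epochs=10):
    # Training repeatedly parses treebank sentences and refines the
    # weights, so it presupposes an efficient parsing algorithm.
    for _ in range(epochs):
        for w, g in zip(sentences, gold_trees):
            p = parse(w, weights)  # highest-scoring tree, e.g. via Collins
            if p != g:
                perceptron_update(weights, g, p, tree_features)
```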

Arc-factored dependency parsing Higher-order models The arc-factored model is a first-order model, because the scored subgraphs consist of a single arc. An nth-order model scores subgraphs consisting of (at most) n arcs. Second-order: siblings, grandparents. Third-order: tri-siblings, grand-siblings. Higher-order models capture more linguistic structure and give higher parsing accuracy, but are less efficient.

Arc-factored dependency parsing Parsing algorithms Projective parsing, inspired by the CKY algorithm: Collins' algorithm and Eisner's algorithm. Non-projective parsing: minimum spanning tree (MST) algorithms.

Arc-factored dependency parsing Graph-based parsing Arc-factored parsing is an instance of graph-based dependency parsing, because it scores the dependency graph (tree) directly. Graph-based models are often contrasted with transition-based models (next Wednesday). There are also grammar-based methods, which we will not discuss.

Arc-factored dependency parsing Summary The term arc-factored dependency parsing refers to dependency parsers that score a dependency tree by scoring its arcs. Arcs are scored by defining features and assigning weights to these features. The resulting parsers can be trained using structured prediction. More powerful scoring models exist.

Overview:
- Arc-factored dependency parsing: Collins' algorithm, Eisner's algorithm
- Transition-based dependency parsing: the arc-standard algorithm
- Dependency treebanks
- Evaluation of dependency parsers

Collins algorithm

Collins algorithm Collins' algorithm is a simple algorithm for computing the highest-scoring dependency tree under an arc-factored scoring model. It can be understood as an extension of the CKY algorithm to dependency parsing. Like the CKY algorithm, it can be characterized as a bottom-up algorithm based on dynamic programming.

Collins algorithm Signatures In the CKY algorithm, a chart item has the signature [min, max, C]: a constituent of category C spanning positions min to max. In Collins' algorithm, a chart item has the signature [min, max, root]: a dependency tree spanning positions min to max whose root is the word root.

Collins algorithm Initialization For "I booked a flight from LA" (fencepost positions 0 to 5), the chart is initialized with one single-word item per word: [0, 1, I], [1, 2, booked], [2, 3, a], [3, 4, flight], [4, 5, from LA]

Collins algorithm Adding a left-to-right arc Combining the items [3, 4, flight] and [4, 5, from LA] by adding the arc pmod(flight → from LA) yields the item [3, 5, flight].

Collins algorithm Adding a left-to-right arc In general: take an item t1 = [min, mid, l] and an item t2 = [mid, max, r], and add the arc l → r. This yields an item t = [min, max, l] with score(t) = score(t1) + score(t2) + score(l → r)

Collins algorithm Adding a left-to-right arc

for each [min, max] with max - min > 1 do
    for each l from min to max - 2 do
        double best = score[min][max][l]
        for each r from l + 1 to max - 1 do
            for each mid from l + 1 to r do
                t1 = score[min][mid][l]
                t2 = score[mid][max][r]
                double current = t1 + t2 + score(l → r)
                if current > best then best = current
        score[min][max][l] = best

Collins algorithm Adding a right-to-left arc Combining the items [0, 1, I] and [1, 2, booked] by adding the arc subj(booked → I) yields the item [0, 2, booked].

Collins algorithm Adding a right-to-left arc In general: take an item t1 = [min, mid, l] and an item t2 = [mid, max, r], and add the arc r → l. This yields an item t = [min, max, r] with score(t) = score(t1) + score(t2) + score(r → l)

Collins algorithm Adding a right-to-left arc

for each [min, max] with max - min > 1 do
    for each r from min + 1 to max - 1 do
        double best = score[min][max][r]
        for each l from min to r - 1 do
            for each mid from l + 1 to r do
                t1 = score[min][mid][l]
                t2 = score[mid][max][r]
                double current = t1 + t2 + score(r → l)
                if current > best then best = current
        score[min][max][r] = best
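Putting initialization and both arc directions together, here is a compact runnable rendering of the algorithm in Python (my own sketch, not course code: it computes only the best score, without backpointers for recovering the tree, and ignores arc labels; `arc_score` is any arc-factored scorer):

```python
def collins_parse_score(words, arc_score):
    """Best score of a projective dependency tree over words.

    chart[(lo, hi, root)] = best score of an item [lo, hi, root]:
    a tree over the words between positions lo and hi, rooted at
    the word starting at position root.
    """
    n = len(words)
    NEG = float("-inf")
    chart = {(i, i + 1, i): 0.0 for i in range(n)}  # initialization

    for length in range(2, n + 1):            # bottom-up over span sizes
        for lo in range(0, n - length + 1):
            hi = lo + length
            for root in range(lo, hi):
                chart[(lo, hi, root)] = NEG
            for mid in range(lo + 1, hi):     # split point
                for l in range(lo, mid):      # root of the left item
                    for r in range(mid, hi):  # root of the right item
                        base = chart[(lo, mid, l)] + chart[(mid, hi, r)]
                        # Left-to-right arc l -> r: root stays l.
                        cand = base + arc_score(l, r)
                        if cand > chart[(lo, hi, l)]:
                            chart[(lo, hi, l)] = cand
                        # Right-to-left arc r -> l: root becomes r.
                        cand = base + arc_score(r, l)
                        if cand > chart[(lo, hi, r)]:
                            chart[(lo, hi, r)] = cand

    # Best full tree: any root over the whole sentence.
    return max(chart[(0, n, root)] for root in range(n))

# Toy usage with a hypothetical scorer that prefers short arcs:
score = lambda h, d: -abs(h - d)
print(collins_parse_score("I booked a flight".split(), score))  # -3.0
```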

Collins algorithm Finishing up Combining [2, 3, a] and [3, 5, flight] with the arc det(flight → a) yields [2, 5, flight]. Combining [0, 2, booked] and [2, 5, flight] with the arc dobj(booked → flight) yields [0, 5, booked], an item spanning the whole sentence.

Collins algorithm Complexity analysis Runtime? Space? (Consider the nested loops in the pseudocode above.)

Collins algorithm Complexity analysis Space requirement: O(|w|³), one chart entry per signature [min, max, root]. Runtime requirement: O(|w|⁵), since each update ranges over five positions (min, max, l, r, mid).

Collins algorithm Summary Collins' algorithm is a CKY-style algorithm for computing the highest-scoring dependency tree under an arc-factored scoring model. It runs in time O(|w|⁵), which may not be practical for long sentences. We have not discussed labels yet; we will do that next lecture.