Refresher on Dependency Syntax and the Nivre Algorithm
Richard Johansson

1 Introduction

This document gives more details about some important topics that were discussed very quickly during the lecture: dependency syntax and the Nivre algorithm for automatic dependency parsing.

2 Refresher on dependency syntax

In a dependency representation of syntax, we represent the grammatical structure of a sentence as a tree, where we have links (dependency arcs) between the word tokens in the sentence. If we have a dependency arc going from a word H to a word D, then we say that H is the head of D, and that D is a dependent of H. Typically, a link from H to D means that H grammatically dominates D in some way: for instance, a verb is the head of its subject, and a noun is the head of its determiner.

To simplify algorithmic processing, we add a special dummy root token before the sentence. We connect the main word of the sentence (typically a finite verb) to this dummy.

We may have a dependency label on each dependency arc. These labels represent grammatical functions such as subject, object, adverbial, determiner, and so on. Although we will refer to these grammatical functions (such as saying that the noun is the subject of the verb), we will not discuss how to assign labels in the automatic parsing algorithm.

Here is an example of a dependency tree representing the grammatical analysis of the example sentence, which consists of an adverb, a pronoun, a finite verb, an article, a noun, and a period.

[Figure: dependency tree of the example sentence]

In the figure, <D> represents the dummy root token. Arrows are drawn from heads to dependents. The links in this dependency tree can be interpreted as follows:

- The finite verb is the main verb of the sentence, so it is a dependent of the dummy root token.
- The adverb is a temporal adverbial of the verb.
- The pronoun is the subject of the verb.
- The noun is the object of the verb.
- The article is the determiner of the noun.
- We typically connect punctuation to the main verb, so we make the period a dependent of the verb.

Some important properties of a dependency tree:

- Every token has exactly one head, except the dummy root token, which has none.
- Every token can be reached if we start at the dummy root and follow the arcs down from head to dependent.
- There are no cycles; this is why we may say that it's a tree.
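The three properties above can be checked mechanically. The following is a minimal sketch, not part of the original handout: it assumes that a tree is stored as a Python list heads, where heads[d] is the position of the head of token d, and token 0 is the dummy root with heads[0] set to None. The function name is_valid_dependency_tree is hypothetical. With this encoding every token has at most one head by construction, so the function only needs to check that no head is missing and that following the head links upward from any token ends at the dummy root.

```python
def is_valid_dependency_tree(heads):
    """Check the tree properties for a list of head positions.

    Assumed encoding (for illustration only): heads[d] is the index of
    the head of token d; token 0 is the dummy root and has head None.
    """
    # The dummy root must exist and must not have a head.
    if not heads or heads[0] is not None:
        return False
    n = len(heads)
    # Every other token needs exactly one head, which must be a
    # different token inside the sentence.
    for d in range(1, n):
        h = heads[d]
        if h is None or not (0 <= h < n) or h == d:
            return False
    # Walking upward from any token must end at the dummy root.
    # Revisiting a token means there is a cycle, and such a token
    # could never be reached from the root.
    for d in range(1, n):
        seen = set()
        node = d
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True
```

For the example tree, with the tokens numbered from 0 (dummy root) to 6 (period), the encoding would be heads = [None, 3, 3, 0, 5, 3, 3], which the function accepts.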
3 The Nivre algorithm for incremental dependency parsing

As discussed in the lectures, the Nivre algorithm is incremental: it processes the tokens in the order they appear in the sentence. This means that we can start parsing while the input is still being produced, for instance in a spoken dialogue system or a T9 input system for mobile phones. The algorithm that we will study was defined by Nivre (2003), and is one of several incremental algorithms (Nivre, 2008).

The algorithm is initialized by creating a stack and a queue, and all words in the sentence (including the dummy root) are inserted into the queue. If you haven't taken a course in data structures, here's a short introduction: stacks and queues are both lists of items to process, and the difference between them is the order in which we process the items:

- A queue is a work list where items are processed in first-in-first-out order, so the item to be processed next (the first item) is the one that was inserted first.
- A stack is a work list where items are processed in last-in-first-out order, so the item that was most recently inserted (the top item) is the one available to the algorithm for processing.

The algorithm then processes the tokens in the stack and queue, and gradually adds dependencies between the tokens that appear. This process goes on until the queue is empty. The four actions that can be performed on the stack and queue can be summarized as follows, and are described in detail in the next section.

- SHIFT: move a token from the queue to the stack;
- REDUCE: remove a token from the stack;
- LEFT-ARC: the top token of the stack becomes a dependent of the first item of the queue, and is removed from the stack;
- RIGHT-ARC: the first item of the queue becomes a dependent of the top token of the stack, and is then moved from the queue onto the stack.

3.1 Detailed description of the parsing actions

We now give detailed descriptions of the four actions of the algorithm. In the descriptions, we assume that we have a stack where the top item is T, and a queue where the first item is F. Be careful to understand the preconditions of each action, i.e. the circumstances in which it is legal to carry out a particular action, and the effects, i.e. how the stack and
queue are affected and whether any dependencies are added by the action. For each action, there is a figure that exemplifies the situation before and after we have applied the action. In the figures, the stack is drawn with the top item to the right, and the queue with the first item to the left.

SHIFT
  Preconditions: the queue must not be empty.
  Effects: F is removed from the queue; F becomes the top item of the stack.

REDUCE
  Preconditions: the stack must not be empty; T must have a head.
  Effects: T is removed from the stack.

LEFT-ARC
  Preconditions: the stack must not be empty; the queue must not be empty; T must not have a head; T must not be the dummy root token.
  Effects: T is removed from the stack; a dependency arc is added, with F as head and T as dependent.
  Typical cases:
  - T is a noun and F is a verb. We make a LEFT-ARC to connect the noun to the verb as a subject.
  - T is an article such as "the", and F is a noun. We make a LEFT-ARC to connect the article to the noun as its determiner.

RIGHT-ARC
  Preconditions: the stack must not be empty; the queue must not be empty; F must not have a head.
  Effects: F is removed from the queue; F becomes the top item of the stack; a dependency arc is added, with T as head and F as dependent.
  Typical cases:
  - T is a verb and F is a noun. We make a RIGHT-ARC to connect the noun to the verb as an object.
  - T is a preposition and F is a noun. We make a RIGHT-ARC to connect the noun to the preposition as a prepositional complement.
  - T is a noun and F is a verb. We make a RIGHT-ARC to connect the verb to the noun as the head of a relative clause.

4 Building an automatic parser using the Nivre algorithm

Until now, we have described how the Nivre algorithm proceeds through the sentence, gradually adding arcs until the sentence ends and all arcs have been added. The obvious follow-up question is: when we parse a sentence using an automatic parser, how do we know which action to carry out? The answer is: we train a statistical classifier, and when we come to a new sentence, we apply the Nivre algorithm and ask the classifier for advice at each step. The classifier is a function that takes a stack and a queue and returns the action it thinks should be carried out.

How do we then build a training set that we can give to machine learning software such as your Naïve Bayes implementation, NLTK, or scikit-learn?
To address this, we need to do the following:

- First, we collect a set of sentences where a linguist has annotated the grammatical structure manually. We call such a collection a treebank.
- For each sentence in the treebank, we go through the sentence using the Nivre algorithm and determine the correct sequence of actions (see the next section).
- Now we can build a training set for our classifier. At each step of the Nivre algorithm, we
  - extract training features describing the stack and queue; in NLTK and scikit-learn, this will be an attribute-value dictionary;
  - add the features and the corresponding action to the training set.

4.1 Finding the correct sequence of actions if we know the true tree

Assume that we are given a dependency parse tree G and want to determine the sequence of parsing actions needed to produce that tree. We go through the Nivre algorithm step by step, and at each step use the following decision rules to select the action. Again, T means the top item of the stack, and F means the first item of the queue.

1. If the stack is empty, select SHIFT.
2. If G contains a dependency arc with head F and dependent T, then select LEFT-ARC.
3. If G contains a dependency arc with head T and dependent F, then select RIGHT-ARC.
4. If the stack contains some token that is the head or a dependent of F in G, then select REDUCE.
5. Otherwise, select SHIFT.
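The decision rules above, together with the four actions of Section 3.1, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the function name oracle_actions is hypothetical, the tree G is assumed to be encoded as a list heads where heads[d] is the position of the head of token d (token 0 is the dummy root, with head None), and the preconditions of the actions are not checked. The features recorded at each step are the POS tags of T and F, the same two features used in Section 4.3.

```python
from collections import deque


def oracle_actions(heads, pos_tags):
    """Replay the parser on a known tree and record (features, action) pairs.

    Assumed encoding (for illustration only): heads[d] is the index of
    the head of token d, with token 0 as the dummy root (head None);
    pos_tags[i] is the POS tag of token i.
    """
    stack = []                        # the top item T is stack[-1]
    queue = deque(range(len(heads)))  # the first item F is queue[0]
    arcs = []                         # (head, dependent) pairs added so far
    training = []

    while queue:
        f = queue[0]
        t = stack[-1] if stack else None

        # Select the action with the five decision rules.
        if t is None:
            action = "shift"          # rule 1: the stack is empty
        elif heads[t] == f:
            action = "left-arc"       # rule 2: F is the head of T
        elif heads[f] == t:
            action = "right-arc"      # rule 3: T is the head of F
        elif any(heads[f] == s or heads[s] == f for s in stack):
            action = "reduce"         # rule 4: F is linked to a stack token
        else:
            action = "shift"          # rule 5

        # Describe the parser state before the action is applied.
        features = {
            "T_pos": pos_tags[t] if t is not None else "(none)",
            "F_pos": pos_tags[f],
        }
        training.append((features, action))

        # Apply the effects of the chosen action.
        if action == "shift":
            stack.append(queue.popleft())
        elif action == "reduce":
            stack.pop()
        elif action == "left-arc":
            arcs.append((f, t))
            stack.pop()
        else:  # right-arc
            arcs.append((t, f))
            stack.append(queue.popleft())

    return training, arcs
```

For the example tree, encoded as heads = [None, 3, 3, 0, 5, 3, 3] with the tags "<D>", "adverb", "pronoun", "finite_verb", "article", "noun", "punctuation", the recorded actions are the eleven steps of the walkthrough in Section 4.2, and the returned arcs are exactly those of the given tree.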
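
4.2 Walkthrough of the example sentence

We will now determine the complete sequence of steps required to parse the example sentence (an adverb, a pronoun, a finite verb, an article, a noun, and a period). The tree we want to build is this one:

[Figure: dependency tree of the example sentence]

We initialize by putting all tokens, including the dummy root token, into the working queue. We also create an empty stack.

Since the stack is empty, we don't have any choice but to SHIFT (case 1).

There is no dependency arc between the dummy root and the adverb, so we SHIFT the adverb from the queue onto the stack (case 5), and we also SHIFT the pronoun (case 5).

In the tree we want to build, the pronoun is a dependent (a subject) of the finite verb, so we now make a LEFT-ARC. The pronoun is dropped from the stack (case 2).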
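
Similarly, the adverb is a dependent (a temporal adverbial) of the finite verb, so we again make a LEFT-ARC. The adverb is dropped from the stack (case 2).

The finite verb is the main verb in this sentence, so we make a RIGHT-ARC from the dummy root to the finite verb, which is moved from the queue onto the stack (case 3).

The gold-standard tree contains no dependency arc between the finite verb and the article, so we SHIFT the article from the queue onto the stack (case 5).

The article is the determiner of the noun, so we make a LEFT-ARC and drop the article from the stack (case 2).

We want the noun to be the object of the finite verb, so we make a RIGHT-ARC between those words. The noun moves from the queue onto the stack (case 3).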

Now, the top T of the stack is the noun and the first item F in the queue is the period. These two words are not connected in the gold-standard tree, but the stack contains another word that is the head of the period, namely the finite verb. So we need to REDUCE, and remove the noun from the stack (case 4).

Now that we have reduced, we have prepared for the RIGHT-ARC between the finite verb and the period. The period moves from the queue onto the stack (case 3).

The input queue is now empty, so we are done! The dependency arcs produced by the algorithm correspond to those in the given tree.

4.3 Making a training set for the action classifier

Our action classifier needs to apply a feature extraction function to the state of the parser (i.e. the stack and queue) before making a guess about which parsing action to carry out. Assume that we are using a feature extraction function that extracts the POS tag of T and of F. If our training corpus consists of the example sentence, we would generate the following training set for training our action classifier:

    [ ( {"T_pos":"(none)", "F_pos":"<D>" }, "shift" ),
      ( {"T_pos":"<D>", "F_pos":"adverb" }, "shift" ),
      ( {"T_pos":"adverb", "F_pos":"pronoun" }, "shift" ),
      ( {"T_pos":"pronoun", "F_pos":"finite_verb" }, "left-arc" ),
      ( {"T_pos":"adverb", "F_pos":"finite_verb" }, "left-arc" ),
      ( {"T_pos":"<D>", "F_pos":"finite_verb" }, "right-arc" ),
      ( {"T_pos":"finite_verb", "F_pos":"article" }, "shift" ),
      ( {"T_pos":"article", "F_pos":"noun" }, "left-arc" ),
      ( {"T_pos":"finite_verb", "F_pos":"noun" }, "right-arc" ),
      ( {"T_pos":"noun", "F_pos":"punctuation" }, "reduce" ),
      ( {"T_pos":"finite_verb", "F_pos":"punctuation" }, "right-arc" ) ]

Obviously, we need many more examples to be able to train a good action classifier, but this is the general idea!

References

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03), pages 149-160, Nancy, France.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513-553.