Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies


Online Graph Planarisation for Synchronous Parsing of Semantic and Syntactic Dependencies Ivan Titov University of Illinois at Urbana-Champaign James Henderson, Paola Merlo, Gabriele Musillo University of Geneva

Motivation / Problem Statement. NLP applications will require a shallow representation of meaning. Often, shallow semantic structures can be regarded as labeled directed graphs (e.g., a semantic structure with PATIENT arcs). Can we apply methods for syntactic dependency parsing to predict these trees?

Motivation / Problem Statement. NLP applications will require a shallow representation of meaning. Often, shallow semantic structures can be regarded as labeled directed graphs. Semantic structure (PATIENT arcs): not necessarily a tree, with many crossing arcs (non-planarity); only 22% are trees in the CoNLL-08 dataset. Syntactic structure (SBJ, COORD, CONJ, NMOD, OBJ arcs): tree-structured, often with little or no crossing links.

Motivation / Problem Statement. Example sentence: "Sequa makes and repairs jet engines", with its semantic structure (PATIENT arcs; not necessarily a tree, many crossing arcs, i.e. non-planarity; only 22% are trees in the CoNLL-08 dataset) and its syntactic structure (SBJ, COORD, CONJ, NMOD, OBJ arcs; tree-structured, often with little or no crossing links). Two questions: How can we deal with more general graphs representing semantic structures? How can we construct an effective semantic parser on the basis of an existing syntactic parser?

Outline: Motivation / Problem Statement; Background (dependency parsing, properties of dependency graphs); Non-Planar Parsing using Swapping; Synchronous Parsing of Semantic and Syntactic Dependencies (synchronization, statistical model); Experiments; Conclusions and Future Directions.

Dependency Parsing Problem. Parsing: given a sentence x, predict its structure y. Graph-based methods assume that features factorize over subgraphs of y (e.g., edges) [Eisner, 96; McDonald et al., 05]. Transition-based methods [Yamada and Matsumoto, 03; Nivre et al., 04] will be the main focus of this talk: define a derivation order for structures, i.e., a mapping from y to sequences of decisions d_1, ..., d_n; learn to score an individual decision given the preceding decisions; decode greedily (beam search can be used instead of greedy decoding). The model is estimated on a labeled dataset (a treebank).

Properties of Semantic Structures. Semantic structures are not trees: graph-based (GB) methods based on maximum spanning tree algorithms are not directly applicable. Semantic structures are not planar. Definition: a planar graph can be drawn in the semi-plane above the sentence without any two arcs crossing and without changing the order of words.

Properties of Semantic Structures. Semantic structures are not trees: graph-based (GB) methods based on maximum spanning tree algorithms are not directly applicable. Semantic structures are not planar (definition as above), and most transition-based (TB) algorithms handle only planar graphs. Related work: [Attardi, 06], a TB method with an extended derivation order to handle non-planarity; [Nivre, 08], which assumes that a structure can be made planar by changing the order of words (not true for general non-tree graphs).

Properties of Semantic Structures. Semantic structures are not trees (so maximum-spanning-tree GB methods are not directly applicable) and not planar, while most transition-based (TB) algorithms handle only planar graphs. We will propose a very simple technique to extend standard TB methods to handle non-planar and non-tree-structured graphs. Related work: [Attardi, 06], a TB method with an extended derivation order to handle non-planarity; [Nivre, 08], which assumes that a structure can be made planar by changing the order of words (not true for general non-tree graphs).

Outline: Motivation / Problem Statement; Background (dependency parsing, properties of dependency graphs); Non-Planar Parsing using Swapping; Synchronous Parsing of Semantic and Syntactic Dependencies (synchronization, statistical model); Experiments; Conclusions and Future Directions.

Derivation order [Nivre, 04]. The state of the parser after i steps is characterized by: the current stack S (w_j is the word on top of the stack); a queue I of remaining input words (w_k is the next input word); and the partial dependency structure built so far. A new decision d_i can be: LeftArc_r, which adds a labeled dependency arc from w_k to w_j; RightArc_r, which adds a labeled dependency arc from w_j to w_k; Shift, which moves w_k from the queue to the stack; Reduce, which removes w_j from the stack. Parsing terminates when the queue is empty.

Handling non-planar structures. The [Nivre, 04] order cannot handle non-planar structures: all arcs are created between the top of the stack and the front of the queue, and words are stored on the stack in the same order as they appear in the sentence. We add a single new decision: Swap, which swaps the two topmost words on the stack.
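The extended transition system can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the class and method names are my own.

```python
# A Nivre-style transition system extended with Swap, operating on a
# stack, an input queue, and a set of labeled arcs. Unlike arc-eager
# parsing for trees, LeftArc/RightArc here do not pop or shift, so a
# word can receive several heads (needed for semantic graphs).
class SwapParser:
    def __init__(self, words):
        self.words = words
        self.stack = []                        # word indices, top at the end
        self.queue = list(range(len(words)))   # remaining input words
        self.arcs = set()                      # (head, dependent, label)

    def shift(self):
        # move the front of the queue onto the stack
        self.stack.append(self.queue.pop(0))

    def reduce(self):
        # remove the word on top of the stack
        self.stack.pop()

    def left_arc(self, label):
        # arc from the front of the queue to the top of the stack
        self.arcs.add((self.queue[0], self.stack[-1], label))

    def right_arc(self, label):
        # arc from the top of the stack to the front of the queue
        self.arcs.add((self.stack[-1], self.queue[0], label))

    def swap(self):
        # the single new operation: exchange the two topmost stack words
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]
```

After a Swap, arcs can be created between the front of the queue and a word that was originally deeper in the stack, which is what lets the parser build crossing arcs.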

Example: the target semantic structure for "Sequa makes and repairs jet engines" has two PATIENT arcs (makes -> engines and repairs -> engines).

Example. Partial structure: S = [ ], I = [ Sequa makes and ... ]. Next action: Shift

Example. Partial structure: S = [ Sequa ], I = [ makes and repairs ... ]. Next action: LeftArc

Example. Partial structure: S = [ Sequa ], I = [ makes and repairs ... ]. Next action: Shift

Example. Partial structure: S = [ Sequa makes ], I = [ and repairs jet engines ]. Next action: Shift

Example. Partial structure: S = [ Sequa makes and ], I = [ repairs jet engines ]. Next action: Reduce

Example. Partial structure: S = [ Sequa makes ], I = [ repairs jet engines ]. Next action: Swap. Without Swap the parser would stall: repairs needs an arc to Sequa, but makes cannot be removed from the stack.

Example. Partial structure: S = [ makes Sequa ], I = [ repairs jet engines ]. Next action: LeftArc

Example. Partial structure: S = [ makes Sequa ], I = [ repairs jet engines ]. Next action: Reduce

Example. Partial structure: S = [ makes ], I = [ repairs jet engines ]. Next action: Shift

Example. Partial structure: S = [ makes repairs ], I = [ jet engines ]. Next action: Shift

Example. Partial structure: S = [ makes repairs jet ], I = [ engines ]. Next action: Reduce

Example. Partial structure: S = [ makes repairs ], I = [ engines ]. Next action: RightArc (PATIENT)

Example. Partial structure (with PATIENT arc repairs -> engines): S = [ makes repairs ], I = [ engines ]. Next action: Reduce

Example. Partial structure: S = [ makes ], I = [ engines ]. Next action: RightArc (PATIENT)

Example. Partial structure (with both PATIENT arcs): S = [ makes ], I = [ engines ]. Next action: Shift

Example. Partial structure (with both PATIENT arcs): S = [ makes engines ], I = [ ]. Next action: Stop
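The derivation above can be replayed programmatically. The snippet below is a self-contained illustrative sketch, not the authors' code; the AGENT label on the two leftward arcs is an assumption, since the slides show labels only for the PATIENT arcs.

```python
# Replay of the slides' derivation for "Sequa makes and repairs jet engines".
words = ["Sequa", "makes", "and", "repairs", "jet", "engines"]
stack, queue, arcs = [], list(range(len(words))), set()

def shift():      stack.append(queue.pop(0))
def reduce_():    stack.pop()
def left_arc(l):  arcs.add((queue[0], stack[-1], l))   # queue front -> stack top
def right_arc(l): arcs.add((stack[-1], queue[0], l))   # stack top -> queue front
def swap():       stack[-1], stack[-2] = stack[-2], stack[-1]

derivation = [
    shift,                          # S = [ Sequa ]
    lambda: left_arc("AGENT"),      # makes -> Sequa (label assumed)
    shift, shift,                   # S = [ Sequa makes and ]
    reduce_,                        # pop "and"
    swap,                           # S = [ makes Sequa ]
    lambda: left_arc("AGENT"),      # repairs -> Sequa (label assumed)
    reduce_,                        # pop "Sequa"
    shift, shift,                   # S = [ makes repairs jet ]
    reduce_,                        # pop "jet"
    lambda: right_arc("PATIENT"),   # repairs -> engines
    reduce_,                        # pop "repairs"
    lambda: right_arc("PATIENT"),   # makes -> engines
    shift,                          # S = [ makes engines ], queue empty: stop
]
for action in derivation:
    action()
```

Running it leaves the two crossing PATIENT arcs in `arcs`, which no derivation without Swap could have produced.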

Canonical orders. The algorithm allows the same structure to be parsed in multiple ways, and summing over all possible derivations when parsing is not feasible. Instead, models are trained to produce derivations in a canonical way; in other words, canonical orders define how derivations should be produced for the training set. We consider two canonical derivations, which differ in when swapping is done. Last-resort: Swap is used as a last resort, when no other operation is possible. Exhaustive: Swap is used pre-emptively.

Last-Resort ordering. Swap is used as a last resort, when no other operation is possible. Drawback: not all structures parsable with swapping have a last-resort derivation; in the CoNLL-2008 dataset, 2.8% fewer structures are parsable with last-resort ordering. An example of such a structure: "Suddenly CDC and DEC have products". Advantage: this canonical derivation is predictable and, therefore, supposedly easier to learn.

Exhaustive ordering. Algorithm for preemptive swapping: the ordering follows the standard planar parser ordering until no operation except Shift and Swap is possible. Then: compute the ordered list of positions of queue words to which the current top of the stack w_j will be connected; compute a similar list for the word w_m under the top of the stack; Swap if w_m's list precedes w_j's list in lexicographical order. Example: "Suddenly(1) CDC(2) and(3) DEC(4) have(5) products(6)", with S = [ Suddenly CDC ], I = [ DEC ... ]. List for "Suddenly": {5}; list for "CDC": {5, 6}. "Suddenly"'s list precedes, so: Swap.
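The swap test reduces to a lexicographic comparison of the two position lists, which Python tuples provide natively. This is an illustrative sketch of that test only, with a hypothetical function name; computing the lists themselves requires the gold structure.

```python
# Exhaustive-ordering swap test: swap when the sorted list of future
# queue attachment positions for the word *below* the stack top
# precedes that of the stack top in lexicographic order.
def should_swap(links_below_top, links_top):
    """Each argument: sorted positions of queue words the word will attach to."""
    return tuple(links_below_top) < tuple(links_top)
```

On the slide's example, the stack top CDC has list {5, 6} and Suddenly, below it, has list {5}; since (5,) precedes (5, 6), the parser swaps.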

Exhaustive ordering (same algorithm for preemptive swapping as above). Theorem: if the graph is parsable with the defined set of operations, then the exhaustive ordering is guaranteed to find a derivation. See the paper for the proof sketch.

Structures Parsable with Swapping. Not all non-planar graphs are parsable. In the CoNLL-2008 ST dataset only 1% of semantic structures are not parsable, whereas 44% are not planar (i.e., require swapping). Among the common linguistic structures requiring Swap are coordinations, e.g., "Sequa makes, repairs and sells engines". A frequent example of an unparsable structure: 2 predicates sharing 3 arguments, as in "Funds also might buy and sell".

Structures Parsable with Swapping. Any structure with isolated pairs of crossing arcs is parsable, but the operations are more powerful than that. Theorem: a graph cannot be parsed with the defined set of parsing operations iff it contains at least one of a small set of forbidden subgraphs (shown in the paper), where the unspecified arc endpoints can be anywhere strictly following those specified, and circled pairs of endpoints can be either a single word or two distinct words. See the paper for the proof sketch.

Outline: Motivation / Problem Statement; Background (dependency parsing, properties of dependency graphs); Non-Planar Parsing using Swapping; Synchronous Parsing of Semantic and Syntactic Dependencies (synchronization, statistical model); Experiments; Conclusions and Future Directions.

Synchronization [Henderson et al., 2008]. We define two separate derivations: one for semantics, one for syntax. Instead of using pipelines, we synchronize these two derivations: joint learning and joint inference.

Synchronization Example: the syntactic derivation (adding the SBJ, COORD, CONJ, and NMOD arcs) and the semantic derivation (adding the PATIENT arcs) are built in parallel over the same sentence, one synchronized step at a time.

Statistical Model. Following [Henderson et al., 08], the synchronous derivations are modeled with Incremental Sigmoid Belief Networks (ISBNs) [Titov and Henderson, 07], previously applied successfully to constituent and dependency syntactic parsing. A vector of latent variables is associated with each parsing decision, and each vector is connected with previous vectors by a pattern of interconnections determined by the previous decisions.

Outline: Motivation / Problem Statement; Background (dependency parsing, properties of dependency graphs); Non-Planar Parsing using Swapping; Synchronous Parsing of Semantic and Syntactic Dependencies (synchronization, statistical model); Experiments; Conclusions and Future Directions.

Empirical Evaluation. Data: the CoNLL-2008 Shared Task data [Surdeanu et al., 08], merging a dependency transformation of the Penn Treebank WSJ (syntax) with a dependency representation of PropBank and NomBank (semantics); 39,279 / 1,334 / 2,824 sentences for training / development / testing. Systems: our models with exhaustive order, with last-resort order, and with planar order (can only process projective parts of derivations) [Henderson et al., 08]; and planarisation (HEAD), a modification of the [Nivre and Nilsson, 05] method for semantic graphs, in which crossing links are removed and encoded in the labels of the remaining arcs. Model structure and estimation methods are kept constant across systems.

Results on Development Set:

Technique     Syn LAS   Sem F1   Aver F1   Crossing Pairs (Sem): P / R / F1
Last resort   86.6      76.2     81.5      61.5 / 25.6 / 36.1
Exhaustive    86.8      76.0     81.4      59.7 / 23.5 / 33.8
HEAD          86.7      73.3     80.1      78.6 /  2.2 /  4.2
Planar        85.9      72.8     79.4      undef / 0 / undef

The model is generative and, therefore, decisions are not conditioned on future words: a likely reason why the exhaustive strategy brings no improvement.

Test Set Results:

Model        Syn LAS   Sem F1   Aver F1   Crossing Pairs (Sem): P / R / F1
Johansson    89.3      81.6     85.5      67.0 / 44.5 / 53.5
Ciaramita    87.4      78.0     82.7      59.9 / 34.2 / 43.5
Che          86.7      78.5     82.7      56.9 / 32.4 / 41.3
Zhao         87.7      76.7     82.2      58.5 / 36.1 / 44.6
This paper   87.5      76.1     81.8      62.1 / 29.4 / 39.9
Henderson+   87.6      73.1     80.5      72.6 /  1.7 /  3.3
Lluis        85.8      70.3     78.1      53.8 / 19.2 / 28.3

A 3% improvement over the baseline on semantic graphs; however, it does not outperform reranking or ensemble techniques.

Recent results: 3rd result in the CoNLL-2009 Shared Task (7 languages).

Conclusions. We proposed a simple modification to handle semantic graphs with transition-based parsers: though not powerful enough to process all semantic graphs, it handles the vast majority. We proposed and analyzed two algorithms for canonical derivation induction and theoretically characterized the class of parsable structures. We showed improvements over previous methods and demonstrated state-of-the-art results without ensemble techniques or pipelines. Future directions: applying this approach to parsing of languages with highly non-planar syntactic structures.