Transition-based Parsing with Neural Nets


CS11-747 Neural Networks for NLP: Transition-based Parsing with Neural Nets. Graham Neubig. Site: https://phontron.com/class/nn4nlp2017/

Two Types of Linguistic Structure. Dependency: focus on relations between words (head-dependent arcs from ROOT over "I saw a girl with a telescope"). Phrase structure: focus on the structure of the sentence (a constituent tree with nodes such as S, NP, VP, PP and POS tags PRP, VBD, DT, NN, IN over the same sentence).

Parsing: predicting linguistic structure from an input sentence. Transition-based models step through actions one-by-one until we have an output, like a history-based model for POS tagging. Graph-based models calculate the probability of each edge/constituent and perform some sort of dynamic programming, like a linear CRF model for POS.

Shift-reduce Dependency Parsing

Why Dependencies? Dependencies are often good for semantic tasks, as related words are close in the tree. It is also possible to create labeled dependencies that explicitly show the relationship between words (e.g., nsubj, det, dobj, prep, pobj arcs over "I saw a girl with a telescope").

Arc-Standard Shift-Reduce Parsing (Yamada & Matsumoto 2003, Nivre 2003). Process words one-by-one, left-to-right. Two data structures: a queue (buffer) of unprocessed words and a stack of partially processed words. At each point choose: shift (move one word from the queue to the stack), reduce left (top word on the stack is the head of the second word), or reduce right (second word on the stack is the head of the top word). Learn how to choose each action with a classifier.
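The following minimal Python sketch implements the arc-standard transition system described above (it is not the course's ff-depparser.py; the choose_action callback stands in for the learned classifier):

def arc_standard_parse(words, choose_action):
    """words: list of tokens; returns heads[i] = index of the head of word i (0 = ROOT)."""
    stack = [0]                              # word indices; 0 is ROOT
    buffer = list(range(1, len(words) + 1))  # queue of unprocessed words
    heads = {}
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "shift" and buffer:
            stack.append(buffer.pop(0))      # move one word from buffer to stack
        elif action == "reduce_left" and len(stack) > 2:
            dep = stack.pop(-2)              # top of stack is head of second word
            heads[dep] = stack[-1]
        elif action == "reduce_right" and len(stack) > 1:
            dep = stack.pop()                # second word on stack is head of top word
            heads[dep] = stack[-1]
        else:
            break                            # invalid action: stop (sketch only)
    return heads

For example, with a hard-coded oracle action sequence for "I saw a girl":

actions = iter(["shift", "shift", "reduce_left", "shift", "shift",
                "reduce_left", "reduce_right", "reduce_right"])
print(arc_standard_parse("I saw a girl".split(), lambda s, b: next(actions)))
# {1: 2, 3: 4, 4: 2, 2: 0}  i.e. I<-saw, a<-girl, girl<-saw, saw<-ROOT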

Shift-Reduce Example (stack | buffer after each action for "I saw a girl"):
ROOT | I saw a girl
shift: ROOT I | saw a girl
shift: ROOT I saw | a girl
reduce left: ROOT saw | a girl
shift: ROOT saw a | girl
shift: ROOT saw a girl |
reduce left: ROOT saw girl |
reduce right: ROOT saw |
reduce right: ROOT |

Classification for Shift-Reduce. Given a configuration (Stack: ROOT I saw a; Buffer: girl), which action do we choose: shift, reduce left, or reduce right?

Making Classification Decisions. Extract features from the configuration: what words are on the stack/buffer? what are their POS tags? what are their children? Feature combinations are important! For example, if the second word on the stack is a verb AND the first is a noun, the reduce-right action is likely. Combination features used to be created manually (e.g. Zhang and Nivre 2011); now we can use neural nets!

A Feed-forward Neural Model for Shift-reduce Parsing (Chen and Manning 2014)

A Feed-forward Neural Model for Shift-reduce Parsing (Chen and Manning 2014). Extract non-combined features (embeddings) and let the neural net do the feature combination.

What Features to Extract? The top 3 words on the stack and buffer (6 features): s1, s2, s3, b1, b2, b3. The two leftmost/rightmost children of the top two words on the stack (8 features): lc1(si), lc2(si), rc1(si), rc2(si) for i = 1, 2. The leftmost and rightmost grandchildren (4 features): lc1(lc1(si)), rc1(rc1(si)) for i = 1, 2. POS tags of all of the above (18 features). Arc labels of all children/grandchildren (12 features).

Non-linear Function: Cube Function. Take the element-wise cube of the input vector, i.e. the hidden layer is h = (W x + b)^3. Why? Expanding the cube directly extracts feature combinations of up to three inputs (similar to a polynomial kernel in SVMs).
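As a rough illustration of this architecture (a hedged numpy sketch; the function name score_actions, the dimensions, and the random toy parameters are assumptions, not the paper's code), the ~48 feature ids are embedded, concatenated, passed through one hidden layer with the cube non-linearity, and scored with a softmax over actions:

import numpy as np

def score_actions(feature_ids, E, W1, b1, W2):
    # feature_ids: word/POS/label ids extracted from the configuration
    x = np.concatenate([E[f] for f in feature_ids])  # concatenated embeddings
    h = (W1 @ x + b1) ** 3                           # cube non-linearity
    logits = W2 @ h                                  # one score per action
    return np.exp(logits) / np.exp(logits).sum()     # softmax over shift/left/right/...

# Toy sizes: 48 features, 50-dim embeddings, 200 hidden units, 3 actions
rng = np.random.default_rng(0)
E = rng.normal(size=(10000, 50))
W1, b1 = 0.01 * rng.normal(size=(200, 48 * 50)), np.zeros(200)
W2 = 0.01 * rng.normal(size=(3, 200))
probs = score_actions(list(range(48)), E, W1, b1, W2)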

Results. Faster than most standard dependency parsers (about 1,000 words/second); uses a pre-computation trick to cache matrix multiplies of common words. Strong results, beating most existing transition-based parsers at the time.
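The pre-computation trick can be sketched as follows (same assumed shapes as the feed-forward sketch above): because the hidden layer is linear before the cube, the product of each per-feature-position slice of W1 with a frequent word's embedding can be cached once and reused across configurations:

def precompute(E, W1, frequent_ids, n_feats=48, d=50):
    cache = {}
    for pos in range(n_feats):
        W_slice = W1[:, pos * d:(pos + 1) * d]   # hidden x d slice for this feature position
        for w in frequent_ids:
            cache[(pos, w)] = W_slice @ E[w]     # reusable partial product
    return cache

def hidden_with_cache(feature_ids, cache, E, W1, b1, d=50):
    h_pre = b1.copy()
    for pos, w in enumerate(feature_ids):
        if (pos, w) in cache:
            h_pre += cache[(pos, w)]
        else:
            h_pre += W1[:, pos * d:(pos + 1) * d] @ E[w]
    return h_pre ** 3   # cube applied after summing the cached pieces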

Let's Try it Out! ff-depparser.py

Using Tree Structure in NNs: Syntactic Composition

Why Tree Structure?

Recursive Neural Networks (Socher et al. 2011). Compose child vectors bottom-up over the tree (e.g., over "I hate this movie"): tree-rnn(h1, h2) = tanh(W[h1; h2] + b). Can also parameterize by constituent type: different composition behavior for NP, VP, etc.
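A minimal numpy sketch of that composition function (the compose helper and the toy binarized tree are illustrative assumptions, not the course's tree-lstm.py):

import numpy as np

def tree_rnn(h_left, h_right, W, b):
    # tree-rnn(h1, h2) = tanh(W [h1; h2] + b)
    return np.tanh(W @ np.concatenate([h_left, h_right]) + b)

def compose(node, embed, W, b):
    # node: a word string (leaf) or a (left, right) pair (internal node)
    if isinstance(node, str):
        return embed[node]
    left, right = node
    return tree_rnn(compose(left, embed, W, b), compose(right, embed, W, b), W, b)

# Toy usage with random 10-dim embeddings over ((I hate) (this movie))
rng = np.random.default_rng(0)
embed = {w: rng.normal(size=10) for w in "I hate this movie".split()}
W, b = 0.1 * rng.normal(size=(10, 20)), np.zeros(10)
h_root = compose((("I", "hate"), ("this", "movie")), embed, W, b)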

Tree-structured LSTMs. Child-Sum Tree-LSTM (Tai et al. 2015): parameters shared between all children (possibly based on grammatical label, etc.); the forget gate value is different for each child, so the network can learn to ignore children (e.g. give less weight to non-head nodes). N-ary Tree-LSTM: different parameters for each child, up to N (like the Tree-RNN).
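For reference, a hedged numpy sketch of one Child-Sum Tree-LSTM node, following the equations in Tai et al. (2015); the parameter-dictionary names (W_i, U_i, b_i, ...) are assumptions for exposition:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def child_sum_tree_lstm(x, children, P):
    # x: input vector at this node; children: list of (h_k, c_k) pairs from child nodes
    h_sum = sum(h for h, _ in children) if children else np.zeros_like(P["b_i"])
    i = sigmoid(P["W_i"] @ x + P["U_i"] @ h_sum + P["b_i"])      # input gate
    o = sigmoid(P["W_o"] @ x + P["U_o"] @ h_sum + P["b_o"])      # output gate
    u = np.tanh(P["W_u"] @ x + P["U_u"] @ h_sum + P["b_u"])      # candidate cell value
    c = i * u
    for h_k, c_k in children:
        f_k = sigmoid(P["W_f"] @ x + P["U_f"] @ h_k + P["b_f"])  # per-child forget gate
        c = c + f_k * c_k                                        # children can be down-weighted
    h = o * np.tanh(c)
    return h, c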

Bi-LSTM Composition (Dyer et al. 2015). Simply read in the constituents with a BiLSTM; the model can learn its own composition function! (Figure: a BiLSTM run over the constituents of "I hate this movie".)

Let's Try it Out! tree-lstm.py

Stack LSTM: Dependency Parsing w/ Less Engineering, Wider Context (Dyer et al. 2015)

Encoding Parsing Configurations w/ RNNs. We don't want to do feature engineering (why leftmost and rightmost grandchildren only?!). Can we encode all the information about the parse configuration with an RNN? Information we have: stack, buffer, past actions.

Encoding Stack Configurations w/ RNNs. (Figure: the stack S, buffer B, and action history A, for the sentence "an overhasty decision was made", are each encoded with their own RNN; the TOP states of the three are combined into p_t to predict the next action, e.g. SHIFT, REDUCE-LEFT(amod), or REDUCE-RIGHT.) (Slide credits: Chris Dyer)

Transition-based parsing: State embeddings. We can embed words, and can embed tree fragments using syntactic composition. The contents of the buffer are just a sequence of embedded words which we periodically shift from; the contents of the stack are just a sequence of embedded trees which we periodically pop from and push to. Sequences -> use RNNs to get an encoding! But running an RNN for each state will be expensive. Can we do better? (Slide credits: Chris Dyer)

Transition-based parsing: Stack RNNs. Augment the RNN with a stack pointer. Three constant-time operations: push (read input, add to top of stack), pop (move the stack pointer back), and embedding (return the RNN state at the location of the stack pointer, which summarizes the stack's current contents). (Slide credits: Chris Dyer)

Transition-based parsing: Stack RNNs. (Shown as an animation over several slides: states y0 ... y3 are created as x1, x2, x3 are pushed, with the pointer popped back to y0 between pushes.) DyNet:
s = [rnn.initial_state()]
s.append(s[-1].add_input(x1))
s.pop()
s.append(s[-1].add_input(x2))
s.pop()
s.append(s[-1].add_input(x3))
(Slide credits: Chris Dyer)
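(A note on the snippet above: the "embedding" operation from the previous slide corresponds to reading the RNN output of the state under the stack pointer, e.g. s[-1].output() in DyNet; this reading of the API is an assumption, not part of the slides.)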

Let's Try it Out! stacklstm-depparser.py

Shift-reduce Parsing for Phrase Structure

Shift-reduce Parsing for Phrase Structure (Sagae and Lavie 2005, Watanabe 2015). Actions: shift, reduce-X (binary), and unary-X (unary), where X is a constituent label (NP, VP, S, SBAR, WHNP, ...). First, binarize the tree (e.g., the NP "the tall girl" becomes a nested binary NP). Then parse, e.g. for "people that saw the tall girl": shift words onto the stack, reduce-NP to combine the top two items into an NP, apply unary-X actions for unary projections, and so on until the full tree is built.
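A hedged Python sketch of those three action types, operating on a stack of words and (label, children...) tuples; the helper names and the tuple tree representation are illustrative assumptions:

def do_shift(stack, buffer):
    stack.append(buffer.pop(0))              # push the next word onto the stack

def do_reduce(stack, label):
    right, left = stack.pop(), stack.pop()   # binary reduce-X: combine the top two items
    stack.append((label, left, right))

def do_unary(stack, label):
    stack.append((label, stack.pop()))       # unary-X: project the top item to label X

# Toy usage on the binarized NP "the tall girl"
stack, buffer = [], ["the", "tall", "girl"]
do_shift(stack, buffer); do_shift(stack, buffer); do_shift(stack, buffer)
do_reduce(stack, "NP")   # -> (NP tall girl)
do_reduce(stack, "NP")   # -> (NP the (NP tall girl))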

Recurrent Neural Network Grammars (Dyer et al. 2016). Top-down generative models for parsing; can serve as a language model as well. Good parsing results. Decoding is difficult: need to generate with a discriminative model then rerank, and use importance sampling for LM evaluation.

A Simple Approximation: Linearized Trees (Vinyals et al. 2015). Similar to RNNG, but generates the symbols of a linearized tree. + Can be done with simple sequence-to-sequence models. - No explicit composition function like the Stack LSTM/RNNG. - Not guaranteed to output well-formed trees.
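As an illustration of what such a model predicts (the exact output vocabulary in Vinyals et al. differs slightly), a tree can be flattened into a bracket sequence that a sequence-to-sequence model emits token by token:

def linearize(tree):
    # tree: a word string, or (label, child1, child2, ...) as in the earlier sketches
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(linearize(c) for c in children) + " )"

print(linearize(("S", ("NP", "I"), ("VP", "hate", ("NP", "this", "movie")))))
# (S (NP I ) (VP hate (NP this movie ) ) )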

Questions?