CKY algorithm / PCFGs

CKY algorithm / PCFGs. CS 585, Fall 2018, Introduction to Natural Language Processing. http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer, College of Information and Computer Sciences, University of Massachusetts Amherst. some slides from Brendan O'Connor

questions from last time. project proposal due tomorrow! midterm next week (30% of final grade)! 8.5 x 11 cheat sheet allowed, both sides, handwritten only. breakdown: 20% text classification (NB, LR, NN), 20% language modeling, 20% POS tagging / HMMs, 20% word embeddings, 20% machine translation. can we get more challenging assignments? fine! HW3 will be more open-ended

today we'll be doing parsing: given a CFG, how do we use it to parse a sentence?

Formal Definition of Context-Free Grammar. A context-free grammar G is defined by four parameters: N (a set of non-terminal symbols), Σ (a set of terminal symbols), R (a set of rules of the form A -> β), and S (a designated start symbol).

let's start with a simple CFG: S -> NP VP, NN -> dog, NP -> DT JJ NN

first, let's convert this to Chomsky Normal Form (CNF): every rule has the form A -> β, where β is either a single terminal from Σ or a pair of non-terminals from N

converting the simple CFG: S -> NP VP and NN -> dog stay as they are, while NP -> DT JJ NN becomes NP -> X NN and X -> DT JJ. we can convert any CFG to CNF; this is a necessary preprocessing step for the basic CKY algorithm, and it produces binary trees!
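a minimal sketch (not from the lecture) of this binarization step, assuming rules are written as (lhs, rhs-tuple) pairs; it only splits right-hand sides that are too long, and doesn't handle unary non-terminal rules or mixed terminal/non-terminal rules:

# binarize any rule whose right-hand side has more than two symbols
# by introducing fresh non-terminals, like the X in the slide
def binarize(rules):
    """rules: list of (lhs, rhs) pairs, rhs a tuple of symbols."""
    out, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_nt = "X%d" % fresh            # fresh non-terminal
            out.append((new_nt, rhs[:2]))     # e.g. X1 -> DT JJ
            rhs = (new_nt,) + rhs[2:]
        out.append((lhs, rhs))
    return out

# NP -> DT JJ NN becomes X1 -> DT JJ and NP -> X1 NN, matching the slide
print(binarize([("S", ("NP", "VP")), ("NN", ("dog",)), ("NP", ("DT", "JJ", "NN"))]))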

Parsing! Given a sentence and a grammar in CNF, we want to search through the space of all possible parses for that sentence to find: any valid parse for that sentence, all valid parses, or the most probable parse. Two approaches (pros and cons of each?): bottom-up, which starts from the words and attempts to construct the tree, and top-down, which starts from the START symbol and keeps expanding until it can construct the sentence.

Ambiguity in parsing. Syntactic ambiguity is endemic to natural language (examples borrowed from Dan Klein):
- attachment ambiguity: we eat sushi with chopsticks.
- modifier scope: southern food store
- particle versus preposition: the puppy tore up the staircase.
- complement structure: the tourists objected to the guide that they couldn't hear.
- coordination scope: "I see", said the blind man, as he picked up the hammer and saw.
- multiple gap constructions: the chicken is ready to eat.

today: the CKY algorithm, Cocke-Kasami-Younger (independently discovered, also known as CYK), a bottom-up parser for CFGs (and PCFGs). "How he got into my pajamas, I'll never know." (Groucho Marx)

let's say I have this grammar in CNF (example adapted from David Bamman):
S -> NP VP
PP -> IN NP
NP -> DET NN
NP -> NP PP
VP -> VBD NP
VP -> VP PP
NP -> PRP$ NN
DET -> an
VBD -> shot
NN -> pajamas
NN -> elephant
NP -> I
PRP -> I
IN -> in
PRP$ -> my

build a chart! the top-right cell is the root

fill in the first level (words) with possible derivations: NP/PRP (I), VBD (shot), DET (an), NN (elephant), IN (in), PRP$ (my), NN (pajamas)

onto the second level!

onto the second level! this cell spans the phrase "I shot"

onto the second level! what does this cell span?

onto the second level! do any rules produce NP VBD or PRP VBD?

onto the second level! do any rules produce VBD DET?

onto the second level! do any rules produce DET NN?

onto the second level! do any rules produce DET NN? Yes: NP -> DET NN!

onto the third level!

onto the third level! two ways to form "I shot an": "I" + "shot an"

onto the third level! two ways to form "I shot an": "I" + "shot an", and "I shot" + "an"

onto the third level! what about this cell?

onto the third level! VP -> VBD NP gives a VP spanning "shot an elephant"

and PP -> IN NP gives a PP spanning "in my pajamas"

onto the fourth level! what are our options here?

onto the fourth level! what are our options here? combining an NP with a VP

onto the fourth level! S -> NP VP gives an S spanning "I shot an elephant"

two NP analyses now appear in the same cell: NP1 / NP2

and the chart now also has three VP analyses: VP1 / VP2 / VP3

finally, the root!

finally, the root! S -> NP VP1, S -> NP VP2, S -> NP VP3

finally, the root! S1 / S2 / S3: three valid parses!

how do we recover the full derivation of the valid parses S1 / S2 / S3?

CKY runtime? three nested loops, each O(n) where n is the number of words: O(n^3)
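a minimal CKY recognizer sketch to make the three loops concrete; the dictionary encoding of the grammar and the names are my own, not the course's starter code, and the NP/NN labels in the example grammar are the ones reconstructed above:

from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """lexical: dict word -> set of labels A with A -> word
       binary:  dict (B, C) -> set of labels A with A -> B C
       returns True iff some tree rooted at `start` covers the whole sentence."""
    n = len(words)
    chart = defaultdict(set)                  # chart[(i, j)]: labels covering words[i:j]
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(lexical.get(w, ()))
    for span in range(2, n + 1):              # span length: first O(n) loop
        for i in range(n - span + 1):         # start position: second O(n) loop
            j = i + span
            for k in range(i + 1, j):         # split point: third O(n) loop => O(n^3)
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= binary.get((B, C), set())
    return start in chart[(0, n)]

# hypothetical encoding of the lecture's grammar
lexical = {"I": {"NP", "PRP"}, "shot": {"VBD"}, "an": {"DET"}, "elephant": {"NN"},
           "in": {"IN"}, "my": {"PRP$"}, "pajamas": {"NN"}}
binary = {("NP", "VP"): {"S"}, ("IN", "NP"): {"PP"}, ("DET", "NN"): {"NP"},
          ("NP", "PP"): {"NP"}, ("VBD", "NP"): {"VP"}, ("VP", "PP"): {"VP"},
          ("PRP$", "NN"): {"NP"}}
print(cky_recognize("I shot an elephant in my pajamas".split(), lexical, binary))  # True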

how to find the best parse? use a PCFG (probabilistic CFG): same as a CFG except each rule A -> β in the grammar is associated with a probability p(β | A). we can compute the probability of a parse T by just multiplying the probabilities of the rules r that make up T: p(T) = ∏_{r in T} p(β_r | A_r)

S -> NP VP, 0.4
PP -> IN NP, 0.1
NP -> DET NN, 0.3
NP -> NP PP, 0.1
VP -> VBD NP, 0.2
VP -> VP PP, 0.3
NP -> PRP$ NN, 0.5
DET -> an, 0.9
VBD -> shot, 0.3
NN -> pajamas, 0.8
NN -> elephant, 0.9
NP -> I, 0.2
PRP -> I, 0.6
IN -> in, 0.9
PRP$ -> my, 0.8
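as a quick sanity check of the product-of-rules definition, here is the probability of one tree the grammar above assigns to the shorter sentence "I shot an elephant"; the NP/NN labels are my reconstruction, the probabilities are from the slide:

# probability of the parse (S (NP I) (VP (VBD shot) (NP (DET an) (NN elephant))))
rules_used = {"S -> NP VP": 0.4, "NP -> I": 0.2, "VP -> VBD NP": 0.2,
              "VBD -> shot": 0.3, "NP -> DET NN": 0.3, "DET -> an": 0.9,
              "NN -> elephant": 0.9}
p = 1.0
for prob in rules_used.values():
    p *= prob          # one factor per rule in the tree
print(round(p, 6))     # 0.001166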

fill in the first level (words) with possible derivations and their probabilities

how do we compute this cell's probability?

how do we compute this cell's probability? p(NP -> DET NN) * P(DET cell) * P(NN cell) = 0.3 * 0.9 * 0.8 ≈ 0.22

let's switch to log space and fill out the table some more

p(NP1) = ? p(NP2) = ?

NP1 (-7.31) / NP2 (-7.30): do we have to store both NPs?

p(VP1) = ? p(VP2) = ?

VP1 (-10.1) / VP2 (-9.0): do we need to store both VPs?

the root cell: S (-11.5)

recover the most probable parse by looking at back pointers
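a minimal sketch of the probabilistic (Viterbi) version with back pointers; each cell keeps only the best log-probability per label, and the grammar encoding mirrors the hypothetical one used in the recognizer sketch above:

import math
from collections import defaultdict

def cky_viterbi(words, lexical, binary, start="S"):
    """lexical: dict word -> {A: p(word | A)}; binary: dict (B, C) -> {A: p(B C | A)}.
       Returns (log-prob of the best parse rooted at start, back-pointer chart)."""
    n = len(words)
    best = defaultdict(dict)                  # best[(i, j)][A] = best log-prob of A over words[i:j]
    back = defaultdict(dict)                  # back[(i, j)][A] = (k, B, C) achieving that best
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            best[(i, i + 1)][A] = math.log(p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, lb in best[(i, k)].items():
                    for C, lc in best[(k, j)].items():
                        for A, p in binary.get((B, C), {}).items():
                            score = math.log(p) + lb + lc
                            if score > best[(i, j)].get(A, float("-inf")):
                                best[(i, j)][A] = score      # keep only the max per label
                                back[(i, j)][A] = (k, B, C)  # back pointer for recovery
    return best[(0, n)].get(start), back

def build_tree(back, words, i, j, A):
    """follow back pointers to recover the most probable parse as nested tuples."""
    if j - i == 1:
        return (A, words[i])
    k, B, C = back[(i, j)][A]
    return (A, build_tree(back, words, i, k, B), build_tree(back, words, k, j, C))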

issues w/ PCFGs: independence assumption: each rule's probability is independent of the rest of the tree!!! doesn't take into account location in the tree or what words are involved (for A -> B C). John saw the man with the hat / John saw the moon with the telescope

add more info to the PCFG! How to make good attachment decisions? Enrich the PCFG with parent information (what's above me?) and with lexical information via head rules: VP[fight] is a VP headed by fight (or better, word/phrase embedding-based generalizations, e.g. recurrent neural network grammars (RNNGs))

Lexicalization: annotate each non-terminal with its head word. For "the lawyer questioned the witness", the plain tree S -> NP VP, NP -> DT NN (the lawyer), VP -> Vt NP, NP -> DT NN (the witness) becomes S(questioned) -> NP(lawyer) VP(questioned), NP(lawyer) -> DT(the) NN(lawyer), VP(questioned) -> Vt(questioned) NP(witness), NP(witness) -> DT(the) NN(witness). any issues with doing this?

where do we get the PCFG probabilities? given a treebank, we can just compute the MLE estimate by counting and normalizing. without a treebank, we can use the inside-outside algorithm to estimate probabilities by 1. randomly initializing probabilities, 2. computing parses, 3. computing expected counts for rules, 4. re-estimating probabilities, 5. repeating! this should sound familiar (it's EM)
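a minimal sketch of the count-and-normalize (MLE) step, assuming trees arrive as nested (label, children...) tuples like the ones produced by the back-pointer sketch above; this is not the course's actual assignment code:

from collections import Counter

def mle_rule_probs(treebank):
    """treebank: iterable of trees as nested tuples, e.g.
       ("S", ("NP", "I"), ("VP", ("VBD", "shot"), ("NP", ("DET", "an"), ("NN", "elephant")))).
       Returns p(rhs | lhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts, lhs_counts = Counter(), Counter()
    def visit(node):
        lhs, children = node[0], node[1:]
        if len(children) == 1 and isinstance(children[0], str):
            rhs = (children[0],)                      # lexical rule, e.g. DET -> an
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                visit(child)
        rule_counts[(lhs, rhs)] += 1                  # count the rule ...
        lhs_counts[lhs] += 1                          # ... and its left-hand side
    for tree in treebank:
        visit(tree)
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}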

exercise!