The CKY Parsing Algorithm and PCFGs COMP-599 Oct 12, 2016

Outline: CYK parsing; PCFGs; probabilistic CYK parsing.

CFGs and Constituent Trees. Rules/productions: S → NP VP; NP → this; VP → V; V → is | rules | jumps | rocks. [Two example trees, for "this rules" and "this rocks": S at the root, non-terminals (NP, VP, V) as internal nodes, terminals (this, rules, rocks) as leaves.]

Parsing into a CFG. Given: 1. a CFG; 2. a sentence made up of words that are in the terminal vocabulary of the CFG. Task: recover all possible parses of the sentence. Why all possible parses?

Syntactic Ambiguity. "I shot the elephant in my pyjamas." [Two parse trees: in one, the PP "in my pyjamas" attaches inside the VP, modifying "shot the elephant"; in the other, it attaches inside the NP "the elephant".]

CYK Algorithm. The Cocke-Younger-Kasami algorithm (also known as the CKY algorithm) is a dynamic programming algorithm: partial solutions are stored and efficiently reused to find all possible parses for the entire sentence. Steps: 1. Convert the CFG to an appropriate form. 2. Set up a table of possible constituents. 3. Fill in the table. 4. Read the table to recover all possible parses.

Chomsky Normal Form. To make things easier later, we need all productions to be in one of these forms: 1. A → B C, where A, B, C are non-terminals; 2. A → s, where A is a non-terminal and s is a terminal. This is actually not a big problem.

Converting to CNF (1). Rule of type A → B C D: rewrite into A → X1 D and X1 → B C. Rule of type A → s B: rewrite into A → X2 B and X2 → s. Rule of type A → B: everywhere in which we see B on the LHS, replace it with A.
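As a rough illustration of the first case (binarization of long rules), here is a minimal Python sketch; the rule representation and the helper name binarize are my own choices, not the lecture's, and the terminal and unit-rule cases are omitted.

def binarize(rules):
    """rules: list of (lhs, rhs) pairs, with rhs a tuple of symbols."""
    out, counter = [], 0
    for lhs, rhs in rules:
        # Peel off the first two symbols into a fresh non-terminal (X1, X2, ...)
        # until the right-hand side is binary.
        while len(rhs) > 2:
            counter += 1
            fresh = "X%d" % counter
            out.append((fresh, rhs[:2]))    # X_k -> first two symbols
            rhs = (fresh,) + rhs[2:]        # replace them with X_k
        out.append((lhs, rhs))
    return out

# Example: VP -> V NP PP becomes X1 -> V NP and VP -> X1 PP.
print(binarize([("VP", ("V", "NP", "PP"))]))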

Examples of Conversion. Let's convert the following grammar fragment into CNF: S → NP VP; N → I | elephant | pyjamas; VP → V NP PP; V → shot; VP → V NP; Det → my | the; NP → N; NP → Det N; NP → Det N PP; PP → in NP.

Next: Set Up a Table. This table will store all of the constituents that can be built from contiguous spans within the sentence. Let the sentence have N words, w[0], w[1], ..., w[N-1]. Create a table such that a cell in row i, column j corresponds to the span w[i:j+1], zero-indexed. Since i < j, we really just need half the table. The entry at each cell is a list of non-terminals that can span those words according to the grammar.
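One way to set this table up in code, as a sketch (the dict-of-spans layout is my own choice, not prescribed by the lecture): a dictionary mapping each span (i, j) to the set of non-terminals that can cover words[i:j].

words = "I shot the elephant in my pyjamas".split()
n = len(words)
# Cell (i, j) holds the non-terminals spanning words[i:j]; only i < j is needed.
table = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}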

Parse Table. Sentence: "I shot the elephant in my pyjamas", words w[0] through w[6], with cells [i:j] for 0 ≤ i < j ≤ 7. [Empty table shown alongside the CNF grammar: S → NP VP; VP → X1 PP; X1 → V NP; VP → V NP; NP → Det N; NP → X2 PP; X2 → Det N; PP → P NP; P → in; NP → I | elephant | pyjamas; N → I | elephant | pyjamas; V → shot; Det → my | the.]

Filling in Table: Base Case. One word (e.g., cell [0:1]): easy. Add all the lexical rules that can generate that word.

Base Case Examples (First 3 Words). [Same table and grammar: cell [0:1] ("I") gets NP and N; cell [1:2] ("shot") gets V; cell [2:3] ("the") gets Det.]

Filling in Table: Recursive Step. Cell corresponding to multiple words, e.g., the cell for span [0:3], "I shot the". Key idea: all rules that produce phrases are of the form A → B C. So, check all the possible break points m between the start i and the end j, and see if we can build a constituent with a rule of the form A[i:j] → B[i:m] C[m:j].
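Putting the base case and the recursive step together, here is a compact CYK recognizer sketch in Python. The grammar format (lexical rules as (A, word) pairs, binary rules as (A, B, C) triples) is an assumption made for illustration, not the lecture's notation.

def cyk(words, lexical, binary):
    """lexical: list of (A, word); binary: list of (A, B, C) for rules A -> B C."""
    n = len(words)
    table = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}

    # Base case: one-word spans get every A with a lexical rule A -> word.
    for i, w in enumerate(words):
        for A, word in lexical:
            if word == w:
                table[i, i + 1].add(A)

    # Recursive step: build longer spans from every break point m.
    for length in range(2, n + 1):            # span length
        for i in range(0, n - length + 1):
            j = i + length
            for m in range(i + 1, j):         # break point
                for A, B, C in binary:
                    if B in table[i, m] and C in table[m, j]:
                        table[i, j].add(A)
    return table

# The sentence is accepted iff the start symbol covers the whole span:
# "S" in cyk(words, lexical, binary)[0, len(words)]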

Recursive Step Example 1. [Same table and grammar; the one-word cells [0:1] = {NP, N} and [1:2] = {V} are filled, and we ask which constituents can be built over a two-word span by checking each rule A → B C at the possible break points.]

Recursive Step Example 2. [Same table and grammar; cells [2:3] = {Det} and [3:4] = {N} are filled, and we ask what can go in cell [2:4] ("the elephant").]

Backpointers. [Same table; each entry in a cell also records which smaller cells and which rule it was built from.] Store where you came from!

Putting It Together. [Same table; e.g., the NP entry in cell [2:4] stores the backpointer (Det[2:3], N[3:4]).] Fill the table in the correct order!

Finish the Example. Let's finish the example together for practice. How do we reconstruct the parse trees from the table?
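One answer, sketched under the assumption that backpointers were stored as on the previous slides: keep, for every (i, j, A), the ways A was built over span [i:j], and expand them recursively from the top. The back dictionary layout below is illustrative only, not the lecture's.

def build_trees(back, i, j, A):
    """back[(i, j, A)]: list of entries, either ("word", w) or (m, B, C)."""
    trees = []
    for entry in back.get((i, j, A), []):
        if entry[0] == "word":                 # lexical rule A -> w
            trees.append((A, entry[1]))
        else:
            m, B, C = entry                    # A -> B C with split point m
            for left in build_trees(back, i, m, B):
                for right in build_trees(back, m, j, C):
                    trees.append((A, left, right))
    return trees

# All parses of the sentence: build_trees(back, 0, n, "S")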

Dealing with Syntactic Ambiguity. In practice, one of these parses is more likely than the other. [The same two parse trees for "I shot the elephant in my pyjamas": PP attached to the VP vs. attached to the NP.] How to distinguish them?

Probabilistic CFGs. Associate each rule with a probability, e.g., NP → NP PP 0.2; NP → Det N 0.4; NP → I 0.1; V → shot 0.005. The probability of a parse tree for a sentence is the product of the probabilities of the rules in the tree.

Formally Speaking. For each non-terminal A ∈ N: Σ_{(α → β) ∈ R s.t. α = A} Pr(α → β) = 1, i.e., the rules for each LHS form a probability distribution. If a tree t contains the rules α1 → β1, α2 → β2, ..., then Pr(t) = Π_i Pr(αi → βi): the tree probability is the product of the rule probabilities.
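A tiny numeric illustration of these two properties, using made-up probabilities rather than the lecture's grammar:

from collections import defaultdict

pcfg = {
    ("S",   ("NP", "VP")):   1.0,
    ("NP",  ("Det", "N")):   0.7, ("NP", ("I",)): 0.3,
    ("VP",  ("V", "NP")):    1.0,
    ("V",   ("shot",)):      1.0,
    ("Det", ("the",)):       1.0,
    ("N",   ("elephant",)):  1.0,
}

# Rules sharing a left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, _), p in pcfg.items():
    totals[lhs] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())

# Pr(tree) is the product of the rule probabilities it uses,
# e.g. for a tree over "I shot the elephant":
rules_used = [("S", ("NP", "VP")), ("NP", ("I",)), ("VP", ("V", "NP")),
              ("V", ("shot",)), ("NP", ("Det", "N")),
              ("Det", ("the",)), ("N", ("elephant",))]
p_tree = 1.0
for r in rules_used:
    p_tree *= pcfg[r]
print(p_tree)   # 1.0 * 0.3 * 1.0 * 1.0 * 0.7 * 1.0 * 1.0 = 0.21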

Probabilistic Parsing. Goal: recover the best parse for a sentence, along with its probability. For a sentence sent, let τ(sent) be the set of possible parses for it; we want to find argmax_{t ∈ τ(sent)} Pr(t). Idea: extend CYK to keep track of probabilities in the table.

Extending CYK to PCFGs. Previously, cell entries were non-terminals (+ backpointer), e.g., table[2:4] = {(NP, Det[2:3] N[3:4])}, table[3:4] = {(NP, ...), (N, ...)}. Now, cell entries include the (best) probability of generating the constituent with that non-terminal, e.g., table[2:4] = {(NP, Det[2:3] N[3:4], 0.215)}, table[3:4] = {(NP, ..., 0.022), (N, ..., 0.04)}. Equivalently, write this as a 3-dimensional array: table[2, 4, NP] = 0.215 (backpointer: Det[2:3], N[3:4]); table[3, 4, NP] = 0.022; table[3, 4, N] = 0.04.

New Recursive Step. Filling in the dynamic programming table proceeds almost as before. During the recursive step, compute the probability of each new constituent to be constructed: Pr(A[i:j] → B[i:m] C[m:j]) = Pr(A → B C) × table[i, m, B] × table[m, j, C], where Pr(A → B C) comes from the PCFG and the table factors come from previously filled cells. There could be multiple rules that form constituent A for span [i:j]; take the max: table[i, j, A] = max over rules A → B C and break points m of Pr(A[i:j] → B[i:m] C[m:j]).
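A minimal sketch of this probabilistic CYK in Python, keeping the best probability and backpointer per (i, j, A); the dictionary-based grammar format below is my own assumption, not the lecture's.

def pcyk(words, lexical, binary):
    """lexical: {(A, word): p}; binary: {(A, B, C): p} for rules A -> B C."""
    n = len(words)
    prob, back = {}, {}

    for i, w in enumerate(words):                        # base case
        for (A, word), p in lexical.items():
            if word == w and p > prob.get((i, i + 1, A), 0.0):
                prob[i, i + 1, A] = p
                back[i, i + 1, A] = ("word", w)

    for length in range(2, n + 1):                       # recursive step
        for i in range(0, n - length + 1):
            j = i + length
            for m in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    score = p * prob.get((i, m, B), 0.0) * prob.get((m, j, C), 0.0)
                    if score > prob.get((i, j, A), 0.0):
                        prob[i, j, A] = score
                        back[i, j, A] = (m, B, C)
    return prob, back

# Best parse probability for the whole sentence: prob.get((0, n, "S"), 0.0)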

Example. [Same table with probabilities: cell [0:1] = {NP: 0.25, N: 0.625}; [1:2] = {V: 1.0}; [2:3] = {Det: 0.6}; [3:4] = {NP: 0.1, N: 0.25}. The new value being computed for NP in cell [2:4] is 0.6 × 0.25 × Pr(NP → Det N).]

Bottom-Up vs. Top-Down. The CYK algorithm is bottom-up: starting from the words, build little pieces, then big pieces. Alternative: top-down parsing: starting from the start symbol, expand non-terminal symbols according to rules in the grammar. Doing this efficiently can also get us all the parses of a sentence (the Earley algorithm).

How to Train a PCFG? Derive it from a treebank, such as the WSJ. Simplest version: each LHS corresponds to a categorical distribution whose outcomes are the RHSs. MLE estimates: Pr(α → β) = #(α → β) / #α. These estimates can be smoothed in various ways, some of which we've discussed.
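A sketch of this MLE estimate from a treebank, assuming trees are given as nested tuples such as ("S", ("NP", "I"), ("VP", ...)); that tree encoding and the helper name mle_pcfg are my own, not the course's.

from collections import Counter

def mle_pcfg(trees):
    """Estimate Pr(rule) = count(rule) / count(LHS) from a list of trees."""
    rule_counts, lhs_counts = Counter(), Counter()

    def collect(node):
        if isinstance(node, str):          # a terminal leaf
            return
        lhs, children = node[0], node[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1
        for c in children:
            collect(c)

    for t in trees:
        collect(t)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}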