Computational Linguistics

Similar documents
Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

Algorithms for NLP. Chart Parsing. Reading: James Allen, Natural Language Understanding. Section 3.4, pp

Ling 571: Deep Processing for Natural Language Processing

Assignment 4 CSE 517: Natural Language Processing

The CKY Parsing Algorithm and PCFGs. COMP-599 Oct 12, 2016

Advanced PCFG Parsing

Parsing Chart Extensions 1

11-682: Introduction to IR, NLP, MT and Speech. Parsing with Unification Grammars. Reading: Jurafsky and Martin, Speech and Language Processing

October 19, 2004 Chapter Parsing

Admin PARSING. Backoff models: absolute discounting. Backoff models: absolute discounting 3/4/11. What is (xy)?

Ling/CSE 472: Introduction to Computational Linguistics. 5/4/17 Parsing

Iterative CKY parsing for Probabilistic Context-Free Grammars

Homework & NLTK. CS 181: Natural Language Processing Lecture 9: Context Free Grammars. Motivation. Formal Def of CFG. Uses of CFG.

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Parsing. Earley Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 39

Lexical and Syntax Analysis. Top-Down Parsing

Natural Language Processing

Monday, September 13, Parsers

CS 224N Assignment 2 Writeup

Vorlesung 7: Ein effizienter CYK Parser

Advanced PCFG Parsing

The CKY Parsing Algorithm and PCFGs. COMP-550 Oct 12, 2017

Squibs and Discussions

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

CSE 417 Dynamic Programming (pt 6) Parsing Algorithms

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Advanced PCFG algorithms

Dependency Parsing CMSC 723 / LING 723 / INST 725. Marine Carpuat. Fig credits: Joakim Nivre, Dan Jurafsky & James Martin

Principles of Programming Languages COMP251: Syntax and Grammars

Principles of Programming Languages COMP251: Syntax and Grammars

CSE431 Translation of Computer Languages

S Y N T A X A N A L Y S I S LR

Homework & Announcements

CS 842 Ben Cassell University of Waterloo

Types of parsing. CMSC 430 Lecture 4, Page 1

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Bottom-Up Parsing. Lecture 11-12

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Lecture Notes on Shift-Reduce Parsing

What is Parsing? NP Det N. Det. NP Papa N caviar NP NP PP S NP VP. N spoon VP V NP VP VP PP. V spoon V ate PP P NP. P with.

Language Processing note 12 CS

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Parsing with Dynamic Programming

Computer Science 160 Translation of Programming Languages

Parsing in Parallel on Multiple Cores and GPUs

Introduction to Bottom-Up Parsing

Context-free grammars (CFG s)

Lecture 8: Deterministic Bottom-Up Parsing

22 23 Earley s Algorithm Drew McDermott , 11-30, Yale CS470/570

CS 170 DISCUSSION 8 DYNAMIC PROGRAMMING. Raymond Chan raychan3.github.io/cs170/fa17.html UC Berkeley Fall 17

CYK parsing. I love automata theory I love love automata automata theory I love automata love automata theory I love automata theory

6.863J Natural Language Processing Lecture 7: parsing with hierarchical structures context-free parsing. Robert C. Berwick

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CKY algorithm / PCFGs

CSE 3302 Programming Languages Lecture 2: Syntax

Lecture 7: Deterministic Bottom-Up Parsing

CYK- and Earley s Approach to Parsing

Introduction to Lexing and Parsing

Large-Scale Syntactic Processing: Parsing the Web. JHU 2009 Summer Research Workshop

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Compiler Design Concepts. Syntax Analysis

Lexical and Syntax Analysis

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

Intro to Bottom-up Parsing. Lecture 9

Syntax Analysis, III Comp 412

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

CS 230 Programming Languages

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

Bottom-Up Parsing II. Lecture 8

3. Parsing. Oscar Nierstrasz

The CKY algorithm part 2: Probabilistic parsing

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Outline. The strategy: shift-reduce parsing. Introduction to Bottom-Up Parsing. A key concept: handles

LSA 354. Programming Assignment: A Treebank Parser

UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES

Wednesday, September 9, 15. Parsers

CSCI312 Principles of Programming Languages

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

CS W3134: Data Structures in Java

Wednesday, August 31, Parsers

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

CSE302: Compiler Design

Chapter 3. Parsing #1

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

Bottom-Up Parsing. Lecture 11-12

UNIT III & IV. Bottom up parsing

Part 3. Syntax analysis. Syntax analysis 96

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

CS164: Midterm I. Fall 2003

LR Parsing Techniques

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Tekniker för storskalig parsning: Dependensparsning 2

PL Revision overview

Syntax and Grammars 1 / 21

Transcription:

Computational Linguistics CSC 2501 / 485 Fall 2018 3 3. Chart parsing Gerald Penn Department of Computer Science, University of Toronto Reading: Jurafsky & Martin: 13.3 4. Allen: 3.4, 3.6. Bird et al: 8.4, online extras 8.2 to end of section Chart Parsing in NLTK. Copyright 2017 Suzanne Stevenson, Graeme Hirst and Gerald Penn. All rights reserved.

Efficient parsing Want to avoid problems of blind search: Avoid redoing analyses that are identical in more than one path of the search. Guide the analysis with both the actual input the expectations that follow from the choice of a grammar rule. Combine strengths of both top-down and bottom-up methods. 2

Efficient parsing Want to avoid problems of blind search: Avoid redoing analyses that are identical in more than one path of the search. Guide the analysis with both the actual input the expectations that follow from the choice of a grammar rule. Combine strengths of both top-down and bottom-up methods. 3

Chart parsing Main idea: Use data structures to maintain information: a chart and an agenda Agenda: List of constituents that need to be processed. Chart: Records ( memoizes ) work; obviates repetition. Related ideas: Well-formed substring table (WFST); CKY parsing; Earley parsing; dynamic programming. 4

Charts 1 Notation for positions in sentence from 0 to n (length of sentence): 0 The 1 kids 2 opened 3 the 4 box 5 RHS not really there From: Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons licence. 5

Charts 2 Contents of chart: 1. Completed constituents (inactive arcs). Representation: Labelled arc (edge) from one point in sentence to another (or same point). Directed; always left-to-right (or to self). Label is the left nonterminal of the grammar rule that derived it. 6

Charts 2 Contents of chart: 2. Partially built constituents (also called active arcs). Can think of them as hypotheses. Representation: Labelled arc (edge) from one point in sentence to another (or same point). Directed; always left-to-right (or to self). Label is grammar rule used for arc. 7

Notation for arc labels Notation: means complete to here. A X Y Z In parsing an A, we ve so far seen an X and a Y, and our A will be complete once we ve seen a Z. A X Y Z We have seen an X, a Y, and a Z, and hence completed the parse of an A. A X Y Z In parsing an A, so far we haven t seen anything. 8

From: Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons licence. VP V NP VP V NP 9

Fundamental rule of chart parsing Arc extension: Let X, Y, Z be sequences of symbols, where X and Y are possibly empty. If the chart contains an active arc from i to j of the form A X B Y and a completed arc from j to k of the form B Z or B word then add an arc from i to k A X B Y 10

A X B Y A X B Y B Z Adapted from: Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons licence. 11

Part of a chart from the NLTK chart parser demo, nltk.app.chartparser() 12

Part of a chart from the NLTK chart parser demo, nltk.app.chartparser() 13

Charts 3 An arc can connect any positions i, j (0 i j n). Can have > 1 arc on any i,j... But only one label for any i-j arc! Can associate all arcs on positions i,j with cell ij of upper-triangular matrix. 14

0 1 2 Arcs in top right corner cell cover the whole sentence. Those for S are parse edges. 3 4 5 6 7 0 1 2 3 4 5 6 7 The matrix for a seven-word sentence from the NLTK chart parser demo nltk.app.chartparser() 15

Bottom-up arc-addition rule Arc addition (or prediction): If the chart contains an completed arc from i to j of the form A X and the grammar contains a rule B A Z then add an arc from i to i B A Z or an arc B A Z from i to j. 17

B A Z B A Z A X Adapted from: Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons licence. 18

Bottom-up chart parsing BKL s view Initialize chart with each word in the input sentence (and, in effect, with their lexical categories). Loop until nothing more happens: Apply the bottom-up prediction rule wherever you can. Apply the fundamental rule wherever you can. Return the trees corresponding to the parse edges in the chart. Implies that trees are built as parse progresses and are associated with each arc, or that each arc keeps pointers to the arcs of its constituents to allow post hoc reconstruction of trees. 19

>>> nltk.app.chartparser() Top-down Init Rule Top-down Predict Rule Top-down Strategy Bottom-up Strategy Bottom-up Left-Corner Strategy Bottom-up Predict Rule Bottom-up Left-Corner Predict Rule Fundamental Rule Reset Parser 20

Observations Builds all constituents exactly once (almost at least it won t add more than one inactive edge with the same label and i-j). Never re-computes the prefix of an RHS (of the same rule it will if two rules share the same prefix). Exploits context-free nature of rules to reduce the search. How? 21

Controlling the process Wherever you can : too uncontrolled. Try to avoid predictions and expansions that will lead nowhere. So use agenda a list of completed arcs. When an arc is completed, it is initially added to the agenda, not the chart. Agenda rules decide which completed arc to move to the chart next. E.g., treat agenda as stack or as queue; or pick item that looks most efficient or most likely ; or pick NPs first; or. 22

Bottom-up chart parsing J&M s view Initialize agenda with the list of lexical categories of each word in the input sentence. Until agenda is empty, repeat: Move next constituent C from agenda to chart. a. Find rules whose RHS starts with C and add corresponding active arcs to the chart. b. Find active arcs that continue with C and extend them; add the new active arcs to the chart. c. Find active arcs that have been completed; add their LHS as a new constituent to the agenda. 23

Bottom-up chart parsing algorithm 1 INITIALIZE: set Agenda = list of all possible categories of each input word (in order of input); set n = length of input; set Chart = (); ITERATE: loop if Agenda = () then if there is at least one S constituent from 0 to n then return SUCCESS else return FAIL end if else 33

Bottom-up chart parsing algorithm 2 */ Set Ci,j = First(Agenda); /* Remove first item from agenda. */ /* Ci,j is a completed constituent of type C from position i to position j Add Ci,j to Chart; ARC UPDATE: a. BOTTOM-UP ARC ADDITION (PREDICTION): for each grammar rule X C X1 XN do Add arc X C X1 XN, from i to j, to Chart; b. ARC EXTENSION (FUNDAMENTAL RULE): for each arc X X1 C XN, from k to i, do Add arc X X1 C XN, from k to j, to Chart; c. ARC COMPLETION: for each arc X X1 XN C added in step (a) or step (b) do Move completed constituent X to Agenda; end if end loop 34

Problem with bottom-up chart parsing Ignores useful top-down knowledge (rule contexts). 35

>>> nltk.app.chartparser() Add ambiguity to lexicon: N saw V dog NP N Parse bottom-up: the dog saw John 36

Top-down chart parsing Same as bottom-up, except new arcs are added to chart only if based on predictions from existing arcs. Initialize chart with unstarted active arcs for S. S X Y S Z Q Whenever an active arc is added, also add unstarted arcs for its next needed constituent. 38

>>> nltk.app.chartparser() Add ambiguity to lexicon: N saw V dog NP N Parse top-down: the dog saw John 39

Top-down chart parsing algorithm 1 INITIALIZE: set Agenda = list of all possible categories of each input word (in order of input); set n = length of input; set Chart = (); for each grammar rule S X1 XN do Add arc S X1 XN to Chart at position 0; apply TOP-DOWN ARC ADDITION [step (a ) below] to the new arc; end for ITERATE: loop if Agenda = () then if there is at least one S constituent from 0 to n then return SUCCESS else return FAIL end if else 41

Top-down chart parsing algorithm 2 Set Ci,j = First(Agenda); /* Remove first item from agenda. */ /* Ci,j is a completed constituent of type C from position i to position j */ Add Ci,j to Chart; ARC UPDATE: b. ARC EXTENSION (FUNDAMENTAL RULE): for each arc X X1 C XN, from k to i, do Add arc X X1 C XN, from k to j, to Chart; a. TOP-DOWN ARC ADDITION (PREDICTION): /* Recursive: until no new arcs can be added */ for each arc X X1 XL XN, from k to j, added in step (b) or (a ), do Add arc XL Y1 YM, from j to j, to Chart; c. ARC COMPLETION: for each arc X X1 XN C added in step (b) do Move completed constituent X to Agenda; end if end loop 42

Notes on chart parsing Chart parsing separates: 1. Policy for selecting constituent from agenda; 2. Policy for adding new arcs to chart; 3. Policy for initializing chart and agenda. Top-down and bottom-up now refer to arcaddition rule. Initialization rule gives bottom-up aspect in either case. Polynomial algorithm (around O(n 3 )), instead of exponential. 43