CSE 417 Dynamic Programming (pt 6) Parsing Algorithms

Reminders > HW9 due on Friday start early: the program will be slow, so debugging will be slow... it should run in 2-4 minutes > Please fill out course evaluations

Dynamic Programming Review > Apply the steps... optimal substructure: a (small) set of solutions, constructed from solutions to sub-problems, that is guaranteed to include the optimal one 1. Describe the solution in terms of solutions to sub-problems 2. Determine all the sub-problems you'll need by applying this recursively 3. Solve every sub-problem (once only) in an appropriate order > Key question: 1. Can you solve the problem by combining solutions from sub-problems? > Count sub-problems to determine running time: total is the number of sub-problems times the time per sub-problem

Review From Previous Lectures > Previously... > Find opt substructure by considering how the opt solution could use the last input; given multiple inputs, consider how opt uses the last of either or both; given a clever choice of sub-problems, find opt substructure by considering new options > Alternatively, consider the shape of the opt solution in general: e.g., tree structured

Today > Dynamic programming algorithms for parsing: CKY is an important algorithm and should be understandable (everything after that is out of scope) > If you want to see more examples, my next two favorites are... 1. Optimal code generation (compilers) 2. System R query optimization (databases)

Outline for Today > Grammars > CKY Algorithm > Earley's Algorithm > Leo Optimization

Grammars > Grammars are used to understand languages > Important examples: natural languages, programming languages

Natural Language Grammar > Input is a list of parts of speech: noun (N), verb (V), preposition (P), determiner (D), conjunction (C), etc.

  N           V     N           P  N       D   N      C   D   N
  Rachael Ray finds inspiration in cooking her family and her dog

Natural Language Grammar > Output is a tree showing structure [figure: parse tree with S at the root, built from NP and PP phrases over the tag sequence N V N P N D N C D N for "Rachael Ray finds inspiration in cooking her family and her dog"]

Programming Language Grammar > Input is a list of tokens: identifiers, numbers, +, -, *, /, etc.

  N * N + N * N
  3 * 4 + 5 * 6

Programming Language Grammar > Output is a tree showing structure [figure: expression tree for 3 * 4 + 5 * 6, with + at the root, one * node over the N's for 3 and 4, and another * node over the N's for 5 and 6]

Programming Language Grammar > Output is a tree showing structure [figure: parse tree for 3 * 4 + 5 * 6 using non-terminals T and F, with T → T + F at the root and F → F * N chains below]

Context Free Grammars > Definition: A context free grammar is a set of rules of the form A → B1 B2 ... Bk where each Bi can be either a token (a "terminal") or another symbol appearing on the left-hand side of one of the rules (a "non-terminal") > The output of parsing is a tree with leaves labeled by terminals, internal nodes labeled by non-terminals, and the children of internal nodes matching some rule from the grammar e.g., can have a node labeled A with children B1, B2,..., Bk want a specific non-terminal (the "start symbol") as the root
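
Here is one simple way such a grammar might be represented in code. This is my own illustrative sketch (the lecture gives no code); the rules are the arithmetic grammar used in the examples below.

```python
# A minimal sketch of a context free grammar as data.
# Each rule is (lhs, rhs): lhs is a non-terminal, rhs a tuple of symbols.
# Terminals are exactly the symbols that never appear on a left-hand side.

grammar = [
    ("T", ("T", "+", "F")),  # T -> T + F
    ("T", ("F",)),           # T -> F
    ("F", ("F", "*", "N")),  # F -> F * N
    ("F", ("N",)),           # F -> N
]

non_terminals = {lhs for lhs, _ in grammar}
terminals = {s for _, rhs in grammar for s in rhs} - non_terminals
print(sorted(terminals))  # ['*', '+', 'N']
```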

Context Free Grammars > Example grammar for only multiplication: F → F * N, F → N [figure: left-leaning parse tree, F → F * N applied repeatedly, deriving 3 * 4 * 5 * 6]

Context Free Grammars > Example grammar for simple arithmetic expressions: F → F * N, F → N, T → T + F, T → F [figure: parse tree for 3 * 4 + 5 * 6, with T → T + F at the root and an F chain for each product]

Context Free Grammars > Called context free because the rule A → B1 B2 ... Bk says that A can look like B1 B2 ... Bk anywhere > There are more general grammars called context sensitive: parsing those grammars is harder than NP-complete (it is PSPACE-complete, like generalized chess or go)

Context Free Grammars > We will limit the sorts of grammars we consider... > Definition: A grammar is in Chomsky normal form if every rule is in one of these forms: 1. A → B, where B is a terminal 2. A → B1 B2, where both B1 and B2 are non-terminals > In particular, this rules out empty rules (A → ε) removal of those simplifies things a lot

Context Free Grammars > Definition: A grammar is in Chomsky normal form if every rule is in one of these forms: 1. A → C, where C is a terminal 2. A → B1 B2, where both B1 and B2 are non-terminals > Fact: Any context free grammar can be rewritten into an equivalent one in Chomsky normal form hence, we can assume this without loss of generality (there can be some blowup in the size of the grammar though...)

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 1: remove terminals on the right hand side starting grammar: F → F * N, F → N, T → T + F, T → F

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 1: remove terminals on the right hand side F → F * N, F → N, T → T + F, T → F becomes T → T P F, F → F M N, T → F, F → N, M → *, P → + (new non-terminals M and P stand in for * and +)

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 2: introduce new non-terminals to replace 3+ symbols on the right hand side current grammar: T → T P F, F → F M N, T → F, F → N, M → *, P → +

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 2: introduce new non-terminals to replace 3+ symbols on the right hand side T → T P F and F → F M N become T → T1 F, T1 → T P and F → F1 N, F1 → F M giving T → T1 F, T1 → T P, F → F1 N, F1 → F M, T → F, F → N, M → *, P → + (one way to automate this step is sketched below)
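
A hypothetical helper showing how step 2 could be automated. Note the slides factor from the left (F → F M N becomes F1 → F M, F → F1 N), while this sketch factors from the right; either choice yields a valid binarization.

```python
# Sketch of step 2: binarize rules with 3+ symbols on the right-hand side
# by introducing fresh helper non-terminals (names assumed unused).

def binarize(rules):
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            helper = f"{lhs}'{len(out)}"       # fresh helper non-terminal
            out.append((lhs, (rhs[0], helper)))
            lhs, rhs = helper, rhs[1:]
        out.append((lhs, rhs))
    return out

print(binarize([("F", ("F", "M", "N"))]))
# [('F', ('F', "F'0")), ("F'0", ('M', 'N'))]
```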

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 3: eliminate rules with 1 non-terminal on the RHS by substitution current grammar: T → T1 F, T1 → T P, F → F1 N, F1 → F M, T → F, F → N, M → *, P → +

Context Free Grammars > Example grammar for arithmetic in Chomsky normal form step 3: eliminate rules with 1 non-terminal on the RHS by substitution final grammar: T → T1 F, T1 → T P, T1 → F P, T → F1 N, T → N, F → F1 N, F1 → F M, F → N, M → *, P → + (the unit rule T → F is gone; substituting it produced T1 → F P, T → F1 N, and T → N; this grammar is encoded as lookup tables below)
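
For use in the CKY example later, this final grammar can be stored as two lookup tables. The encoding is my own illustration: "n" stands for any number token, and N is treated as a non-terminal with N → n so that every rule fits one of the two CNF forms.

```python
# Illustrative encoding of the final CNF arithmetic grammar.
# unit_rules:   terminal -> non-terminals A with a rule A -> terminal
# binary_rules: (B, C)   -> non-terminals A with a rule A -> B C

unit_rules = {
    "n": {"N", "F", "T"},     # N -> n, F -> n, T -> n
    "*": {"M"},               # M -> *
    "+": {"P"},               # P -> +
}
binary_rules = {
    ("F", "M"): {"F1"},       # F1 -> F M
    ("F1", "N"): {"F", "T"},  # F -> F1 N, T -> F1 N
    ("T", "P"): {"T1"},       # T1 -> T P
    ("F", "P"): {"T1"},       # T1 -> F P
    ("T1", "F"): {"T"},       # T -> T1 F
}
```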

Outline for Today > Grammars > CKY Algorithm > Earley's Algorithm > Leo Optimization

Parsing Context Free Grammars > Trying to find a tree... > Q: What technique do we know that might be helpful? > A: Dynamic programming!

Parsing Context Free Grammars > Apply dynamic programming... to find any tree that matches the data (can be generalized to find the most likely parse also...) > Think about what the parse tree for tokens 1..n might look like root corresponds to some rule A → B1 B2 (Chomsky normal form) child B1 is root of a parse tree for some 1..k child B2 is root of a parse tree for k+1..n (or it could be a leaf A → C, where C is a terminal, if n=1)

Parsing Context Free Grammars > In general, the parse tree for tokens i..j might look like A → C if i = j OR A → B1 B2 where > child B1 is root of a parse tree for some i..k > child B2 is root of a parse tree for k+1..j > Try each of those possibilities (at most |G| of them) for each (i,j) pair each requires checking j - i + 1 possibilities for k need answers to sub-problems with smaller j - i > can fill in the table along the diagonals, for example
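
Putting that recurrence into code: a minimal CKY recognizer sketch, using the unit_rules / binary_rules encoding from above (again my own illustration, not the lecture's pseudocode).

```python
# CKY recognizer: table[i, j] = set of non-terminals deriving tokens i..j
# (0-indexed, inclusive). Spans are filled shortest first, i.e., the table
# is filled "along the diagonals" as described above.
from collections import defaultdict

def cky(tokens, unit_rules, binary_rules):
    n = len(tokens)
    table = defaultdict(set)
    for i, tok in enumerate(tokens):        # length-1 spans: A -> token
        table[i, i] |= unit_rules.get(tok, set())
    for length in range(2, n + 1):          # longer spans, shortest first
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):           # split: B over i..k, C over k+1..j
                for B in table[i, k]:
                    for C in table[k + 1, j]:
                        table[i, j] |= binary_rules.get((B, C), set())
    return table

tokens = list("n*n+n*n")                    # 3 * 4 + 5 * 6 as token classes
table = cky(tokens, unit_rules, binary_rules)
print("T" in table[0, len(tokens) - 1])     # True: whole input parses as a T
```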

Cocke-Kasami-Younger (CKY) > Example table from the arithmetic example, for input 3 * 4 + 5 * 6 > The table is filled in along the diagonals: all length-1 spans first, then length-2 spans, and so on, until the full input is covered > Completed table (row = token where the span starts, column = token where it ends; an entry like F/T means both F and T match that span, and blank cells have no match):

         3     *     4     +     5     *     6
   3    F/T   F1    F/T   T1    T           T
   *          M
   4                F/T   T1    T           T
   +                       P
   5                            F/T   F1   F/T
   *                                  M
   6                                       F/T

Cocke-Kasami-Younger (CKY) > Can reconstruct the tree from the completed table as usual (e.g., by recording how each entry was produced)
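
A hedged sketch of that reconstruction, assuming the cky() sketch above is extended with hypothetical backpointer bookkeeping: whenever table[i, j] gains A via rule A → B C at split k, also record back[(i, j, A)] = (k, B, C).

```python
# Rebuild a parse tree by following backpointers down from the root.
# back[(i, j, A)] = (k, B, C): A -> B C with B over i..k, C over k+1..j.

def build_tree(i, j, A, tokens, back):
    if (i, j, A) not in back:              # length-1 span: A -> token
        return (A, tokens[i])
    k, B, C = back[(i, j, A)]
    return (A, build_tree(i, k, B, tokens, back),
               build_tree(k + 1, j, C, tokens, back))
```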

Cocke-Kasami-Younger (CKY) > Running time is O(|G| n^3): in NLP, |G| >> n, so this is great; in PL, |G| < n, so this is not great; in algorithms, this is usually considered O(n^3) since |G| is a constant > I will follow this convention for the rest of the lecture... > Algorithm easily generalizes to find the most likely parse tree frequently used in the NLP case (a sketch follows below)
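
For that probabilistic generalization, each table cell maps non-terminals to the probability of the best parse of the span instead of holding a set. A sketch, with rule probabilities supplied as inputs (a hypothetical encoding mirroring the recognizer above):

```python
# Viterbi CKY sketch: unit_probs[tok][A] = P(A -> tok), and
# binary_probs[(B, C)][A] = P(A -> B C). Returns best, where best[i, j][A]
# is the probability of the most likely parse of tokens i..j rooted at A.
from collections import defaultdict

def viterbi_cky(tokens, unit_probs, binary_probs):
    n = len(tokens)
    best = defaultdict(dict)
    for i, tok in enumerate(tokens):
        best[i, i] = dict(unit_probs.get(tok, {}))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for B, pb in best[i, k].items():
                    for C, pc in best[k + 1, j].items():
                        for A, pr in binary_probs.get((B, C), {}).items():
                            p = pr * pb * pc
                            if p > best[i, j].get(A, 0.0):
                                best[i, j][A] = p
    return best
```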

Outline for Today > Grammars > CKY Algorithm > Earley's Algorithm > Leo Optimization

Improving CKY (out of scope) > CKY is not optimal even for general grammars... (can be improved using fast matrix multiplication) > PLUS we know that certain grammars can be parsed much faster > In particular, there exist O(n) algorithms for typical PL grammars O(n^3) was out of the question in 1965... > Arithmetic example is one of those notice how the table is mostly blank that's a lot of wasted effort

Improving CKY > To get to O(n), we cannot fill in an n x n table doing so always requires Ω(n^2) time

Improving CKY > Idea: coalesce columns... [figure: the CKY table for 3 * 4 + 5 * 6 again, with each diagonal cell now showing N/F/T and each column's entries about to be merged into a single set]

Improving CKY > Idea: coalesce columns... let I_j include everything in column j these are rules that parse i..j for some i > Need to remember i as well write entries of I_j as A(i), recording both the symbol and where parsing started

Improving CKY > Now we fill in the sets I_1, I_2,..., I_n parsing left to right > If we are lucky enough to get |I_j| = O(1) for all j, this could be a linear time algorithm assuming we can build each I_j in O(1) time > The latter means we cannot look at all previous I_j's probably need to look only at I_{j-1}

Improving CKY: False Start > Suppose I_j is the set of A(i) where A matches i..j > How do we build I_j? > If N is the j-th symbol of input, add A(j) for every A → N rule > What next? > If C(k) is in I_j, we might need to add A(i) for any A → B C... A(i) should be added if B(i) is in I_{k-1} it takes O(n) time to try every k in 1..j-1 so we are back to Ω(n^2)

Improving CKY > To get O(n), we need to keep track of anything we might need to use later on in order to complete the parsing of a rule > Specifically, if we have parsed B(i), we need to keep track of the fact that it could be used to get an A → B C (i) if we later see C > We write this fact as A → B • C (i), which, in I_j, means that we have parsed the B part at i..j (the C part can be missing here i.e., if the rule is A → B, where B is a non-terminal)

Improved Parser > Let I_j be the set of items like A → B • C (i), where: 1. B matches input tokens i..j 2. It is possible for A to follow something that matches input tokens 1..i-1 > Note that • can be at the beginning, middle, or end (we may as well drop the limit of only 2 symbols on the RHS) > Second part is another optimization don't waste time trying to parse rules that aren't useful based on what came earlier

Improved Parser > Let I_j be the set of items like A → B • C (i), where: 1. B matches input tokens i..j 2. It is possible for A to follow something that matches input tokens 1..i-1 > Fill in I_j as follows: add anything that could follow I_{j-1} and matches input token j > (if A → B • C (i) is in I_{j-1}, then C could follow) for each complete item A → B C • (i) added: > if I_i contains A' → B' • A (i'), then add A' → B' A • (i') to I_j > (likewise for A' → • A (i')) add all those items that could follow the ones already added only part that is potentially slow...

Improved Parser > Fill in I_j as follows: add anything that could follow I_{j-1} and matches input token j > (if A → B • C (i) is in I_{j-1}, then C could follow) for each complete item A → B C • (i) added: > if I_i contains A' → B' • A (i'), then add A' → B' A • (i') to I_j > (likewise for A' → • A (i')) add all those items that could follow the ones already added > If all I_j's are of size O(1), then this is O(1) time per item hence, O(n) overall
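
The same item-set idea in runnable form: a compact Earley recognizer sketch (my own illustration, not the lecture's code). Items are (lhs, rhs, dot, origin) tuples matching the A → B • C (i) notation, and note the grammar does not need to be in Chomsky normal form.

```python
# Earley recognizer sketch. chart[j] holds items (lhs, rhs, dot, origin):
# rule lhs -> rhs with the dot before rhs[dot], started at input position
# origin. grammar maps each non-terminal to a list of right-hand sides.

def earley(tokens, grammar, start):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                for rhs2 in grammar[rhs[dot]]:       # predict: expand non-terminal
                    item = (rhs[dot], rhs2, 0, j)
                    if item not in chart[j]:
                        chart[j].add(item); agenda.append(item)
            elif dot < len(rhs):
                if j < n and tokens[j] == rhs[dot]:  # scan: match a terminal
                    chart[j + 1].add((lhs, rhs, dot + 1, origin))
            else:                                    # complete: lhs spans origin..j
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[j]:
                            chart[j].add(item); agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

grammar = {"T": [("T", "+", "F"), ("F",)],       # the arithmetic grammar,
           "F": [("F", "*", "n"), ("n",)]}       # not in CNF
print(earley("n*n+n*n", grammar, "T"))           # True
```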

Earley's algorithm > This version is called Earley's algorithm > It was developed independently of CKY by Earley (relation to CKY was noted by Ruzzo et al.) also considered a dynamic programming algorithm > the sub-problems being solved are not quite so obvious as in CKY

Earley's algorithm > Can be shown that Earley's algorithm runs in O(n^2) time for any unambiguous grammar meaning there is only one possible parse tree > typical of PL grammars (though not NLP grammars) > Can also be shown it runs in O(n) time for nice LR(k) grammars > BUT not for all LR(k) grammars the latter can be parsed in O(n) time by other algorithms > The running time is at least the sum of the sizes of the I_j's...

Outline for Today > Grammars > CKY Algorithm > Earley's Algorithm > Leo Optimization

Bad Cases for Earley > Q: Can the I_j's be O(n) in size for some unambiguous grammar?

Bad Cases for Earley > Q: Can the I_j's be O(n) in size for some unambiguous grammar? > A: Unfortunately, yes consider the grammar A → a, B → b, B → A B on input a a a a a a b [figure: right-leaning parse tree, one B → A B node per a, ending in B → b] > All of the B's are completed in I_n

Bad Cases for Earley > Q: Can the I_j's be O(n) in size for some unambiguous grammar? > A: Unfortunately, yes [figure: the same right-leaning parse tree] > This is a right recursive grammar > Fortunately, these are the only bad cases (O(n) otherwise) > Grammars can usually be rewritten to avoid it

Joop Leo's Optimization > Alternatively, we can improve the algorithm to handle those... > Leo makes the following optimization: only record the top-most item in a tall stack like this (actually O(1) copies of it, depending on how we might look for it later) > Can then show that the I_j's are O(1) size the number of items with the dot not at the end is O(1) due to the LR(k) property a clever argument shows the number with the dot at the end is also O(1) > removing stacks leaves a tree with all internal nodes having 2+ children, plus the leaves above those > (furthermore, each is discovered only once for unambiguous grammars)

Joop Leo's Optimization > Alternatively, we can improve the algorithm to handle those... > Leo makes the following optimization: only record the top-most item in a tall stack like this (actually O(1) copies of it, depending on how we might look for it later) > Result is O(n) in the worst case for LR(k) grammars (i.e., for anything parsable by a deterministic push-down automaton) covers almost every PL grammar

Parsers in Practice > CKY and Earley are used in NLP recall that |G| is usually larger there > In PL, we typically use special grammars (e.g., LR(k)) that can be parsed in linear time LR(k) was invented by Don Knuth and parses anything that can be parsed by a deterministic push-down automaton > Earley + Leo gives the same asymptotic performance I expect it to see more use given the speed of computers (LR parsing was developed for machines 10,000x slower)