Parsing. Earley Parsing. Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 39


Table of contents

1. Idea
2. Algorithm
3. Tabulation
4. Parsing
5. Lookaheads

Idea (1)

Goal: overcome the problems of pure TD/BU approaches. Earley's algorithm can be seen as
- a bottom-up parser with top-down control, i.e., a bottom-up parser that performs only those reductions that can be top-down predicted from S, or
- a top-down parser with bottom-up recognition.

Idea (2)

At each point during parsing, a production A → X_1 … X_k is considered such that some part X_1 … X_i has already been recognized bottom-up (completed) while the rest X_{i+1} … X_k has been predicted top-down. As in the left-corner chart parser, this situation can be characterized by a dotted production (sometimes called an Earley item) A → X_1 … X_i • X_{i+1} … X_k. Dotted productions are called active items; productions of the form A → α • are called completed items.

Idea (3)

The Earley parser simulates a top-down, left-to-right, depth-first traversal of the parse tree while moving the dot such that for each node
- first, the dot is to its left (the node is predicted),
- then the dot traverses the tree below,
- then the dot is to its right (the subtree below the node is completed).

Each state of the parser can be characterized by a set of dotted productions A → X_1 … X_i • X_{i+1} … X_k. For each of these, one needs to keep track of the input span covered by the completed part of the rhs, i.e., by X_1 … X_i.

Algorithm (1)

The items describing partial results of the parser contain a dotted production and the start and end index of the completed part of the rhs:

Item form: [A → α • β, i, j] with A → αβ ∈ P, 0 ≤ i ≤ j ≤ n.

Parsing starts with predicting all S-productions:

Axioms: [S → • α, 0, 0] for every S → α ∈ P
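As an illustration, the item form can be written down as a small immutable structure. This is only a sketch; the names `Item` and `axioms` and the list-of-pairs grammar format are my own, not from the slides:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """[lhs -> before_dot . after_dot, i, j]: before_dot is the completed
    part of the rhs, spanning input positions i..j; after_dot is the part
    that is still only predicted."""
    lhs: str
    before_dot: tuple
    after_dot: tuple
    i: int
    j: int

def axioms(productions, start):
    # Parsing starts with predicting all start productions: [S -> . alpha, 0, 0].
    return {Item(start, (), tuple(rhs), 0, 0)
            for (lhs, rhs) in productions if lhs == start}

# A hypothetical toy grammar with two S-productions:
prods = [("S", ("a", "B")), ("S", ("b", "A")), ("A", ("a",))]
```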

Algorithm (2)

If the dot of an item is followed by a non-terminal B, a new B-production can be predicted. The completed part of the new item (still empty) starts at the index where the completed part of the first item ends.

Predict: [A → α • Bβ, i, j] ⊢ [B → • γ, j, j] for every B → γ ∈ P

If the dot of an item is followed by a terminal a that is the next input symbol, then the dot can be moved over this terminal (the terminal is scanned). The end position of the completed part is incremented.

Scan: [A → α • aβ, i, j] ⊢ [A → αa • β, i, j + 1] if w_{j+1} = a

Algorithm (3)

If the dot of an item is followed by a non-terminal B, and if there is a second item with a dotted B-production whose rhs is fully completed, and if, furthermore, the completed part of the second item starts at the position where the completed part of the first one ends, then the dot in the first item can be moved over the B while changing its end index to the end index of the completed B-production.

Complete: [A → α • Bβ, i, j], [B → γ •, j, k] ⊢ [A → αB • β, i, k]

The parser is successful if a completed S-production spanning the entire input can be deduced:

Goal items: [S → α •, 0, n] for some S → α ∈ P.

Algorithm (4)

Note that
- this algorithm can deal with ε-productions;
- loops and left recursion are no problem since each active item is generated only once;
- the algorithm works for any type of CFG.
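As a sanity check, the four deduction rules can be closed under a naive fixpoint to obtain a recognizer. This is only a sketch of the rule system, not the tabulated formulation from the later slides; the tuple representation of items and the dictionary grammar format are my own choices:

```python
def earley_recognize(grammar, start, w):
    """Naive Earley recognizer: close the item set under Predict, Scan and
    Complete. An item (lhs, before_dot, after_dot, i, j) stands for
    [lhs -> before_dot . after_dot, i, j]."""
    n = len(w)
    # Axioms: [S -> . alpha, 0, 0] for every start production.
    items = {(start, (), tuple(rhs), 0, 0) for rhs in grammar[start]}
    changed = True
    while changed:
        changed = False
        for (lhs, bd, ad, i, j) in list(items):
            new = set()
            if ad and ad[0] in grammar:              # Predict
                for rhs in grammar[ad[0]]:
                    new.add((ad[0], (), tuple(rhs), j, j))
            elif ad and j < n and w[j] == ad[0]:     # Scan: w_{j+1} = a
                new.add((lhs, bd + (ad[0],), ad[1:], i, j + 1))
            elif not ad:                             # Complete
                for (l2, bd2, ad2, i2, j2) in list(items):
                    if ad2 and ad2[0] == lhs and j2 == i:
                        new.add((l2, bd2 + (lhs,), ad2[1:], i2, j))
            if not new <= items:
                items |= new
                changed = True
    # Goal: a completed start production spanning the whole input.
    return any(l == start and not ad and i == 0 and j == n
               for (l, bd, ad, i, j) in items)

# The example grammar from these slides (equal numbers of a's and b's):
# S -> aB | bA, A -> aS | bAA | a, B -> bS | aBB | b
g = {"S": [("a", "B"), ("b", "A")],
     "A": [("a", "S"), ("b", "A", "A"), ("a",)],
     "B": [("b", "S"), ("a", "B", "B"), ("b",)]}
```

Since the item set is finite and grows monotonically, the fixpoint terminates even with ε-productions and left recursion, as the slide claims.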

Algorithm (5)

Example. Productions: S → aB | bA, A → aS | bAA | a, B → bS | aBB | b. Input w = abab.

Set of deduced items:

1. [S → • aB, 0, 0]    axiom
2. [S → • bA, 0, 0]    axiom
3. [S → a • B, 0, 1]   scan with 1.
4. [B → • bS, 1, 1]    predict with 3.
5. [B → • b, 1, 1]     predict with 3.
6. [B → • aBB, 1, 1]   predict with 3.
7. [B → b • S, 1, 2]   scan with 4.
8. [B → b •, 1, 2]     scan with 5.
9. [S → aB •, 0, 2]    complete with 3. and 8.

Algorithm (6)

Example continued:

10. [S → • aB, 2, 2]   predict with 7.
11. [S → • bA, 2, 2]   predict with 7.
12. [S → a • B, 2, 3]  scan with 10.
13. [B → • bS, 3, 3]   predict with 12.
14. [B → • b, 3, 3]    predict with 12.
15. [B → • aBB, 3, 3]  predict with 12.
16. [B → b • S, 3, 4]  scan with 13.
17. [B → b •, 3, 4]    scan with 14.
18. [S → • aB, 4, 4]   predict with 16.
19. [S → • bA, 4, 4]   predict with 16.
20. [S → aB •, 2, 4]   complete with 12. and 17.
21. [B → bS •, 1, 4]   complete with 7. and 20.
22. [S → aB •, 0, 4]   complete with 3. and 21.

Algorithm (7)

Soundness and completeness: the following holds:

[A → α • β, i, j] iff S ⇒* w_1 … w_i Aγ ⇒ w_1 … w_i αβγ ⇒* w_1 … w_i w_{i+1} … w_j βγ for some γ ∈ (N ∪ T)*.

The algorithm is in particular prefix-valid: if there is an item with end position j, then there is a word in the language with prefix w_1 … w_j.

Algorithm (8)

In addition, one can use passive items [A, i, j] with A ∈ N, 0 ≤ i ≤ j ≤ n. Then we need an additional Convert rule that turns a completed active item into a passive one:

Convert: [B → γ •, j, k] ⊢ [B, j, k]

The goal item is then [S, 0, n].

Algorithm (9)

The Complete rule can now use passive items:

Complete: [A → α • Bβ, i, j], [B, j, k] ⊢ [A → αB • β, i, k]

The advantage is a higher degree of factorization: a B-subtree might have different analyses. Complete can now use the category B independently of the concrete analyses, i.e., there is only a single application of Complete for all of them.

Tabulation (1)

We can tabulate the dotted productions depending on the indices of the covered input. I.e., we adopt an (n + 1) × (n + 1) chart C with

A → α • β ∈ C_{i,j} iff [A → α • β, i, j].

The chart is initialized with C_{0,0} := {S → • α | S → α ∈ P} and C_{i,j} = ∅ for all i, j ∈ [0..n] with i ≠ 0 or j ≠ 0. It can then be filled in the following way:

Tabulation (2)

Let us consider the version without passive items. The chart is filled row by row: for every end-of-span index k,
- we first compute all applications of Predict and Complete that yield new items with end-of-span index k;
- then we compute all applications of Scan, which yields items with end-of-span index k + 1.

Tabulation (3) for all k [0..n]: do until chart does not change any more: for all j [0..k] and all p C j,k : if p = A α Bβ then add B γ to C k,k else if p = B γ then for all i [0..j]: if there is a A α Bβ C i,j then add A αb β to C i,k for all j [0..k] and for all p C j,k : if p = A α w k+1 β then add A αw k+1 β to C j,k+1 for all B γ P predict scan complete 17 / 39

Tabulation (4)

Note that Predict and Complete do not increment the end of the spanned input, i.e., they only add elements to the fields C_{…,k} (the k-th row of the chart). Scan, however, adds elements to C_{…,k+1} (the (k + 1)-th row). This is why first all possible Predict and Complete operations are performed to generate new chart entries in the k-th row; then Scan is applied and one can move on to the next row k + 1. Since Predict and Complete are applied as often as possible, ε-productions and left recursion are no problem for this algorithm.

Tabulation (5)

Implementation: besides the chart, for every k we keep an agenda A_k of those items from the chart that still need to be processed. Initially, for k = 0, this agenda contains all S → • α; the other agendas are empty. We process the items in the agendas from k = 0 to k = n. For each k, we stop once the k-agenda is empty.

Tabulation (6)

Items x of the form A → α • Bβ trigger a Predict operation. The newly created items, if not yet in the chart, are added to the chart and the k-agenda. In addition, if ε-productions are allowed, x also triggers a Complete where the chart is searched for a completed B-item ranging from k to k. The new items (if not yet in the chart) are added to the k-agenda and the chart. Then x is removed from the k-agenda.

Tabulation (8)

Items x of the form B → γ • trigger a Complete operation where the chart is searched for corresponding items A → α • Bβ. The newly created items (if not yet in the chart) are added to the chart and the k-agenda; x is removed from the k-agenda.

Items x of the form A → α • aβ trigger a Scan operation. The newly created items (if not yet in the chart) are added to the chart and the (k + 1)-agenda; x is removed from the k-agenda.
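The agenda-driven chart filling described above can be sketched as follows. This is my own rendering in Python (cell and function names are assumptions): chart cells are keyed by (start, end) pairs, and there is one agenda per end-of-span index k.

```python
from collections import defaultdict

def earley_chart(grammar, start, w):
    """Agenda-driven chart filling: chart[(j, k)] holds dotted productions
    (lhs, before_dot, after_dot) spanning input positions j..k."""
    n = len(w)
    chart = defaultdict(set)
    agenda = defaultdict(list)   # one agenda per end-of-span index k

    def add(j, k, item):
        # New items go to the chart and to the k-agenda, but only once.
        if item not in chart[(j, k)]:
            chart[(j, k)].add(item)
            agenda[k].append((j, item))

    for rhs in grammar[start]:                       # axioms in row 0
        add(0, 0, (start, (), tuple(rhs)))
    for k in range(n + 1):
        while agenda[k]:
            j, (lhs, bd, ad) = agenda[k].pop()
            if ad and ad[0] in grammar:              # predict
                b = ad[0]
                for rhs in grammar[b]:
                    add(k, k, (b, (), tuple(rhs)))
                # with epsilon-productions, a completed B-item from k to k
                # may already be in the chart, so also try to complete:
                if any(l == b and not a for (l, _bd, a) in chart[(k, k)]):
                    add(j, k, (lhs, bd + (b,), ad[1:]))
            elif ad and k < n and w[k] == ad[0]:     # scan
                add(j, k + 1, (lhs, bd + (ad[0],), ad[1:]))
            elif not ad:                             # complete
                for i in range(j + 1):
                    for (l2, bd2, ad2) in list(chart[(i, j)]):
                        if ad2 and ad2[0] == lhs:
                            add(i, k, (l2, bd2 + (lhs,), ad2[1:]))
    # goal: a completed start production spanning the whole input
    return any(l == start and not ad for (l, _bd, ad) in chart[(0, n)])

# The example grammar from the next slides: S -> ASB | c, A -> a, B -> b
g = {"S": [("A", "S", "B"), ("c",)], "A": [("a",)], "B": [("b",)]}
```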

Tabulation (9)

Example without ε-productions. Productions S → ASB | c, A → a, B → b; input w = acb.

A_0 = {[S → • ASB, 0, 0], [S → • c, 0, 0]}

Chart so far: C_{0,0} = {S → • ASB, S → • c}

S → • ASB triggers a predict:

Tabulation (10)

Example continued:

A_0 = {[S → • c, 0, 0], [A → • a, 0, 0]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}

S → • c triggers a scan that fails; A → • a triggers a successful scan:

Tabulation (11)

Example continued:

A_0 = {}, A_1 = {[A → a •, 0, 1]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •}

A → a • triggers a complete:

Tabulation (12)

Example continued:

A_0 = {}, A_1 = {[S → A • SB, 0, 1]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}

S → A • SB triggers a predict:

Tabulation (13)

Example continued:

A_0 = {}, A_1 = {[S → • ASB, 1, 1], [S → • c, 1, 1]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c}

S → • ASB triggers a predict:

Tabulation (14)

Example continued:

A_0 = {}, A_1 = {[S → • c, 1, 1], [A → • a, 1, 1]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c, A → • a}

S → • c triggers a scan:

Tabulation (15)

Example continued:

A_0 = {}, A_1 = {[A → • a, 1, 1]}, A_2 = {[S → c •, 1, 2]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c, A → • a}, C_{1,2} = {S → c •}

A → • a triggers a scan that fails; then S → c • triggers a complete:

Tabulation (16)

Example continued:

A_0 = {}, A_1 = {}, A_2 = {[S → AS • B, 0, 2]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c, A → • a}, C_{1,2} = {S → c •}, C_{0,2} = {S → AS • B}

S → AS • B triggers the prediction of [B → • b, 2, 2], which triggers a successful scan:

Tabulation (17)

Example continued:

A_0 = {}, A_1 = {}, A_2 = {}, A_3 = {[B → b •, 2, 3]}

Chart so far: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c, A → • a}, C_{1,2} = {S → c •}, C_{0,2} = {S → AS • B}, C_{2,2} = {B → • b}, C_{2,3} = {B → b •}

B → b • triggers a complete which leads to a goal item:

Tabulation (18)

Example continued:

A_0 = {}, A_1 = {}, A_2 = {}, A_3 = {}

Final chart: C_{0,0} = {S → • ASB, S → • c, A → • a}, C_{0,1} = {A → a •, S → A • SB}, C_{1,1} = {S → • ASB, S → • c, A → • a}, C_{1,2} = {S → c •}, C_{0,2} = {S → AS • B}, C_{2,2} = {B → • b}, C_{2,3} = {B → b •}, C_{0,3} = {S → ASB •}

Tabulation (19)

Example with ε-productions. Productions S → ABB, A → a, B → ε; input w = a.

Final chart: C_{0,0} = {S → • ABB, A → • a}, C_{0,1} = {A → a •, S → A • BB, S → AB • B, S → ABB •}, C_{1,1} = {B → •}

Tabulation (20)

Overview of TD/BU and mixed approaches:

- Top-down: TD; item form [Γ, i]; no chart parsing
- Bottom-up: BU; item form [Γ, i]; no chart parsing
- CYK: BU; item form [A, i, l]; chart parsing
- Left corner: mixed; item form [Γ_compl, Γ_td, Γ_lhs] without chart parsing, or item form [A → α • β, i, l] with chart parsing
- Earley: mixed; item form [A → α • β, i, j]; chart parsing

Parsing (1)

So far, we have a recognizer. One way to extend it to a parser is to read off the parse tree in a top-down way from the chart. Alternatively, in every completion step, we can record in the chart how the new item was obtained, by adding pointers to its pair of antecedents. Then, for constructing the parse tree, we only need to follow the pointers.

Parsing (2)

First possibility (initial call parse-tree(S, 0, n)):

parse-tree(X, i, j):
    trees := ∅;
    if X = w_j and j = i + 1 then
        trees := {w_j}
    else
        for all X → X_1 … X_r • ∈ C_{i,j},
            all i_1, …, i_r with i ≤ i_1 ≤ … ≤ i_{r−1} ≤ i_r = j,
            and all t_1, …, t_r with t_1 ∈ parse-tree(X_1, i, i_1)
            and t_l ∈ parse-tree(X_l, i_{l−1}, i_l) for 1 < l ≤ r:
                trees := trees ∪ {X(t_1, …, t_r)};
    output trees
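The read-off procedure can be sketched in Python. This is a sketch under assumptions of my own: `complete[(i, j)]` holds the completed productions X → X_1 … X_r • from cell C_{i,j}, nonterminals are written in upper case, and the grammar is taken to be ε-free and cycle-free so that the enumeration of split points terminates:

```python
def parse_trees(X, i, j, complete, w):
    """Read off all parse trees for X spanning w[i:j] from `complete`, a
    dict mapping (i, j) to the completed productions found in C_{i,j}."""
    if not X.isupper():                      # terminal leaf
        return [X] if j == i + 1 and w[i] == X else []
    trees = []
    for (lhs, rhs) in complete.get((i, j), set()):
        if lhs != X:
            continue

        def splits(syms, lo):
            # Enumerate split points i = i_0 < i_1 < ... < i_r = j;
            # epsilon-freeness means every child spans >= 1 input symbol.
            if not syms:
                return [[]] if lo == j else []
            out = []
            for mid in range(lo + 1, j - (len(syms) - 1) + 1):
                for sub in parse_trees(syms[0], lo, mid, complete, w):
                    for rest in splits(syms[1:], mid):
                        out.append([sub] + rest)
            return out

        for children in splits(list(rhs), i):
            trees.append((X, children))
    return trees

# Hypothetical chart of completed items for S -> AB, A -> a, B -> b on w = ab:
chart = {(0, 1): {("A", ("a",))}, (1, 2): {("B", ("b",))},
         (0, 2): {("S", ("A", "B"))}}
```

For an ambiguous grammar the function returns one tree per analysis, since every production in a cell and every admissible split is enumerated.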

Parsing (3)

Second possibility: we equip items with an additional set of pairs of pointers to other items in the item set. Whenever an item [A → αB • β, i, k] is obtained in a complete operation from [A → α • Bβ, i, j] and [B → γ •, j, k], we add a pair of pointers to the two antecedent items to the pointer set of the consequent item. Obviously, items might have more than one element in their pointer set if the grammar is ambiguous.

Parsing (4)

Example: backpointers. Productions S → AB, A → Ac | a, B → cB | b; input w = acb.

Item set (with lists of pointer pairs):

1  [S → • AB, 0, 0]
2  [A → • Ac, 0, 0]
3  [A → • a, 0, 0]
4  [A → a •, 0, 1]
5  [A → A • c, 0, 1], {⟨2, 4⟩}
6  [S → A • B, 0, 1], {⟨1, 4⟩}
7  [B → • cB, 1, 1]
8  [B → • b, 1, 1]
9  [A → Ac •, 0, 2]
10 [B → c • B, 1, 2]
11 [A → A • c, 0, 2], {⟨2, 9⟩}
12 [S → A • B, 0, 2], {⟨1, 9⟩}
13 [B → • cB, 2, 2]
14 [B → • b, 2, 2]
15 [B → b •, 2, 3]
16 [B → cB •, 1, 3], {⟨10, 15⟩}
17 [S → AB •, 0, 3], {⟨6, 16⟩, ⟨12, 15⟩}
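The pointer bookkeeping of the second possibility can be sketched as a wrapper around the Complete step (a Python sketch; items are tuples (lhs, before_dot, after_dot, i, j), and all names are my own):

```python
def complete_with_pointers(active, completed, pointers):
    """One Complete step that also records backpointers: from
    [A -> alpha . B beta, i, j] and [B -> gamma ., j, k] derive
    [A -> alpha B . beta, i, k] and remember the antecedent pair."""
    (lhs, bd, ad, i, j) = active
    (lhs2, _bd2, ad2, j2, k) = completed
    assert ad and ad[0] == lhs2 and not ad2 and j == j2
    new = (lhs, bd + (lhs2,), ad[1:], i, k)
    pointers.setdefault(new, set()).add((active, completed))
    return new

# Items 1, 4, 12, 15, 16 from the example above, for S -> AB, A -> Ac | a,
# B -> cB | b on w = acb:
ptrs = {}
it1 = ("S", (), ("A", "B"), 0, 0)
it4 = ("A", ("a",), (), 0, 1)
it6 = complete_with_pointers(it1, it4, ptrs)       # item 6
it16 = ("B", ("c", "B"), (), 1, 3)
it12 = ("S", ("A",), ("B",), 0, 2)
it15 = ("B", ("b",), (), 2, 3)
it17 = complete_with_pointers(it6, it16, ptrs)     # item 17, first analysis
complete_with_pointers(it12, it15, ptrs)           # item 17, second analysis
```

As on the slide, the ambiguous item 17 ends up with two pointer pairs, one per analysis.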

Lookaheads

Two kinds of lookaheads: a prediction lookahead and a reduction lookahead.

Predict with lookahead:
[A → α • Bβ, i, j] ⊢ [B → • γ, j, j]   if B → γ ∈ P and (w_{j+1} ∈ First(γ) or ε ∈ First(γ))

Complete with lookahead:
[A → α • Bβ, i, j], [B → γ •, j, k] ⊢ [A → αB • β, i, k]   if w_{k+1} ∈ First(β), or ε ∈ First(β) and w_{k+1} ∈ Follow(A)

Instead of precomputing the Follow sets, one can compute the actual follows while predicting.
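The First computation behind these side conditions can be sketched as a standard fixpoint (a Python sketch; the empty string stands for ε, terminals are the symbols that do not occur as a lhs, and all function names are my own):

```python
def first_of_seq(seq, first, grammar):
    # First of a symbol sequence; "" stands for epsilon.
    out = set()
    for x in seq:
        fx = first[x] if x in grammar else {x}   # terminal: First(a) = {a}
        out |= fx - {""}
        if "" not in fx:
            return out
    out.add("")  # every symbol in seq can derive epsilon
    return out

def first_sets(grammar):
    # Fixpoint: grow First(A) until nothing changes.
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                f = first_of_seq(rhs, first, grammar)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
    return first

def predict_ok(gamma, w, j, first, grammar):
    # Side condition of Predict with lookahead:
    # w_{j+1} in First(gamma) or epsilon in First(gamma).
    f = first_of_seq(gamma, first, grammar)
    return "" in f or (j < len(w) and w[j] in f)

# Hypothetical toy grammar: S -> Ab, A -> a | epsilon
g = {"S": [("A", "b")], "A": [("a",), ()]}
fs = first_sets(g)
```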

Conclusion

- Earley is a top-down restricted bottom-up parser.
- The three operations to compute new partial parsing results are predict, scan and complete.
- Earley is a chart parser.
- Earley can parse all CFGs.