
Parsing Top-Down Parsing Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 19

Table of contents

1 Introduction
2 The recognizer
3 The parser
4 Control structures
5 Parser generators
6 Conclusion
2 / 19

Introduction

A CFG parser that is
- a top-down parser: we start with S and subsequently replace lefthand sides of productions with righthand sides.
- a directional parser: the expansion of non-terminals (with appropriate righthand sides) is ordered; we start with the leftmost non-terminal and go through the righthand sides of productions from left to right. In particular, we determine the start position of the span of the ith symbol in a rhs only after having processed the i − 1 preceding symbols.
- an LL-parser: we process the input from left to right while constructing a leftmost derivation.

First proposed by Sheila Greibach (for CFGs in GNF); see Grune and Jacobs (2008). 3 / 19

The recognizer (1)

Assume a CFG without left recursion A ⇒⁺ Aα.

Function top-down with arguments
- w: remaining input;
- α: remaining sentential form (a stack).

top-down(w, α) = true iff α ⇒* w (for α ∈ (N ∪ T)*, w ∈ T*)

Initial call: top-down(w, S) 4 / 19

The recognizer (2)

Top-down recognizer

def top-down(w, α):
    out = false
    if w = α = ε:
        out = true
    elif w = aw′ and α = aα′:                  (scan)
        out = top-down(w′, α′)
    elif α = Xα′ with X ∈ N:                   (predict)
        for X → X₁ … Xₖ in P:
            if top-down(w, X₁ … Xₖ α′):
                out = true
    return out
5 / 19
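The pseudocode above translates almost directly into Python. Below is a minimal runnable sketch; the dict-based grammar encoding, the underscore in the function name and the assumption that every symbol is a single character are my own choices, not part of the slides.

# Minimal sketch of the top-down recognizer from this slide. The grammar is a
# dict mapping each non-terminal to the list of its right-hand sides (strings);
# the dict keys are the non-terminals, every other symbol counts as a terminal.

def top_down(w, alpha, grammar):
    """True iff the sentential form alpha derives the remaining input w."""
    out = False
    if not w and not alpha:                    # both empty: success
        out = True
    elif w and alpha and w[0] == alpha[0]:     # scan
        out = top_down(w[1:], alpha[1:], grammar)
    elif alpha and alpha[0] in grammar:        # predict
        for rhs in grammar[alpha[0]]:
            if top_down(w, rhs + alpha[1:], grammar):
                out = True
    return out

# Example grammar used on the following slides
grammar = {'S': ['ASB', 'AASB', 'c'], 'A': ['a'], 'B': ['b']}
print(top_down('aacb', 'S', grammar))   # True
print(top_down('aacc', 'S', grammar))   # False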

The recognizer (3)

This is exactly what the following PDA construction for a CFG does: start with stack Z₀ and state q₀.

- δ(q₀, ε, Z₀) = { ⟨q₁, SZ₀⟩ }
- ⟨q₁, α⟩ ∈ δ(q₁, ε, A) for all A → α
- ⟨q₁, ε⟩ ∈ δ(q₁, a, a) for all a ∈ T
- δ(q₁, ε, Z₀) = { ⟨q_f, ε⟩ }

(LL-PDA construction in JFLAP) 6 / 19
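For illustration, this transition relation can be computed directly from the grammar. The sketch below uses an ad-hoc encoding of my own (states as strings, '' for ε, the pushed string as plain concatenation); it is not the JFLAP file format.

# Sketch: building the transition relation of the LL-PDA for a CFG.
# delta maps (state, input symbol or '', stack top) to a set of
# (next state, string to push) pairs.

def ll_pda(grammar, start='S', bottom='Z0'):
    delta = {}
    def add(key, value):
        delta.setdefault(key, set()).add(value)

    add(('q0', '', bottom), ('q1', start + bottom))      # push S on top of Z0
    for lhs, rhss in grammar.items():                    # expand A -> alpha
        for rhs in rhss:
            add(('q1', '', lhs), ('q1', rhs))
    terminals = {s for rhss in grammar.values()
                   for rhs in rhss for s in rhs if s not in grammar}
    for a in terminals:                                  # match terminal, pop it
        add(('q1', a, a), ('q1', ''))
    add(('q1', '', bottom), ('qf', ''))                  # accept on empty stack
    return delta

grammar = {'S': ['ASB', 'AASB', 'c'], 'A': ['a'], 'B': ['b']}
for key, targets in sorted(ll_pda(grammar).items()):
    print(key, '->', targets)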

The recognizer (4)

Example: Top-down recognizer
G = ⟨N, T, P, S⟩, N = {S, A, B}, T = {a, b, c}
P = {S → ASB | AASB | c, A → a, B → b}
Input w = aacb.

Calls of top-down (order is depth-first):

     stack α   w
 1.  S         aacb
 2.  ASB       aacb   pred(1)
 3.  aSB       aacb   pred(2)
 4.  SB        acb    scan(3)
 5.  ASBB      acb    pred(4)
 6.  aSBB      acb    pred(5)
 7.  SBB       cb     scan(6)
     4 unsuccessful predicts
 8.  cBB       cb     pred(7)
     (scan, predict, scan)
 9.  b         ε      pred    (scan fails; this branch fails, backtrack)
10.  AASB      aacb   pred(1)
11.  aASB      aacb   pred(10)
     (scan, predict)
12.  SB        cb     scan
     4 unsuccessful predicts
13.  cB        cb     pred(12)
14.  B         b      scan(13)
15.  b         b      pred(14)
16.  ε         ε      scan(15)
7 / 19

The parser (1)

How to turn the recognizer into a parser:
- Add an analysis stack to the parser that allows you to construct the parse tree.
- Assume that for each A ∈ N, the righthand sides of the A-productions are numbered (have indices).
- Whenever a production is applied (prediction step), its lefthand side is pushed onto the analysis stack together with the index of the righthand side; whenever a terminal a is scanned, a is pushed onto the analysis stack. (This is needed for backtracking in a depth-first strategy.)
8 / 19

The parser (2)

Top-down parser

def top-down(w, α, Γ):
    out = false
    if w = α = ε:
        output Γ
        out = true
    elif w = aw′ and α = aα′:                  (scan)
        out = top-down(w′, α′, aΓ)
    elif α = Xα′ with X ∈ N:                   (predict)
        for X → X₁ … Xₖ in P with rhs-index i:
            if top-down(w, X₁ … Xₖ α′, ⟨X, i⟩Γ):
                out = true
    return out
9 / 19
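A runnable sketch of this parser, building on the recognizer sketch above: the analysis stack Γ is a Python list whose entries are either scanned terminals or (non-terminal, rhs-index) pairs, and every successful analysis is collected in a result list. The encoding is again my own, not the slides'.

# Sketch of the top-down parser: like the recognizer, but an analysis stack
# gamma records, for every predict, the expanded non-terminal together with the
# index of the chosen right-hand side, and, for every scan, the terminal.

def top_down_parse(w, alpha, gamma, grammar, results):
    if not w and not alpha:
        results.append(list(reversed(gamma)))        # reversed: leftmost derivation
    elif w and alpha and w[0] == alpha[0]:           # scan
        top_down_parse(w[1:], alpha[1:], [w[0]] + gamma, grammar, results)
    elif alpha and alpha[0] in grammar:              # predict
        for i, rhs in enumerate(grammar[alpha[0]], start=1):
            top_down_parse(w, rhs + alpha[1:],
                           [(alpha[0], i)] + gamma, grammar, results)

grammar = {'S': ['ASB', 'AASB', 'c'], 'A': ['a'], 'B': ['b']}
results = []
top_down_parse('aacb', 'S', [], grammar, results)
for derivation in results:
    # keep only the (non-terminal, rhs-index) pairs, as on the next slide
    print([step for step in derivation if isinstance(step, tuple)])
# [('S', 2), ('A', 1), ('A', 1), ('S', 3), ('B', 1)]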

The parser (3)

Example: Top-down parser
G = ⟨N, T, P, S⟩, N = {S, A, B}, T = {a, b, c}
P = {S → ASB | AASB | c, A → a, B → b}
Input w = aacb.

Consider only the successful predicts and scans (Xᵢ is a notation for ⟨X, i⟩):

stack α   w      analysis stack
S         aacb
AASB      aacb   S₂
aASB      aacb   A₁S₂
ASB       acb    aA₁S₂
aSB       acb    A₁aA₁S₂
SB        cb     aA₁aA₁S₂
cB        cb     S₃aA₁aA₁S₂
B         b      cS₃aA₁aA₁S₂
b         b      B₁cS₃aA₁aA₁S₂
ε         ε      bB₁cS₃aA₁aA₁S₂

The analysis stack gives the leftmost derivation in reverse order. Leftmost derivation: S₂ A₁ A₁ S₃ B₁
10 / 19

The parser (4)

Problematic grammars for this parser: CFGs that contain left recursion.

Solutions:
- Eliminate the left recursion. Drawback: the derivation trees change considerably.
- Make sure the grammar does not contain ε-productions or loops. Then do an additional check when predicting:

    ... then for all X → X₁ … Xₖ:
        if |w| ≥ |X₁ … Xₖ α′| and top-down(w, X₁ … Xₖ α′) then out := true

This check is useful even for grammars that are not left-recursive. (A sketch of the recognizer with this check follows after the two examples below.)
11 / 19

An example (1)

Grammar: S → AB, A → aAb | a, B → b. Input w = aabb.

Search tree of the basic algorithm (each node is a call top-down(sentential form, remaining input)?):

S, aabb?
  AB, aabb?                       pred S → AB
    aAbB, aabb?                   pred A → aAb
      AbB, abb?                   scan
        aAbbB, abb?               pred A → aAb
          AbbB, bb?               scan
            aAbbbB, bb?           pred A → aAb (scan fails)
            abbbB, bb?            pred A → a (scan fails)
        abB, abb?                 pred A → a
          bB, bb?                 scan
            B, b?                 scan
              b, b?               pred B → b
                ε, ε?             scan, success
    aB, aabb?                     pred A → a
      B, abb?                     scan
        b, abb?                   pred B → b (scan fails)
12 / 19

An example (2)

The same input, now with the check that word length ≥ length of sentential form; the subtree below aAbbB, abb? is no longer explored:

S, aabb?
  AB, aabb?                       pred S → AB
    aAbB, aabb?                   pred A → aAb
      AbB, abb?                   scan
        abB, abb?                 pred A → a (aAbbB is not predicted: |aAbbB| > |abb|)
          bB, bb?                 scan
            B, b?                 scan
              b, b?               pred B → b
                ε, ε?             scan, success
    aB, aabb?                     pred A → a
      B, abb?                     scan
        b, abb?                   pred B → b (scan fails)
13 / 19
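A sketch of the recognizer extended with this length check, run on the example grammar just shown. It assumes, as stated above, a grammar without ε-productions and without loops, so that a sentential form can never become shorter during the derivation.

# Sketch: top-down recognizer with the length check. Since there are no
# epsilon-productions and no loops A =>+ A, any predicted sentential form that
# is longer than the remaining input can safely be discarded.

def top_down_checked(w, alpha, grammar):
    out = False
    if not w and not alpha:
        out = True
    elif w and alpha and w[0] == alpha[0]:                 # scan
        out = top_down_checked(w[1:], alpha[1:], grammar)
    elif alpha and alpha[0] in grammar:                    # predict
        for rhs in grammar[alpha[0]]:
            new_alpha = rhs + alpha[1:]
            if len(w) >= len(new_alpha):                   # the check
                if top_down_checked(w, new_alpha, grammar):
                    out = True
    return out

# Grammar of the two examples above: S -> AB, A -> aAb | a, B -> b
grammar = {'S': ['AB'], 'A': ['aAb', 'a'], 'B': ['b']}
print(top_down_checked('aabb', 'S', grammar))   # True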

Control structures (1)

In general, directional top-down parsing is non-deterministic because a single non-terminal can have several righthand sides. Two different control strategies:
- You can proceed depth-first: try the righthand sides one after the other, each time pursuing the possible derivation tree up to the moment where we either find a parse tree or fail. If we fail, we have to go back and try the next possibility (backtracking). For this, we have to reverse the operations made on the stacks.
- Or you can proceed breadth-first: try all righthand sides in parallel. Usually, all possible predicts are done before scanning the next input symbol.

These are different control structures; they are not part of the general top-down parsing algorithm.
14 / 19
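For contrast with the depth-first recognizer sketched earlier, here is one possible breadth-first variant: all live sentential forms are kept in a set, and in every round all possible predicts are performed before the next input symbol is scanned. The encoding and the reuse of the length check are my own choices; the slides do not prescribe this implementation.

# Sketch of a breadth-first control structure: all configurations are pursued
# in parallel. In each round we first close the set of sentential forms under
# predict (until every form starts with a terminal or is empty), then scan one
# input symbol on all of them at once. Assumes no epsilon-productions or loops.

def top_down_bfs(w, start, grammar):
    def close(configs, remaining):
        # predict closure; the length check prunes forms that are already
        # longer than the remaining input (safe without epsilon-productions)
        worklist, closed = list(configs), set()
        while worklist:
            alpha = worklist.pop()
            if len(alpha) > remaining:
                continue
            if alpha and alpha[0] in grammar:
                worklist += [rhs + alpha[1:] for rhs in grammar[alpha[0]]]
            else:
                closed.add(alpha)
        return closed

    configs = {start}
    for i, a in enumerate(w):
        closed = close(configs, len(w) - i)
        # scan: keep the forms whose first symbol matches the next input symbol
        configs = {alpha[1:] for alpha in closed if alpha and alpha[0] == a}
    return '' in close(configs, 0)   # success iff some form is fully consumed

grammar = {'S': ['ASB', 'AASB', 'c'], 'A': ['a'], 'B': ['b']}
print(top_down_bfs('aacb', 'S', grammar))   # True
print(top_down_bfs('aacc', 'S', grammar))   # False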

Control structures (2)

Advantages and disadvantages:
- Breadth-first: needs a lot of memory.
- Depth-first (backtracking): does not need much memory. If all parse trees are searched for and the grammar is known to be ambiguous, it is more time-consuming than breadth-first.

There is no perfect solution. The best option depends on the grammar, the input, the task (exhaustive parsing or not), the programming language used, ...
15 / 19

Parser generators (1)

In general, we can either implement a general CFG parser (perhaps for a restricted type of CFG) that takes G and w as input:

    w, G  →  Parser  →  parse trees / no

or generate a specific parser for a given grammar. The new parser receives only w as input:

    G  →  Parser generator  →  Parser
    w  →  Parser  →  parse trees / no

16 / 19

Parser generators (2)

Parser generators for top-down (LL) parsers often use a technique called recursive descent:
- for each non-terminal X, a procedure is generated that tries all righthand sides of the X-productions, with calls for all non-terminals it encounters (one procedure per non-terminal);
- procedures can call each other; in particular, a procedure can call itself again, directly or via other intermediate calls (hence recursive).

Some recursive descent parser generators:
- JavaCC, Java Compiler Compiler: https://javacc.dev.java.net/
- ANTLR, ANother Tool for Language Recognition (generates C++, Java, Python, C#): http://www.antlr.org/
17 / 19
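To illustrate the idea, here is a small hand-written (not generated) recursive-descent recognizer for the example grammar S → ASB | AASB | c, A → a, B → b from the earlier slides. The encoding is my own: one procedure per non-terminal, each returning the set of input positions reachable after recognizing that non-terminal, so that backtracking over the alternative S-rules stays implicit.

# Hand-written recursive-descent recognizer for S -> ASB | AASB | c,
# A -> a, B -> b. The grammar is not LL(1), so each procedure returns all
# positions it can reach instead of committing to a single alternative.

def parse_S(w, i):
    results = set()
    for j in parse_A(w, i):                    # S -> A S B
        for k in parse_S(w, j):
            results |= parse_B(w, k)
    for j in parse_A(w, i):                    # S -> A A S B
        for k in parse_A(w, j):
            for m in parse_S(w, k):
                results |= parse_B(w, m)
    if w[i:i+1] == 'c':                        # S -> c
        results.add(i + 1)
    return results

def parse_A(w, i):                             # A -> a
    return {i + 1} if w[i:i+1] == 'a' else set()

def parse_B(w, i):                             # B -> b
    return {i + 1} if w[i:i+1] == 'b' else set()

w = 'aacb'
print(len(w) in parse_S(w, 0))   # True: the whole input can be derived from S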

Conclusion

Important features of directional top-down parsing:
- LL-parsing: the input is processed from left to right while a leftmost derivation is constructed;
- parsing steps: prediction and scan;
- non-deterministic in general;
- different control structures (breadth-first, depth-first);
- does not work for left-recursive CFGs;
- parser generation with recursive descent.
18 / 19

Grune, D. and Jacobs, C. (2008). Parsing Techniques. A Practical Guide. Monographs in Computer Science. Springer. Second Edition. 19 / 19