Compiler Construction

Similar documents
Compiler Construction

Compiler Construction

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Syntax Analysis Part IV

Context-free grammars

Compiler Construction: Parsing

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Bottom-Up Parsing. Lecture 11-12

Lecture 8: Deterministic Bottom-Up Parsing

CSE302: Compiler Design

Parsing How parser works?

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Conflicts in LR Parsing and More LR Parsing Types

Lecture 7: Deterministic Bottom-Up Parsing

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

Principles of Programming Languages

In One Slide. Outline. LR Parsing. Table Construction

Bottom-Up Parsing. Lecture 11-12

Compilers. Bottom-up Parsing. (original slides by Sam

Lexical and Syntax Analysis. Bottom-Up Parsing

LR Parsing LALR Parser Generators

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Using an LALR(1) Parser Generator

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Yacc: A Syntactic Analysers Generator

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Compiler Construction

CSCI Compiler Design

Monday, September 13, Parsers

Wednesday, August 31, Parsers

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

LR Parsing LALR Parser Generators

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Syn S t yn a t x a Ana x lysi y s si 1

LR Parsing. Table Construction

Syntax-Directed Translation

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

Compiler Construction

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

S Y N T A X A N A L Y S I S LR

CS 314 Principles of Programming Languages

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

TDDD55 - Compilers and Interpreters Lesson 3

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

COMPILER (CSE 4120) (Lecture 6: Parsing 4 Bottom-up Parsing )

Compiler Construction

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Concepts Introduced in Chapter 4

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Compiler Construction

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

TDDD55- Compilers and Interpreters Lesson 3

Compiler Construction 2016/2017 Syntax Analysis

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Compiler construction in4020 lecture 5


Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Compiler construction 2002 week 5

More Bottom-Up Parsing

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CS415 Compilers. LR Parsing & Error Recovery

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

shift-reduce parsing

Lexical and Syntax Analysis

COMPILER CONSTRUCTION Seminar 02 TDDB44

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

Building Compilers with Phoenix

4. Lexical and Syntax Analysis

Compiler Lab. Introduction to tools Lex and Yacc

Configuration Sets for CSX- Lite. Parser Action Table

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Syntax Analyzer --- Parser

CPS 506 Comparative Programming Languages. Syntax Specification

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

FROWN An LALR(k) Parser Generator

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2004 Columbia University Department of Computer Science

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Transcription:

Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/

Recap: LR(1) Parsing Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 2 of 37 Compiler Construction

Recap: LR(1) Parsing LR(1) Items and Sets Observation: not every element of fo(a) can follow every occurrence of A = refinement of LR(0) items by adding possible lookahead symbols Definition (LR(1) items and sets) Let G = N, Σ, P, S CFG Σ be start separated by S S. If S r αaaw r αβ 1 β 2 aw, then [A β 1 β 2, a] is called an LR(1) item for αβ 1. If S r αa r αβ 1 β 2, then [A β 1 β 2, ε] is called an LR(1) item for αβ 1. Given γ X, LR(1)(γ) denotes the set of all LR(1) items for γ, called the LR(1) set (or: LR(1) information) of γ. LR(1)(G) := {LR(1)(γ) γ X }. 3 of 37 Compiler Construction

Recap: LR(1) Parsing LR(1) Conflicts Definition (LR(1) conflicts) Let G = N, Σ, P, S CFG Σ and I LR(1)(G). I has a shift/reduce conflict if there exist A α 1 aα 2, B β P and x Σ ε such that [A α 1 aα 2, x], [B β, a] I. I has a reduce/reduce conflict if there exist x Σ ε and A α, B β P with A B or α β such that [A α, x], [B β, x] I. Lemma G LR(1) iff no I LR(1)(G) contains conflicting items. 4 of 37 Compiler Construction

Recap: LR(1) Parsing The LR(1) Action Function Definition (LR(1) action function) The LR(1) action function act : LR(1)(G) Σ ε {red i i [p]} {shift, accept, error} is defined by act(i, x) := red i if i 0, π i = A α and [A α, x] I shift if [A α 1 xα 2, y] I and x Σ accept if [S S, ε] I and x = ε error otherwise Corollary For every G CFG Σ, G LR(1) iff its LR(1) action function is well defined. 5 of 37 Compiler Construction

LALR(1) Parsing Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 6 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Parsing Motivation: resolving conflicts using LR(1) too expensive Example 10.11/10.17: LR(0)(G LR ) = 11, LR(1)(G LR ) = 15 7 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Parsing Motivation: resolving conflicts using LR(1) too expensive Example 10.11/10.17: LR(0)(G LR ) = 11, LR(1)(G LR ) = 15 Empirical evaluations: A. Johnstone, E. Scott: Generalised Reduction Modified LR Parsing for Domain Specific Language Prototyping, HICSS 02, IEEE, 2002 X. Chen, D. Pager: Full LR(1) Parser Generator Hyacc and Study on the Performance of LR(1) Algorithms, C3S2E 11, ACM, 2011 Grammar LR(0)(G) LR(1)(G) Pascal 368 1395 Ansi-C 381 1788 C++ 1236 9723 7 of 37 Compiler Construction

LALR(1) Parsing LR(0) Equivalence Observation: potential redundancy by containment of LR(0) sets in LR(1) sets (cf. Corollary 10.13) Definition 11.1 (LR(0) equivalence) Let lr 0 : LR(1)(G) LR(0)(G) be defined by lr 0 (I) := {[A β 1 β 2 ] [A β 1 β 2, x] I}. Two sets I 1, I 2 LR(1)(G) are called LR(0)-equivalent (notation: I 1 0 I 2 ) if lr 0 (I 1 ) = lr 0 (I 2 ). 8 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Sets Corollary 11.2 For every G CFG Σ, LR(1)(G)/ 0 = LR(0)(G). 9 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Sets Corollary 11.2 For every G CFG Σ, LR(1)(G)/ 0 = LR(0)(G). Idea: merge LR(0)-equivalent LR(1) sets (maintaining the lookahead information, but possibly introducing conflicts) Definition 11.3 (LALR(1) sets) Let G CFG Σ. An information I LR(1)(G) determines the LALR(1) set [I] 0 = {I LR(1)(G) I 0 I}. The set of all LALR(1) sets of G is denoted by LALR(1)(G). 9 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Sets Corollary 11.2 For every G CFG Σ, LR(1)(G)/ 0 = LR(0)(G). Idea: merge LR(0)-equivalent LR(1) sets (maintaining the lookahead information, but possibly introducing conflicts) Definition 11.3 (LALR(1) sets) Let G CFG Σ. An information I LR(1)(G) determines the LALR(1) set [I] 0 = {I LR(1)(G) I 0 I}. The set of all LALR(1) sets of G is denoted by LALR(1)(G). Remark: by Corollary 11.2, LALR(1)(G) = LR(0)(G) (but LALR(1) sets provide additional lookahead information) 9 of 37 Compiler Construction

LALR(1) Parsing LALR(1) Parsing Sketch of LALR(1) Parsing LALR(1) action function act : LALR(1)(G) Σ ε {red i i [p]} {shift, accept, error} defined in analogy to the LR(1) case (Definition 10.18) G CFG Σ has the LALR(1) property (G LALR(1)) if its LALR(1) action function is well defined Also LR(1) goto function (Definition 10.20) carries over to the LALR(1) case: goto : LALR(1)(G) X LALR(1)(G) act and goto form the LALR(1) parsing table But: merging of LR(1) sets can produce new conflicts LALR(1) used by yacc/bison parser generator (later) 10 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 11 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. Lemma 11.4 If G CFG Σ is ambiguous, then G / k N LR(k). 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. Lemma 11.4 If G CFG Σ is ambiguous, then G / k N LR(k). Proof. Assume that there exist k N and G LR(k) such that G is ambiguous. 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. Lemma 11.4 If G CFG Σ is ambiguous, then G / k N LR(k). Proof. Assume that there exist k N and G LR(k) such that G is ambiguous. Hence there exists w L(G) with different right derivations. Let αav be the last common sentence of the two derivations (i.e., β γ): S r αav { r αβv r w r αγv r w 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. Lemma 11.4 If G CFG Σ is ambiguous, then G / k N LR(k). Proof. Assume that there exist k N and G LR(k) such that G is ambiguous. Hence there exists w L(G) with different right derivations. Let αav be the last common sentence of the two derivations (i.e., β γ): S r αav { r αβv r w r αγv r w But since first k (v) = first k (v) for every v Σ, Definition 9.3 yields β = γ. 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Ambiguous Grammars Reminder (Definition 5.5): G CFG Σ is called unambiguous if every word w L(G) has exactly one syntax tree. Otherwise it is called ambiguous. Lemma 11.4 If G CFG Σ is ambiguous, then G / k N LR(k). Proof. Assume that there exist k N and G LR(k) such that G is ambiguous. Hence there exists w L(G) with different right derivations. Let αav be the last common sentence of the two derivations (i.e., β γ): S r αav { r αβv r w r αγv r w But since first k (v) = first k (v) for every v Σ, Definition 9.3 yields β = γ. However ambiguity is a natural specification method which generally avoids involved syntactic constructs. 12 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) LR(0)(G): I 0 := LR(0)(ε) : [E E] [E E+E] [E E*E] [E a] I 1 := LR(0)(E) : [E E ] [E E +E] [E E *E] I 2 := LR(0)(a) : [E a ] I 3 := LR(0)(E+) : [E E+ E] [E E+E] [E E*E] [E a] I 4 := LR(0)(E*) : [E E* E] [E E+E] [E E*E] [E a] I 5 := LR(0)(E+E) : [E E+E ] [E E +E] [E E *E] I 6 := LR(0)(E*E) : [E E*E ] [E E +E] [E E *E] 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) LR(0)(G): I 0 := LR(0)(ε) : [E E] [E E+E] [E E*E] [E a] I 1 := LR(0)(E) : [E E ] [E E +E] [E E *E] I 2 := LR(0)(a) : [E a ] I 3 := LR(0)(E+) : [E E+ E] [E E+E] [E E*E] [E a] I 4 := LR(0)(E*) : [E E* E] [E E+E] [E E*E] [E a] I 5 := LR(0)(E+E) : [E E+E ] [E E +E] [E E *E] I 6 := LR(0)(E*E) : [E E*E ] [E E +E] [E E *E] Conflicts: I 1 : SLR(1)-solvable (reduce on ε, shift on +/*) 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) LR(0)(G): I 0 := LR(0)(ε) : [E E] [E E+E] [E E*E] [E a] I 1 := LR(0)(E) : [E E ] [E E +E] [E E *E] I 2 := LR(0)(a) : [E a ] I 3 := LR(0)(E+) : [E E+ E] [E E+E] [E E*E] [E a] I 4 := LR(0)(E*) : [E E* E] [E E+E] [E E*E] [E a] I 5 := LR(0)(E+E) : [E E+E ] [E E +E] [E E *E] I 6 := LR(0)(E*E) : [E E*E ] [E E +E] [E E *E] Conflicts: I 1 : SLR(1)-solvable (reduce on ε, shift on +/*) I 5, I 6 : not SLR(1)-solvable (+, * fo(e)) 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) LR(0)(G): I 0 := LR(0)(ε) : [E E] [E E+E] [E E*E] [E a] I 1 := LR(0)(E) : [E E ] [E E +E] [E E *E] I 2 := LR(0)(a) : [E a ] I 3 := LR(0)(E+) : [E E+ E] [E E+E] [E E*E] [E a] I 4 := LR(0)(E*) : [E E* E] [E E+E] [E E*E] [E a] I 5 := LR(0)(E+E) : [E E+E ] [E E +E] [E E *E] I 6 := LR(0)(E*E) : [E E*E ] [E E +E] [E E *E] Conflicts: I 1 : SLR(1)-solvable (reduce on ε, shift on +/*) I 5, I 6 : not SLR(1)-solvable (+, * fo(e)) Solution: I 5 : * > + = act(i 5, *) := shift, + left assoc. = act(i 5, +) := red 1 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars I Example 11.5 (Simple arithmetic expressions) G : E E (0) E E+E E*E a (1, 2, 3) Precedence: * > + Associativity: left (thus: a+a*a+a := (a+(a*a))+a) LR(0)(G): I 0 := LR(0)(ε) : [E E] [E E+E] [E E*E] [E a] I 1 := LR(0)(E) : [E E ] [E E +E] [E E *E] I 2 := LR(0)(a) : [E a ] I 3 := LR(0)(E+) : [E E+ E] [E E+E] [E E*E] [E a] I 4 := LR(0)(E*) : [E E* E] [E E+E] [E E*E] [E a] I 5 := LR(0)(E+E) : [E E+E ] [E E +E] [E E *E] I 6 := LR(0)(E*E) : [E E*E ] [E E +E] [E E *E] Conflicts: I 1 : SLR(1)-solvable (reduce on ε, shift on +/*) I 5, I 6 : not SLR(1)-solvable (+, * fo(e)) Solution: I 5 : * > + = act(i 5, *) := shift, + left assoc. = act(i 5, +) := red 1 I 6 : * > + = act(i 6, +) := red 2, * left assoc. = act(i 6, *) := red 2 13 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars II Example 11.6 ( Dangling else ) G : S S S ises is a 14 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars II Example 11.6 ( Dangling else ) G : S S S ises is a Ambiguity: iiaea := (1) i(iaea) (common) or (2) i(ia)ea 14 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars II Example 11.6 ( Dangling else ) G : S S S ises is a Ambiguity: iiaea := (1) i(iaea) (common) or (2) i(ia)ea LR(0)(G): I 0 := LR(0)(ε) : [S S] [S ises] [S is] [S a] I 1 := LR(0)(S) : [S S ] I 2 := LR(0)(i) : [S i SeS] [S i S] [S ises] [S is] [S a] I 3 := LR(0)(a) : [S a ] I 4 := LR(0)(iS) : [S is es] [S is ] I 5 := LR(0)(iSe) : [S ise S] [S ises] [S is] [S a] I 6 := LR(0)(iSeS) : [S ises ] 14 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars II Example 11.6 ( Dangling else ) G : S S S ises is a Ambiguity: iiaea := (1) i(iaea) (common) or (2) i(ia)ea LR(0)(G): I 0 := LR(0)(ε) : [S S] [S ises] [S is] [S a] I 1 := LR(0)(S) : [S S ] I 2 := LR(0)(i) : [S i SeS] [S i S] [S ises] [S is] [S a] I 3 := LR(0)(a) : [S a ] I 4 := LR(0)(iS) : [S is es] [S is ] I 5 := LR(0)(iSe) : [S ise S] [S ises] [S is] [S a] I 6 := LR(0)(iSeS) : [S ises ] Conflict in I 4 : e fo(s) = not SLR(1)-solvable 14 of 37 Compiler Construction

Bottom-Up Parsing of Ambiguous Grammars Bottom-Up Parsing of Ambiguous Grammars II Example 11.6 ( Dangling else ) G : S S S ises is a Ambiguity: iiaea := (1) i(iaea) (common) or (2) i(ia)ea LR(0)(G): I 0 := LR(0)(ε) : [S S] [S ises] [S is] [S a] I 1 := LR(0)(S) : [S S ] I 2 := LR(0)(i) : [S i SeS] [S i S] [S ises] [S is] [S a] I 3 := LR(0)(a) : [S a ] I 4 := LR(0)(iS) : [S is es] [S is ] I 5 := LR(0)(iSe) : [S ise S] [S ises] [S is] [S a] I 6 := LR(0)(iSeS) : [S ises ] Conflict in I 4 : e fo(s) = not SLR(1)-solvable Solution (1): act(i 4, e) := shift 14 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 15 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Overview of Grammar Classes LL(1) G AE (Ex. 7.4) LL(0) LR(0) G (Ex. 9.11) SLR(1) G AE (Ex. 10.9) Moreover: LL(k) LL(k + 1) for every k N LR(k) LR(k + 1) for every k N LL(k) LR(k) for every k N LALR(1) G LR LR(1) G LR (Ex. 10.11) 16 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Overview of Language Classes (cf. O. Mayer: Syntaxanalyse, BI-Verlag, 1978, p. 409ff) L(LL(1)) REG L(LL(0)) L(LR(0)) L(SLR(1)) = L(LALR(1)) = L(LR(1)) = det. CFL unambiguous CFL CFL Moreover: L(LL(k)) L(LL(k + 1)) L(LR(1)) for every k N L(LR(k)) = L(LR(1)) for every k 1 17 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Prefix-Free Languages Definition 11.7 A language L Σ is called prefix-free if L Σ + L =, i.e., if no proper prefix of an element of L is again in L. 18 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free Proof. Assumption: G LR(0) and L(G) not prefix-free. Then there ex. v Σ and w Σ + such that v, vw L(G). 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free Proof. Assumption: G LR(0) and L(G) not prefix-free. Then there ex. v Σ and w Σ + such that v, vw L(G). Thus, the LR(0) parsing automaton for G (Def. 10.2) has an accepting run on v: (v, I 0, ε) (ε, I 0 I, z) (ε, ε, z 0) where act(i) = accept 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free Proof. Assumption: G LR(0) and L(G) not prefix-free. Then there ex. v Σ and w Σ + such that v, vw L(G). Thus, the LR(0) parsing automaton for G (Def. 10.2) has an accepting run on v: (v, I 0, ε) (ε, I 0 I, z) (ε, ε, z 0)... and an accepting run on vw (deterministic!): (vw, I 0, ε) (w, I 0 I, z) + (ε, I 0 I, zz ) (ε, ε, zz 0) which contradicts act(i) = accept. where act(i) = accept where act(i ) = accept 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free Proof. Assumption: G LR(0) and L(G) not prefix-free. Then there ex. v Σ and w Σ + such that v, vw L(G). Thus, the LR(0) parsing automaton for G (Def. 10.2) has an accepting run on v: (v, I 0, ε) (ε, I 0 I, z) (ε, ε, z 0)... and an accepting run on vw (deterministic!): (vw, I 0, ε) (w, I 0 I, z) + (ε, I 0 I, zz ) (ε, ε, zz 0) which contradicts act(i) = accept. Corollary 11.9 {a, aa} REG \ L(LR(0)) where act(i) = accept where act(i ) = accept 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LR(0))? Lemma 11.8 G LR(0) = L(G) prefix-free Proof. Assumption: G LR(0) and L(G) not prefix-free. Then there ex. v Σ and w Σ + such that v, vw L(G). Thus, the LR(0) parsing automaton for G (Def. 10.2) has an accepting run on v: (v, I 0, ε) (ε, I 0 I, z) (ε, ε, z 0)... and an accepting run on vw (deterministic!): (vw, I 0, ε) (w, I 0 I, z) + (ε, I 0 I, zz ) (ε, ε, zz 0) which contradicts act(i) = accept. Corollary 11.9 {a, aa} REG \ L(LR(0)) where act(i) = accept Conjecture: L REG \ L(LR(0)) = L(G) not prefix-free? where act(i ) = accept 19 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LL(1))? Lemma 11.10 (cf. Lecture 2) Every L REG can be recognized by a DFA. 20 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LL(1))? Lemma 11.10 (cf. Lecture 2) Every L REG can be recognized by a DFA. Lemma 11.11 Every DFA can be transformed into an equivalent LL(1) grammar. 20 of 37 Compiler Construction

Expressiveness of LL and LR Grammars Why REG L(LL(1))? Lemma 11.10 (cf. Lecture 2) Every L REG can be recognized by a DFA. Lemma 11.11 Every DFA can be transformed into an equivalent LL(1) grammar. Proof. see Exercise 4.2 20 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 21 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR Overview of ANTLR ANother Tool for Language Recognition Input: language description using EBNF grammars Output: recogniser for the language Supports recognisers for three kinds of input: character streams (generation of scanner) token streams (generation of parser) node streams (generation of tree walker) Current version: ANTLR 4.7.1 generates LL( ) recognisers: flexible choice of lookahead length applies longest match principle supports ambiguous grammars by using first match principle for rules supports direct left recursion targets Java, C++, C#, Python, JavaScript, Go, Swift Details: http://www.antlr.org/ T. Parr: The Definitive ANTLR 4 Reference, Pragmatic Bookshelf, 2013 3rd Programming Exercise 22 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR Example: Infix Postfix Translator (Simple.g4) grammar Simple; // productions for syntax analysis program returns [String s]: e=expr EOF {$s = $e.s;}; expr returns [String s]: t=term r=rest {$s = $t.s + $r.s;}; rest returns [String s]: PLUS t=term r=rest {$s = $t.s + "+" + $r.s;} MINUS t=term r=rest {$s = $t.s + "-" + $r.s;} /* empty */ {$s = "";}; term returns [String s]: DIGIT {$s = $DIGIT.text;}; // productions for lexical analysis PLUS : + ; MINUS : - ; DIGIT : [0-9]; 23 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR Java Code for Using Translator import java.io.*; import org.antlr.v4.runtime.*; public class SimpleMain { public static void main(final String[] args) throws IOException { String printsource = null, printsymtab = null, printir = null, printasm = null; SimpleLexer lexer = /* Create instance of lexer */ new SimpleLexer(new ANTLRInputStream(args[0])); SimpleParser parser = /* Create instance of parser */ new SimpleParser(new CommonTokenStream(lexer)); String postfix = parser.program().s; /* Run translator */ System.out.println(postfix); } } 24 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR An Example Run 1. After installation, invoke ANTLR: $ java -jar /usr/local/lib/antlr-4.7.1-complete.jar Simple.g4 (will generate SimpleLexer.java, SimpleParser.java, Simple.tokens, and SimpleLexer.tokens) 2. Use Java compiler: $ javac -cp /usr/local/lib/antlr-4.7.1-complete.jar Simple*.java 3. Run translator: $ java -cp.:/usr/local/lib/antlr-4.7.1-complete.jar SimpleMain 9-5+2 95-2+ 25 of 37 Compiler Construction

Generating Top-Down Parsers Using ANTLR Advantages of ANTLR Advantages of ANTLR Generated (Java) code is similar to hand-written code possible (and easy) to read and debug generated code Syntax for specifying scanners, parsers and tree walkers is identical Support for many target programming languages ANTLR is well supported and has an active user community 26 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Outline of Lecture 11 Recap: LR(1) Parsing LALR(1) Parsing Bottom-Up Parsing of Ambiguous Grammars Expressiveness of LL and LR Grammars Generating Top-Down Parsers Using ANTLR Generating Bottom-Up Parsers Using yacc and bison 27 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison The yacc and bison Tools Usage of yacc ( yet another compiler compiler ): yacc [f]lex spec.y y.tab.c lex.yy.c spec.l yacc specification Parser source Scanner source [f]lex specification cc a.out Executable LALR(1) parser 28 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison The yacc and bison Tools Usage of yacc ( yet another compiler compiler ): yacc [f]lex spec.y y.tab.c lex.yy.c spec.l yacc specification Parser source Scanner source [f]lex specification cc a.out Executable LALR(1) parser Like for [f]lex, a yacc specification is of the form Declarations (optional) %% Rules %% Auxiliary procedures (optional) 28 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison The yacc and bison Tools Usage of yacc ( yet another compiler compiler ): yacc [f]lex spec.y y.tab.c lex.yy.c spec.l yacc specification Parser source Scanner source [f]lex specification cc a.out Executable LALR(1) parser Like for [f]lex, a yacc specification is of the form Declarations (optional) %% Rules %% Auxiliary procedures (optional) bison: upward-compatible GNU implementation of yacc (more flexible w.r.t. file names,...) 28 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison yacc Specifications Declarations: Token definitions: %token Tokens Not every token needs to be declared ( +, =,...) Start symbol: %start Symbol (optional) C code for declarations etc.: %{ Code %} 29 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison yacc Specifications Declarations: Token definitions: %token Tokens Not every token needs to be declared ( +, =,...) Start symbol: %start Symbol (optional) C code for declarations etc.: %{ Code %} Rules: context-free productions and semantic actions A α 1 α 2... α n represented as A : α 1 {Action 1 } α 2. {Action 2 } α n {Action n }; Semantic actions = C statements for computing attribute values $$ = attribute value of A $i = attribute value of i-th symbol on right-hand side Default action: $$ = $1 29 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison yacc Specifications Declarations: Token definitions: %token Tokens Not every token needs to be declared ( +, =,...) Start symbol: %start Symbol (optional) C code for declarations etc.: %{ Code %} Rules: context-free productions and semantic actions A α 1 α 2... α n represented as A : α 1 {Action 1 } α 2. {Action 2 } α n {Action n }; Semantic actions = C statements for computing attribute values $$ = attribute value of A $i = attribute value of i-th symbol on right-hand side Default action: $$ = $1 Auxiliary procedures: scanner (if not generated by [f]lex), error routines,... 29 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Example: Simple Desk Calculator I %{/* SLR(1) grammar for arithmetic expressions (Example 10.5) */ #include <stdio.h> #include <ctype.h> %} %token DIGIT %% line : expr \n { printf("%d\n", $1); }; expr : expr + term { $$ = $1 + $3; } term { $$ = $1; }; term : term * factor { $$ = $1 * $3; } factor { $$ = $1; }; factor : ( expr ) { $$ = $2; } DIGIT { $$ = $1; }; %% yylex() { int c; c = getchar(); if (isdigit(c)) yylval = c - 0 ; return DIGIT; return c; } 30 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Example: Simple Desk Calculator II $ yacc calc.y $ cc y.tab.c -ly $ a.out 2+3 5 $ a.out 2+3*5 17 31 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison An Ambiguous Grammar I %{/* Ambiguous grammar for arithmetic expressions (Example 11.5) */ #include <stdio.h> #include <ctype.h> %} %token DIGIT %% line : expr \n { printf("%d\n", $1); }; expr : expr + expr { $$ = $1 + $3; } expr * expr { $$ = $1 * $3; } DIGIT { $$ = $1; }; %% yylex() { int c; c = getchar(); if (isdigit(c)) {yylval = c - 0 ; return DIGIT;} return c; } 32 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison An Ambiguous Grammar II Invoking yacc with the option -v produces a report y.output: State 8 2 expr: expr. + expr 2 expr + expr. 3 expr. * expr + shift and goto state 6 * shift and goto state 7 + [reduce with rule 2 (expr)] * [reduce with rule 2 (expr)] State 9 2 expr: expr. + expr 3 expr. * expr 3 expr * expr. + shift and goto state 6 * shift and goto state 7 + [reduce with rule 3 (expr)] * [reduce with rule 3 (expr)] 33 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Conflict Handling in yacc Default conflict resolving strategy in yacc: reduce/reduce: choose first conflicting production in specification 34 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Conflict Handling in yacc Default conflict resolving strategy in yacc: reduce/reduce: choose first conflicting production in specification shift/reduce: prefer shift resolves dangling-else ambiguity (Example 11.6) correctly also adequate for strong following weak operator (* after +; Example 11.5) and for right-associative operators not appropriate for weak following strong operator and for left-associative operators ( = reduce; see Example 11.5) 34 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Conflict Handling in yacc Default conflict resolving strategy in yacc: reduce/reduce: choose first conflicting production in specification shift/reduce: prefer shift resolves dangling-else ambiguity (Example 11.6) correctly also adequate for strong following weak operator (* after +; Example 11.5) and for right-associative operators not appropriate for weak following strong operator and for left-associative operators ( = reduce; see Example 11.5) For ambiguous grammar: $ yacc ambig.y conflicts: 4 shift/reduce $ cc y.tab.c -ly $ a.out 2+3*5 17 $ a.out 2*3+5 16 34 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Precedences and Associativities in yacc I General mechanism for resolving conflicts: %[left right] Operators 1. %[left right] Operators n operators in one line have given associativity and same precedence precedence increases over lines 35 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Precedences and Associativities in yacc I General mechanism for resolving conflicts: %[left right] Operators 1. %[left right] Operators n operators in one line have given associativity and same precedence precedence increases over lines Example 11.12 %left + - %left * / %right ^ ^ (right associative) binds stronger than * and / (left associative), which in turn bind stronger than + and - (left associative) 35 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Precedences and Associativities in yacc II %{/* Ambiguous grammar for arithmetic expressions with precedences and associativities */ #include <stdio.h> #include <ctype.h> %} %token DIGIT %left + %left * %% line : expr \n { printf("%d\n", $1); }; expr : expr + expr { $$ = $1 + $3; } expr * expr { $$ = $1 * $3; } DIGIT { $$ = $1; }; %% yylex() { int c; c = getchar(); if (isdigit(c)) {yylval = c - 0 ; return DIGIT;} return c; } 36 of 37 Compiler Construction

Generating Bottom-Up Parsers Using yacc and bison Precedences and Associativities in yacc III $ yacc ambig-prio.y $ cc y.tab.c -ly $ a.out 2*3+5 11 $ a.out 2+3*5 17 37 of 37 Compiler Construction