LL parsing: Nullable, FIRST, and FOLLOW


EDAN65: Compilers
LL parsing: Nullable, FIRST, and FOLLOW
Görel Hedin, revised 2014-09-22

[Course overview figure: the compiler pipeline. Regular expressions define the lexical analyzer (scanner), which turns source code (text) into tokens; a context-free grammar defines the syntactic analyzer (parser), which builds the AST (abstract syntax tree); an attribute grammar defines the semantic analyzer, which produces an attributed AST. The intermediate code generator, optimizer, and target code generator then produce intermediate and target code for the machine, or the intermediate code is run by an interpreter or virtual machine; the runtime system provides activation records, objects, stack, heap, and garbage collection. This lecture: LL parsing, Nullable, FIRST, FOLLOW.]

Algorithm for constructing an LL(1) parser

The algorithm is fairly simple. The non-trivial part: how to select the correct production p for a nonterminal X, based on the lookahead token. Given alternative productions p1: X -> ... and p2: X -> ..., which tokens can occur in the FIRST position of each alternative? Can one of the productions derive the empty string, i.e., is it NULLABLE? If so, which tokens can occur in the FOLLOW position, immediately after X?

Steps in constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar to construct a table. The table shows which production to select, given the current lookahead token.
3. Conflicts in the table? The grammar is not LL(1).
4. No conflicts? Straightforward implementation using a table-driven parser or recursive descent.

The table has one row per nonterminal (X1, X2, ...) and one column per token (t1, t2, t3, t4, ...), and each entry holds the production to select (p1, p2, p3, p4, ...).

Example: Construct the LL(1) table for this grammar:

p1: statement -> assignment
p2: statement -> compoundstmt
p3: assignment -> ID "=" expr ";"
p4: compoundstmt -> "{" statements "}"
p5: statements -> statement statements
p6: statements -> ε

The table to fill in has rows statement, assignment, compoundstmt, statements and columns ID, "=", ";", "{", "}".

For each production p: X -> γ, we are interested in:
FIRST(γ): the tokens that occur first in a sentence derived from γ.
NULLABLE(γ): is it possible to derive ε from γ? And if so:
FOLLOW(X): the tokens that can occur immediately after an X-sentence.

Example: Construct the LL(1) table for this grammar:

p1: statement -> assignment
p2: statement -> compoundstmt
p3: assignment -> ID "=" expr ";"
p4: compoundstmt -> "{" statements "}"
p5: statements -> statement statements
p6: statements -> ε

                ID    "="   ";"   "{"   "}"
statement       p1                p2
assignment      p3
compoundstmt                      p4
statements      p5                p5    p6

To construct the table, look at each production p: X -> γ. Compute the token set FIRST(γ), and add p to each corresponding entry in X's row. Then check if γ is NULLABLE. If so, compute the token set FOLLOW(X), and add p to each corresponding entry in X's row.
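As a concrete illustration (not from the slides), the finished table can be stored as a map from (nonterminal, lookahead token) to the production to select; selecting a production then becomes a single lookup. The Python representation and names below are assumptions made for this sketch.

ll1_table = {
    ("statement",    "ID"): "p1",  ("statement",    "{"): "p2",
    ("assignment",   "ID"): "p3",
    ("compoundstmt", "{"):  "p4",
    ("statements",   "ID"): "p5",  ("statements",   "{"): "p5",
    ("statements",   "}"):  "p6",
}

def predict(nonterminal, lookahead):
    # Return the production to use, or None on a syntax error (empty table entry).
    return ll1_table.get((nonterminal, lookahead))

print(predict("statements", "}"))   # p6: statements -> ε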

Example: Dealing with End of File:

p1: vardecl -> type ID optinit
p2: type -> "integer"
p3: type -> "boolean"
p4: optinit -> "=" INT
p5: optinit -> ε

The table to fill in has rows vardecl, type, optinit and columns ID, "integer", "boolean", "=", ";", INT.

Example: Dealing with End of File:

p0: S -> vardecl EOF
p1: vardecl -> type ID optinit
p2: type -> "integer"
p3: type -> "boolean"
p4: optinit -> "=" INT
p5: optinit -> ε

The table to fill in has rows S, vardecl, type, optinit and columns ID, "integer", "boolean", "=", ";", INT, EOF.

Example: Dealing with End of File:

p0: S -> vardecl EOF
p1: vardecl -> type ID optinit
p2: type -> "integer"
p3: type -> "boolean"
p4: optinit -> "=" INT
p5: optinit -> ε

            ID    "integer"   "boolean"   "="   ";"   INT   EOF
S                 p0          p0
vardecl           p1          p1
type              p2          p3
optinit                                   p4                p5

Example: Ambiguous grammar:

p1: E -> E "+" E
p2: E -> ID
p3: E -> INT

The table to fill in has row E and columns "+", ID, INT.

Example: Ambiguous grammar:

p1: E -> E "+" E
p2: E -> ID
p3: E -> INT

     "+"   ID       INT
E          p1, p2   p1, p3

Collision in a table entry! The grammar is not LL(1). An ambiguous grammar is not even LL(k); adding more lookahead does not help.

Example: Unambiguous, but left-recursive grammar:

p1: E -> E "*" F
p2: E -> F
p3: F -> ID
p4: F -> INT

The table to fill in has rows E, F and columns "*", ID, INT.

Example: Unambiguous, but left-recursive grammar:

p1: E -> E "*" F
p2: E -> F
p3: F -> ID
p4: F -> INT

     "*"   ID       INT
E          p1, p2   p1, p2
F          p3       p4

Collision in a table entry! The grammar is not LL(1). A grammar with left recursion is not even LL(k); adding more lookahead does not help.

Example: Grammar with common prefix:

p1: E -> F "*" E
p2: E -> F
p3: F -> ID
p4: F -> INT
p5: F -> "(" E ")"

The table to fill in has rows E, F and columns "*", ID, INT, "(", ")".

Example: Grammar with common prefix:

p1: E -> F "*" E
p2: E -> F
p3: F -> ID
p4: F -> INT
p5: F -> "(" E ")"

     "*"   ID       INT      "("      ")"
E          p1, p2   p1, p2   p1, p2
F          p3       p4       p5

Collision in a table entry! The grammar is not LL(1). A grammar with a common prefix is not LL(1). Some grammars with a common prefix are LL(k) for some k, but not this one.

Summary: constructing an LL(1) parser

1. Write the grammar in canonical form.
2. Analyze the grammar using FIRST, NULLABLE, and FOLLOW.
3. Use the analysis to construct a table. The table shows which production to select, given the current lookahead token.
4. Conflicts in the table? The grammar is not LL(1).
5. No conflicts? Straightforward implementation using a table-driven parser or recursive descent (a table-driven sketch follows below).
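To make step 5 concrete, here is a minimal sketch (in Python) of a table-driven LL(1) parser for the statement grammar from the earlier example. The stack-based loop, the token names, treating expr as a single token, and the table representation are all simplifying assumptions for illustration, not code from the course.

grammar = {
    "p1": ("statement",    ["assignment"]),
    "p2": ("statement",    ["compoundstmt"]),
    "p3": ("assignment",   ["ID", "=", "expr", ";"]),   # expr treated as a token to keep the sketch small
    "p4": ("compoundstmt", ["{", "statements", "}"]),
    "p5": ("statements",   ["statement", "statements"]),
    "p6": ("statements",   []),                         # statements -> ε
}
nonterminals = {"statement", "assignment", "compoundstmt", "statements"}
table = {
    ("statement", "ID"): "p1",    ("statement", "{"): "p2",
    ("assignment", "ID"): "p3",
    ("compoundstmt", "{"): "p4",
    ("statements", "ID"): "p5",   ("statements", "{"): "p5",   ("statements", "}"): "p6",
}

def parse(tokens, start="statement"):
    tokens = tokens + ["EOF"]
    pos = 0
    stack = ["EOF", start]                  # end marker at the bottom, start symbol on top
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in nonterminals:
            p = table.get((top, lookahead)) # select a production from the LL(1) table
            if p is None:
                return False                # empty entry: syntax error
            stack.extend(reversed(grammar[p][1]))   # push the right-hand side in reverse order
        else:
            if top != lookahead:            # terminal on the stack must match the lookahead
                return False
            pos += 1
    return pos == len(tokens)

print(parse(["{", "ID", "=", "expr", ";", "}"]))   # True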

Recall main parsing ideas

[Figure: two parse trees built over the same token string, showing the point at which the node for X is created.]

LL(1): decides to build X after seeing the first token of its subtree. The tree is built top down.
LR(1): decides to build X after seeing the first token following its subtree. The tree is built bottom up.

For each production X -> γ we need to compute:
FIRST(γ): the tokens that can appear first in a γ derivation
NULLABLE(γ): can the empty string be derived from γ?
FOLLOW(X): the tokens that can follow an X derivation

Algorithm for constructing an LL(1) table

initialize all entries table[X_i, t_j] to the empty set
for each production p: X -> γ
    for each t ∈ FIRST(γ)
        add p to table[X, t]
    if NULLABLE(γ)
        for each t ∈ FOLLOW(X)
            add p to table[X, t]

The result is a table with one row per nonterminal and one column per token, as before. If some entry has more than one element, then the grammar is not LL(1).
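A minimal Python sketch of this algorithm, assuming FIRST, NULLABLE, and FOLLOW have already been computed and are passed in as helpers (fixed-point sketches for them follow further down); the function and parameter names are illustrative, not from the course.

def build_ll1_table(grammar, first_seq, nullable_seq, follow):
    # grammar: list of (X, gamma) productions, gamma being a list of symbols.
    # first_seq(gamma) -> token set, nullable_seq(gamma) -> bool,
    # follow: dict mapping each nonterminal X to FOLLOW(X).
    table = {}                                         # (X, t) -> set of production numbers
    for p, (lhs, rhs) in enumerate(grammar, start=1):
        for t in first_seq(rhs):                       # add p to table[X, t] for each t in FIRST(gamma)
            table.setdefault((lhs, t), set()).add(p)
        if nullable_seq(rhs):                          # if NULLABLE(gamma), also add p for each
            for t in follow[lhs]:                      # t in FOLLOW(X)
                table.setdefault((lhs, t), set()).add(p)
    conflicts = [key for key, ps in table.items() if len(ps) > 1]
    return table, conflicts                            # any conflict: the grammar is not LL(1)

Given correct FIRST, NULLABLE, and FOLLOW helpers for the statement grammar earlier in the lecture, this should reproduce the table shown there, with an empty conflict list.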

Exercise: what is NULLABLE(X)?

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

Fill in NULLABLE for each of X, Y, and Z.

Solution: what is NULLABLE(X)?

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

    NULLABLE
X   true
Y   true
Z   false

Definition of NULLABLE

Informally, NULLABLE(γ) is true if the empty string can be derived from γ.

Formal definition, given G = (N, T, P, S):

NULLABLE(ε) == true                                          (1)
NULLABLE(t) == false                                         (2)
    where t ∈ T, i.e., t is a terminal symbol
NULLABLE(X) == NULLABLE(γ1) || ... || NULLABLE(γn)           (3)
    where X -> γ1, ..., X -> γn are all the productions for X in P
NULLABLE(s γ) == NULLABLE(s) && NULLABLE(γ)                  (4)
    where s ∈ N ∪ T, i.e., s is a nonterminal or a terminal

The equations for NULLABLE are recursive. How would you write a program that computes NULLABLE(X)? Just using recursive functions could lead to nontermination!

Fixed-point problems

Computing NULLABLE(X) is an example of a fixed-point problem. These problems have the form

    x == f(x)

Can we find a value x for which the equation holds (i.e., a solution)? Such an x is called a fixed point of the function f.

Fixed-point problems can (sometimes) be solved using iteration: guess an initial value x0, then apply the function iteratively until the fixed point is reached:

    x1 := f(x0); x2 := f(x1); ... xn := f(xn-1); until xn == xn-1

This is called a fixed-point iteration, and xn is the fixed point.
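As a toy illustration (not from the slides), the equation x == cos(x) can be solved by exactly this kind of iteration. In floating point we stop when the iterates have practically stopped changing:

import math

x = 0.0                          # initial guess x0
while True:
    nxt = math.cos(x)            # one application: x_{n+1} := f(x_n)
    if abs(nxt - x) < 1e-12:     # the iterates have (practically) stopped changing
        break
    x = nxt
print(x)                         # about 0.7390851, the fixed point of cos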

Implement NULLABLE by a fixed-point iteration

Represent NULLABLE as a vector nlbl[] of boolean variables, and initialize all nlbl[X] to false.

repeat
    changed = false
    for each nonterminal X with productions X -> γ1, ..., X -> γn do
        newvalue = nlbl(γ1) || ... || nlbl(γn)
        if newvalue != nlbl[X] then
            nlbl[X] = newvalue
            changed = true
        fi
    od
until !changed

where nlbl(γ) is computed using the current values in nlbl[].

The computation will terminate because
- the variables are only changed monotonically (from false to true)
- the number of possible changes is finite (from all false to all true)
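A runnable Python sketch of this iteration for the exercise grammar, under the assumption that the grammar is given as a list of (lhs, rhs) productions with rhs a list of symbol names (the representation and names are illustrative, not from the course):

grammar = [                       # Z -> d | X Y Z,  Y -> ε | c,  X -> Y | a
    ("Z", ["d"]), ("Z", ["X", "Y", "Z"]),
    ("Y", []),    ("Y", ["c"]),
    ("X", ["Y"]), ("X", ["a"]),
]
nonterminals = {lhs for lhs, _ in grammar}

def compute_nullable(grammar, nonterminals):
    nlbl = {X: False for X in nonterminals}              # initialize all nlbl[X] to false
    def nullable_seq(gamma):                             # nlbl(gamma): every symbol in gamma nullable
        return all(nlbl.get(s, False) for s in gamma)    # terminals are never nullable
    changed = True
    while changed:                                       # iterate until nothing changes
        changed = False
        for lhs, rhs in grammar:
            if not nlbl[lhs] and nullable_seq(rhs):      # monotone: values only flip false -> true
                nlbl[lhs] = True
                changed = True
    return nlbl

print(compute_nullable(grammar, nonterminals))           # X: True, Y: True, Z: False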

Exercise: compute NULLABLE(X)

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

nlbl[]   iter 0   iter 1   iter 2   iter 3
X        f
Y        f
Z        f

In each iteration, compute, for each nonterminal X with productions X -> γ1, ..., X -> γn:
newvalue = nlbl(γ1) || ... || nlbl(γn)
where nlbl(γ) is computed using the current values in nlbl[].

Solution: compute NULLABLE(X)

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

nlbl[]   iter 0   iter 1   iter 2   iter 3
X        f        f        t        t
Y        f        t        t        t
Z        f        f        f        f

In each iteration, compute, for each nonterminal X with productions X -> γ1, ..., X -> γn:
newvalue = nlbl(γ1) || ... || nlbl(γn)
where nlbl(γ) is computed using the current values in nlbl[].

Definition of FIRST

Informally, FIRST(γ) is the set of tokens that can occur as the first token in sentences derived from γ.

Formal definition, given G = (N, T, P, S):

FIRST(ε) == ∅                                                       (1)
FIRST(t) == { t }                                                   (2)
    where t ∈ T, i.e., t is a terminal symbol
FIRST(X) == FIRST(γ1) ∪ ... ∪ FIRST(γn)                             (3)
    where X -> γ1, ..., X -> γn are all the productions for X in P
FIRST(s γ) == FIRST(s) ∪ (if NULLABLE(s) then FIRST(γ) else ∅ fi)   (4)
    where s ∈ N ∪ T, i.e., s is a nonterminal or a terminal

The equations for FIRST are recursive. Compute using fixed-point iteration.

Implement FIRST by a fixed-point iteration

Represent FIRST as a vector FIRST[] of token sets, and initialize all FIRST[X] to the empty set.

repeat
    changed = false
    for each nonterminal X with productions X -> γ1, ..., X -> γn do
        newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
        if newvalue != FIRST[X] then
            FIRST[X] = newvalue
            changed = true
        fi
    od
until !changed

where FIRST(γ) is computed using the current values in FIRST[].

The computation will terminate because
- the variables are changed monotonically (using set union)
- the largest possible set is finite: T, the set of all tokens
- the number of possible changes is therefore finite
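A corresponding Python sketch for FIRST, again assuming the list-of-productions representation and hardcoding the NULLABLE result from the previous computation (names are illustrative):

grammar = [                       # Z -> d | X Y Z,  Y -> ε | c,  X -> Y | a
    ("Z", ["d"]), ("Z", ["X", "Y", "Z"]),
    ("Y", []),    ("Y", ["c"]),
    ("X", ["Y"]), ("X", ["a"]),
]
nonterminals = {lhs for lhs, _ in grammar}
NULLABLE = {"X": True, "Y": True, "Z": False}            # result of the NULLABLE iteration

def compute_first(grammar, nonterminals):
    first = {X: set() for X in nonterminals}
    def first_seq(gamma):          # FIRST(s gamma) = FIRST(s) u (FIRST(gamma) if NULLABLE(s) else {})
        result = set()
        for s in gamma:
            if s in nonterminals:
                result |= first[s]
                if not NULLABLE[s]:
                    break
            else:                  # terminal: FIRST(t) = {t}, and a terminal is never nullable
                result.add(s)
                break
        return result
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first[lhs] | first_seq(rhs)            # set union is monotone, so this terminates
            if new != first[lhs]:
                first[lhs] = new
                changed = True
    return first

print(compute_first(grammar, nonterminals))              # X: {a, c}, Y: {c}, Z: {a, c, d}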

Exercise: compute FIRST(X)

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

    NULLABLE
X   t
Y   t
Z   f

FIRST[]   iter 0   iter 1   iter 2   iter 3
X
Y
Z

In each iteration, compute, for each nonterminal X with productions X -> γ1, ..., X -> γn:
newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
where FIRST(γ) is computed using the current values in FIRST[].

Solution: compute FIRST(X)

Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

    NULLABLE
X   t
Y   t
Z   f

FIRST[]   iter 0   iter 1      iter 2      iter 3
X         ∅        {a}         {a, c}      {a, c}
Y         ∅        {c}         {c}         {c}
Z         ∅        {a, c, d}   {a, c, d}   {a, c, d}

In each iteration, compute, for each nonterminal X with productions X -> γ1, ..., X -> γn:
newvalue = FIRST(γ1) ∪ ... ∪ FIRST(γn)
where FIRST(γ) is computed using the current values in FIRST[].

Definition of FOLLOW

Informally, FOLLOW(X) is the set of tokens that can occur as the first token following X, in any sentential form derived from the start symbol S.

Formal definition, given G = (N, T, P, S):

The nonterminal X occurs in the right-hand side of a number of productions. Let Y -> γ X δ denote such an occurrence, where γ and δ are arbitrary sequences of terminals and nonterminals.

FOLLOW(X) == ∪ FOLLOW(Y -> γ X δ), over all occurrences Y -> γ X δ            (1)

where

FOLLOW(Y -> γ X δ) == FIRST(δ) ∪ (if NULLABLE(δ) then FOLLOW(Y) else ∅ fi)    (2)

The equations for FOLLOW are recursive. Compute using fixed-point iteration.

Implement FOLLOW by a fixed-point iteration

Represent FOLLOW as a vector FOLLOW[] of token sets, and initialize all FOLLOW[X] to the empty set.

repeat
    changed = false
    for each nonterminal X do
        newvalue = ∪ FOLLOW(Y -> γ X δ), over all occurrences Y -> γ X δ
        if newvalue != FOLLOW[X] then
            FOLLOW[X] = newvalue
            changed = true
        fi
    od
until !changed

where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[].

Again, the computation will terminate because
- the variables are changed monotonically (using set union)
- the largest possible set is finite: T
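A corresponding Python sketch for FOLLOW, for the grammar extended with the start rule S -> Z $. The FIRST and NULLABLE results are hardcoded from the previous computations, and the representation is again an illustrative assumption:

grammar = [
    ("S", ["Z", "$"]),            # grammar extended with end of file, $
    ("Z", ["d"]), ("Z", ["X", "Y", "Z"]),
    ("Y", []),    ("Y", ["c"]),
    ("X", ["Y"]), ("X", ["a"]),
]
nonterminals = {lhs for lhs, _ in grammar}
NULLABLE = {"S": False, "X": True, "Y": True, "Z": False}
FIRST = {"S": {"a", "c", "d"}, "X": {"a", "c"}, "Y": {"c"}, "Z": {"a", "c", "d"}}

def compute_follow(grammar, nonterminals):
    follow = {X: set() for X in nonterminals}
    def first_seq(delta):                      # FIRST of a sequence of symbols
        result = set()
        for s in delta:
            result |= FIRST[s] if s in nonterminals else {s}
            if s not in nonterminals or not NULLABLE[s]:
                break
        return result
    def nullable_seq(delta):
        return all(s in nonterminals and NULLABLE[s] for s in delta)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, s in enumerate(rhs):        # each occurrence Y -> gamma X delta, with X = s
                if s not in nonterminals:
                    continue
                delta = rhs[i + 1:]
                new = follow[s] | first_seq(delta)
                if nullable_seq(delta):        # X can end a Y-sentence: add FOLLOW(Y)
                    new |= follow[lhs]
                if new != follow[s]:
                    follow[s] = new
                    changed = True
    return follow

print(compute_follow(grammar, nonterminals))   # X: {a, c, d}, Y: {a, c, d}, Z: {$}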

Exercise: compute FOLLOW(X)

S -> Z $
Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

The grammar has been extended with end of file, $.

    NULLABLE   FIRST
X   t          {a, c}
Y   t          {c}
Z   f          {a, c, d}

FOLLOW[]   iter 0   iter 1   iter 2   iter 3
X
Y
Z

In each iteration, compute:
newvalue = ∪ FOLLOW(Y -> γ X δ), over all occurrences Y -> γ X δ
where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[].

Solution: compute FOLLOW(X)

S -> Z $
Z -> d
Z -> X Y Z
Y -> ε
Y -> c
X -> Y
X -> a

The grammar has been extended with end of file, $.

    NULLABLE   FIRST
X   t          {a, c}
Y   t          {c}
Z   f          {a, c, d}

FOLLOW[]   iter 0   iter 1      iter 2      iter 3
X          ∅        {a, c, d}   {a, c, d}
Y          ∅        {a, c, d}   {a, c, d}
Z          ∅        {$}         {$}

In each iteration, compute:
newvalue = ∪ FOLLOW(Y -> γ X δ), over all occurrences Y -> γ X δ
where FOLLOW(Y -> γ X δ) is computed using the current values in FOLLOW[].

Summary questions

Construct an LL(1) table for a grammar.
What does it mean if there is a collision in an LL(1) table?
Why can it be useful to add an end-of-file rule to some grammars?
How can we decide if a grammar is LL(1) or not?
What is the definition of NULLABLE, FIRST, and FOLLOW?
What is a fixed-point problem? How can it be solved using iteration? How can we know that the computation terminates?