CA4003 Compiler Construction

CA4003 - Compiler Construction
David Sinclair

Top-Down Parsing

A top-down parser starts with the root of the parse tree, labelled with the
goal symbol of the grammar, and repeats the following steps until the fringe
of the parse tree matches the input string:

1. At a node labelled A, select a production A ::= α and construct the
   appropriate child for each symbol of α.
2. When a terminal added to the fringe doesn't match the input string,
   backtrack.
3. Find the next node to be expanded (it must be labelled with a
   non-terminal in Vn).

The key is selecting the right production in step 1.
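The three steps above can be sketched in code. This is a minimal illustration, not the lecture's reference implementation: terminals and non-terminals are plain strings, a grammar maps each non-terminal to its list of alternatives, and backtracking is modelled by returning every input position reachable after deriving a symbol. The toy grammar below is a hypothetical right-recursive one, since this scheme loops forever on a left-recursive grammar (as discussed under left-recursion).

```python
def parse(grammar, symbol, tokens, pos=0):
    """Backtracking top-down parse of `symbol` starting at `pos`.
    Returns the set of input positions reachable after deriving it; an
    empty set means every choice of production failed (backtrack)."""
    if symbol not in grammar:                  # terminal: must match input
        ok = pos < len(tokens) and tokens[pos] == symbol
        return {pos + 1} if ok else set()
    results = set()
    for production in grammar[symbol]:         # step 1: select a production
        ends = {pos}
        for child in production:               # expand a child per symbol
            ends = {e for p in ends for e in parse(grammar, child, tokens, p)}
        results |= ends                        # keep every surviving choice
    return results

# A small right-recursive expression grammar (hypothetical names):
grammar = {
    'expr':   [['term', '+', 'expr'], ['term', '-', 'expr'], ['term']],
    'term':   [['factor', '*', 'term'], ['factor']],
    'factor': [['num'], ['id']],
}

def accepts(tokens):
    """A parse succeeds if some derivation consumes the whole input."""
    return len(tokens) in parse(grammar, 'expr', tokens)
```

For the token stream id - num * id (i.e. x - 2 * y after scanning), `accepts` returns True; for an incomplete input such as id - every choice fails and it returns False.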

Example

Recall the simple expressions grammar:

 1  <goal>   ::= <expr>
 2  <expr>   ::= <expr> + <term>
 3            |  <expr> - <term>
 4            |  <term>
 5  <term>   ::= <term> * <factor>
 6            |  <term> / <factor>
 7            |  <factor>
 8  <factor> ::= num
 9            |  id

Consider the input string x - 2 * y.

Example [2]

Prod'n  Sentential form     Input
        <goal>              x - 2 * y
1       <expr>              x - 2 * y
2       <expr> + <term>     x - 2 * y
4       <term> + <term>     x - 2 * y
7       <factor> + <term>   x - 2 * y
9       id + <term>         x - 2 * y
        id + <term>         x - 2 * y   (id matches x, but + does not match -: backtrack)
        <expr>              x - 2 * y
3       <expr> - <term>     x - 2 * y
4       <term> - <term>     x - 2 * y
7       <factor> - <term>   x - 2 * y
9       id - <term>         x - 2 * y
        id - <term>         x - 2 * y   (id and - match)

Example [3]

Prod'n  Sentential form            Input
        id - <term>                x - 2 * y
7       id - <factor>              x - 2 * y
8       id - num                   x - 2 * y
        id - num                   x - 2 * y   (num matches 2, but input remains: backtrack)
        id - <term>                x - 2 * y
5       id - <term> * <factor>     x - 2 * y
7       id - <factor> * <factor>   x - 2 * y
8       id - num * <factor>        x - 2 * y
        id - num * <factor>        x - 2 * y   (num matches 2)
        id - num * <factor>        x - 2 * y   (* matches)
9       id - num * id              x - 2 * y
        id - num * id              x - 2 * y   (id matches y: done)

To avoid backtracking, the parse should be guided by the input string.

Example [4]

Another possible parse for x - 2 * y:

Prod'n  Sentential form                  Input
        <goal>                           x - 2 * y
1       <expr>                           x - 2 * y
2       <expr> + <term>                  x - 2 * y
2       <expr> + <term> + <term>         x - 2 * y
2       <expr> + <term> + <term> + ...   x - 2 * y
2       ...                              x - 2 * y

If the parser makes wrong choices, expansion doesn't terminate. This
must be avoided!

Left-recursion

Top-down parsers cannot handle left-recursion in a grammar. Formally, a
grammar is left-recursive if ∃ A ∈ Vn such that A ⇒+ Aα for some string α.

Our simple expression grammar is left-recursive.

Eliminating Left-recursion

To remove left-recursion, we can transform the grammar. Consider the
grammar fragment:

A ::= Aα
   |  β

where α and β do not start with A. We can rewrite this as:

A  ::= βA'
A' ::= αA'
    |  ε

where A' is a new non-terminal. This fragment contains no left-recursion.
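The transformation can be written down directly. The following is a sketch under assumed representations (productions as lists of symbols, the empty list for ε, and the new non-terminal named by appending a prime):

```python
def eliminate_left_recursion(a, productions):
    """Rewrite A ::= A alpha1 | ... | beta1 | ... as
    A ::= beta1 A' | ...  and  A' ::= alpha1 A' | ... | epsilon.
    Productions are lists of symbols; [] represents epsilon."""
    recursive = [p[1:] for p in productions if p[:1] == [a]]   # the alphas
    others    = [p for p in productions if p[:1] != [a]]       # the betas
    if not recursive:
        return {a: productions}        # no immediate left-recursion
    new = a + "'"
    return {
        a:   [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],    # [] = epsilon
    }
```

Applied to <expr> ::= <expr> + <term> | <expr> - <term> | <term>, it yields <expr> ::= <term> <expr'> and <expr'> ::= + <term> <expr'> | - <term> <expr'> | ε.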

Lookahead

Picking the right production rule reduces the amount of backtracking.
Picking the wrong production rule may result in the derivation not
terminating. By looking ahead into the input string we can use the
information on the upcoming tokens to select the right production rule.

In general we need arbitrary lookahead to parse a CFG, but there is a
large number of subclasses of CFGs, including most programming language
constructs, that can be parsed with limited lookahead. Among the
interesting subclasses are:

LL(1): Left-to-right scan, Left-most derivation, 1-token lookahead
LR(1): Left-to-right scan, Right-most derivation, 1-token lookahead

Predictive Parsing

For any two productions A ::= α | β, we would like a distinct way of
choosing the correct production to expand.

For some RHS α in G, define FIRST(α) as the set of tokens that appear
first in some string derived from α. That is, for some w ∈ Vt,
w ∈ FIRST(α) iff α ⇒* wγ.

Key property: whenever two productions A ::= α and A ::= β both appear
in the grammar, we would like

FIRST(α) ∩ FIRST(β) = ∅

This would allow the parser to make a correct choice with a lookahead of
only one symbol! The left-factored version of the expression grammar,
derived below, has this property.
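The key property is cheap to check once the FIRST sets are known. A small helper, assuming the FIRST sets of one non-terminal's alternatives are given as Python sets:

```python
def pairwise_disjoint(first_sets):
    """True iff the FIRST sets of one non-terminal's alternatives are
    pairwise disjoint, i.e. one token of lookahead always picks a
    unique production."""
    seen = set()
    for f in first_sets:
        if seen & f:          # some token selects two alternatives
            return False
        seen |= f
    return True
```

For example, the left-factored <expr'> alternatives pass with FIRST sets {+}, {-} and {$}, while alternatives that all start with <term> share FIRST = {num, id} and fail the check.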

Left-factoring

What if a grammar does not have this key property? Sometimes, we can
transform a grammar to have this property.

For each non-terminal A, find the longest prefix α common to two or more
of its alternatives. If α ≠ ε then replace all of the A productions

A ::= αβ1 | αβ2 | ... | αβn

with

A  ::= αA'
A' ::= β1 | β2 | ... | βn

where A' is a new non-terminal. Repeat until no two alternatives for a
single non-terminal have a common prefix.

Left-factoring Example

The following is a right-recursive version of the expression grammar
that is right-associative:

 1  <goal>   ::= <expr>
 2  <expr>   ::= <term> + <expr>
 3            |  <term> - <expr>
 4            |  <term>
 5  <term>   ::= <factor> * <term>
 6            |  <factor> / <term>
 7            |  <factor>
 8  <factor> ::= num
 9            |  id

To choose between productions 2, 3 & 4, the parser must look beyond the
num or id and look at the next token (+, -, * or /), because
FIRST(2) = FIRST(3) = FIRST(4) = {num, id}.

Left-factoring Example [2]

There are two non-terminals, <expr> and <term>, that must be left-factored:

<expr> ::= <term> + <expr>          <term> ::= <factor> * <term>
        |  <term> - <expr>                  |  <factor> / <term>
        |  <term>                           |  <factor>

Applying the transformation gives us:

<expr>  ::= <term> <expr'>          <term>  ::= <factor> <term'>
<expr'> ::= + <expr>                <term'> ::= * <term>
         |  - <expr>                         |  / <term>
         |  ε                                |  ε

Left-factoring Example [3]

Substituting back into the grammar yields:

 1  <goal>   ::= <expr>
 2  <expr>   ::= <term> <expr'>
 3  <expr'>  ::= + <expr>
 4            |  - <expr>
 5            |  ε
 6  <term>   ::= <factor> <term'>
 7  <term'>  ::= * <term>
 8            |  / <term>
 9            |  ε
10  <factor> ::= num
11            |  id

Now, selection requires only a single-token lookahead, but it is still
right-associative.
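One left-factoring step can be sketched as follows, under assumed representations: alternatives are tuples of symbols, the empty tuple stands for ε, and the new non-terminal is named with a trailing prime.

```python
from collections import defaultdict

def common_prefix(alts):
    """Longest prefix of symbols shared by all given alternatives."""
    prefix = []
    for column in zip(*alts):          # stops at the shortest alternative
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor_once(a, alternatives):
    """One left-factoring step for non-terminal `a`: if two or more
    alternatives share a prefix alpha, rewrite them as A ::= alpha A'
    and A' ::= beta1 | ... | betan.  Returns None when there is
    nothing to factor."""
    groups = defaultdict(list)         # group alternatives by first symbol
    for alt in alternatives:
        if alt:
            groups[alt[0]].append(alt)
    for group in groups.values():
        if len(group) >= 2:
            alpha = common_prefix(group)
            new = a + "'"
            rest = [alt for alt in alternatives if alt not in group]
            return {
                a:   rest + [tuple(alpha) + (new,)],
                new: [alt[len(alpha):] for alt in group],   # () = epsilon
            }
    return None
```

On <expr> it pulls out the shared <term> prefix, reproducing the result shown above.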

Eliminating Indirect Left-recursion

Left-recursion can be hidden by indirect references. Consider the
following rules:

A0 ::= A1 α1 | ...
A1 ::= A2 α2 | ...
...
An ::= A0 αn+1 | ...

This could result in the derivation:

A0 ⇒ A1 α1 ⇒ A2 α2 α1 ⇒ ... ⇒ A0 αn+1 αn ... α1

which is left-recursive.

If the grammar has no cycles, i.e. there is no derivation of the form
B ⇒+ B for any non-terminal B, and there are no ε-productions, then we
can apply the following algorithm to eliminate all left-recursion.

Eliminating Indirect Left-recursion [2]

Arrange all the non-terminals into some arbitrary order, say
A1, A2, ..., An. For each non-terminal Ai in turn, do:

  For each Aj such that 1 ≤ j < i and there is a production rule of the
  form Ai ::= Aj α, where the Aj productions are Aj ::= β1 | ... | βk:
  replace the production rule Ai ::= Aj α with the rules
  Ai ::= β1 α | ... | βk α.

  Eliminate any immediate left-recursion among the Ai production rules.

Left-recursion elimination after Left-factoring

Given a left-factored CFG, to eliminate left-recursion: if A ::= Aα then
replace all of the A productions

A ::= Aα | β | ... | γ

with

A  ::= N A'
N  ::= β | ... | γ
A' ::= αA' | ε

where N and A' are new non-terminals. Repeat until there are no
left-recursive productions.

Table-Driven Parsing

[Diagram: source code → scanner → tokens → table-driven parser → IR.
The parser consults a stack and a set of parsing tables.]

Table-Driven Parsing [2]

The generation of the tables can be automated.

[Diagram: as before, with a parser generator that takes the grammar and
produces the parsing tables used by the table-driven parser.]

Table-Driven Parsing [3]

Input: a string w and a parsing table M for G.

tos = 0
Stack[tos] = EOF
Stack[++tos] = Start Symbol
token = next_token()
repeat
    X = Stack[tos]
    if X is a terminal or EOF then
        if X == token then
            pop X
            token = next_token()
        else error()
    else  /* X is a non-terminal */
        if M[X, token] == X ::= Y1 Y2 ... Yk then
            pop X
            push Yk, Yk-1, ..., Y1
        else error()
until X == EOF
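The same loop can be written as a Python function over an explicit list-as-stack. This is a sketch, not the course's implementation: the table maps (non-terminal, token) pairs to right-hand sides, '$' plays the role of EOF, and the entries below are a hand transcription of the expression-grammar parse table, using E, E', T, T', F as shorthand for <expr>, <expr'>, <term>, <term'>, <factor>.

```python
def ll1_parse(table, start, tokens):
    """Table-driven LL(1) parser.  `tokens` must end with '$'.
    Returns True iff the input is accepted; a missing table entry or a
    terminal mismatch corresponds to error() in the pseudocode."""
    nonterminals = {nt for nt, _ in table}
    stack = ['$', start]                  # '$' acts as the EOF marker
    pos = 0
    while stack:
        x = stack.pop()
        if x not in nonterminals:         # terminal (or the EOF marker)
            if pos >= len(tokens) or x != tokens[pos]:
                return False              # error()
            pos += 1
        else:
            rhs = table.get((x, tokens[pos]))
            if rhs is None:
                return False              # error()
            stack.extend(reversed(rhs))   # push Yk, ..., Y1
    return pos == len(tokens)

# Hand-built table for the left-factored expression grammar:
table = {
    ('E', 'id'): ['T', "E'"],  ('E', 'num'): ['T', "E'"],
    ("E'", '+'): ['+', 'E'],   ("E'", '-'): ['-', 'E'],  ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"],  ('T', 'num'): ['F', "T'"],
    ("T'", '*'): ['*', 'T'],   ("T'", '/'): ['/', 'T'],
    ("T'", '+'): [],           ("T'", '-'): [],          ("T'", '$'): [],
    ('F', 'id'): ['id'],       ('F', 'num'): ['num'],
}
```

For example, `ll1_parse(table, 'E', ['id', '-', 'num', '*', 'id', '$'])` accepts the token stream for x - 2 * y without any backtracking.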

Table-Driven Parsing [4]

For the expression grammar:

 1  <goal>   ::= <expr>
 2  <expr>   ::= <term> <expr'>
 3  <expr'>  ::= + <expr>
 4            |  - <expr>
 5            |  ε
 6  <term>   ::= <factor> <term'>
 7  <term'>  ::= * <term>
 8            |  / <term>
 9            |  ε
10  <factor> ::= num
11            |  id

its parse table is:

           id  num  +  -  *  /  $
<goal>      1    1
<expr>      2    2
<expr'>              3  4        5
<term>      6    6
<term'>              9  9  7  8  9
<factor>   11   10

How do we generate this table?

FIRST

For a string of grammar symbols α, we define FIRST(α) as the set of
terminal symbols that begin strings derived from α:

FIRST(α) = {a ∈ Vt | α ⇒* aβ}

If α ⇒* ε then ε ∈ FIRST(α).

Consider the following grammar:

Z ::= d          Y ::= ε        X ::= Y
Z ::= X Y Z      Y ::= c        X ::= a

Then FIRST(XYZ) = {a, c, d}.

Symbols that can derive the empty string ε are called nullable, and we
must keep track of what can follow a nullable symbol.

FIRST [2]

To compute FIRST(X) for all grammar symbols X, apply the following rules
until no more terminals or ε can be added to any FIRST set:

1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a non-terminal and X ::= Y1 Y2 ... Yk is a production rule
   for some k ≥ 1, then add a to FIRST(X) if for some i, a ∈ FIRST(Yi)
   and Y1 ... Yi-1 are all nullable. If all of Y1 ... Yk are nullable,
   add ε to FIRST(X).
3. If X ::= ε is a production rule, then add ε to FIRST(X).

FOLLOW

FOLLOW(X) is the set of terminals that can immediately follow X. So
t ∈ FOLLOW(X) if there is any derivation containing Xt. This includes
any derivation containing XYZt where Y and Z are nullable.

To compute FOLLOW(A) for all non-terminals A, apply the following rules
until nothing can be added to any FOLLOW set:

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the
   end-of-input marker.
2. If there is a production rule A ::= αBβ, then everything in FIRST(β),
   except ε, is in FOLLOW(B).
3. If there is a production rule A ::= αB, or a production rule
   A ::= αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is
   in FOLLOW(B).
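The two fixed-point computations can be combined in one sketch. The representations are assumptions: a grammar is a list of (lhs, rhs) pairs with rhs a list of symbols, an ε-production has an empty rhs, FIRST sets hold terminals only (nullability is tracked in a separate set), and rule 1 for FOLLOW puts '$' in FOLLOW of the start symbol.

```python
def first_follow(grammar, start):
    """Iterate the FIRST and FOLLOW rules above to a fixed point."""
    nonterminals = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {nt: set() for nt in nonterminals}
    follow = {nt: set() for nt in nonterminals}
    follow[start].add('$')                      # FOLLOW rule 1

    def first_of(seq):
        """FIRST of a symbol string (terminals only) + its nullability."""
        out = set()
        for s in seq:
            if s in nonterminals:
                out |= first[s]
                if s not in nullable:
                    return out, False
            else:                               # terminal: FIRST(a) = {a}
                out.add(s)
                return out, False
        return out, True                        # every symbol was nullable

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f, nul = first_of(rhs)              # FIRST rules 2 and 3
            if not f <= first[lhs] or (nul and lhs not in nullable):
                first[lhs] |= f
                if nul:
                    nullable.add(lhs)
                changed = True
            for i, s in enumerate(rhs):         # FOLLOW rules 2 and 3
                if s in nonterminals:
                    f, nul = first_of(rhs[i + 1:])
                    if nul:
                        f = f | follow[lhs]
                    if not f <= follow[s]:
                        follow[s] |= f
                        changed = True
    return nullable, first, follow
```

On the Z/Y/X grammar above this reproduces the iteration tables shown in Example 2 (plus '$' in FOLLOW(Z) from rule 1, which the tables leave blank).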

Example

Consider the following grammar where E is the start symbol:

E  ::= T E'           T  ::= F T'           F ::= ( E )
E' ::= + T E' | ε     T' ::= * F T' | ε     F ::= id

FIRST(F)  = { (, id }
FIRST(T)  = FIRST(F) = { (, id }
FIRST(E)  = FIRST(T) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }

FOLLOW(E)  = { ), $ }
FOLLOW(E') = FOLLOW(E) = { ), $ }
FOLLOW(T)  = { +, ), $ }
FOLLOW(T') = FOLLOW(T) = { +, ), $ }
FOLLOW(F)  = { +, *, ), $ }

Example 2

We did the last example by inspection, but we could also have done it by
iteratively applying the rules. Consider the following grammar (again):

Z ::= d          Y ::= ε        X ::= Y
Z ::= X Y Z      Y ::= c        X ::= a

We start with all non-terminals not nullable and initially empty FIRST
and FOLLOW sets:

     nullable   FIRST   FOLLOW
X    no
Y    no
Z    no

Example 2 [2]

In the first iteration:

     nullable   FIRST     FOLLOW
X    no         a         c, d
Y    yes        c         d
Z    no         d

In the second iteration:

     nullable   FIRST     FOLLOW
X    yes        a, c      a, c, d
Y    yes        c         a, c, d
Z    no         a, c, d

A third iteration finds no new information.

LOOKAHEAD

For a production rule A ::= α, we define LOOKAHEAD(A ::= α) as the set
of terminals which can appear next in the input when recognising
production rule A ::= α. Thus, a production rule's LOOKAHEAD set
specifies the tokens which should appear next in the input before the
production rule is applied.

To build LOOKAHEAD(A ::= α):

1. Put FIRST(α) - {ε} in LOOKAHEAD(A ::= α).
2. If ε ∈ FIRST(α) then put FOLLOW(A) in LOOKAHEAD(A ::= α).

A grammar G is LL(1) iff for each set of productions
A ::= α1 | α2 | ... | αn, the sets LOOKAHEAD(A ::= α1),
LOOKAHEAD(A ::= α2), ..., LOOKAHEAD(A ::= αn) are all pairwise disjoint.

Example 3

S  ::= E            T  ::= F T'        F ::= id
E  ::= T E'         T' ::= * T               |  num
E' ::= + E                |  / T
    |  - E                |  ε
    |  ε

       FIRST        FOLLOW
S      {num, id}    {$}
E      {num, id}    {$}
E'     {ε, +, -}    {$}
T      {num, id}    {+, -, $}
T'     {ε, *, /}    {+, -, $}
F      {num, id}    {+, -, *, /, $}
id     {id}
num    {num}
*      {*}
/      {/}
+      {+}
-      {-}

       LOOKAHEAD
S  ::= E        {num, id}
E  ::= T E'     {num, id}
E' ::= + E      {+}
E' ::= - E      {-}
E' ::= ε        {$}
T  ::= F T'     {num, id}
T' ::= * T      {*}
T' ::= / T      {/}
T' ::= ε        {+, -, $}
F  ::= id       {id}
F  ::= num      {num}

LL(1) parse table construction

Input: grammar G
Output: parsing table M
Method:
1. For each production A ::= α: for each a ∈ LOOKAHEAD(A ::= α), add
   A ::= α to M[A, a].
2. Set each undefined entry of M to error.

If any M[A, a] ends up with multiple entries, then the grammar is not
LL(1).
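The LOOKAHEAD rules and the table-construction method fit together in a short sketch, assuming productions are (lhs, rhs) pairs with rhs a list of symbols, FIRST sets hold terminals only, and nullability is a separate set (so ε ∈ FIRST(α) becomes "every symbol of α is nullable"):

```python
def build_ll1_table(grammar, first, follow, nullable):
    """Build the LL(1) table M[A, a] from LOOKAHEAD sets, as described
    above.  Raises ValueError on a LOOKAHEAD conflict, i.e. when the
    grammar is not LL(1); undefined entries are simply absent (error)."""
    nonterminals = {lhs for lhs, _ in grammar}

    def lookahead(lhs, rhs):
        la, nul = set(), True
        for s in rhs:
            if s in nonterminals:
                la |= first[s]                  # rule 1: FIRST(rhs) - {eps}
                if s not in nullable:
                    nul = False
                    break
            else:
                la.add(s)
                nul = False
                break
        if nul:                                 # rule 2: eps in FIRST(rhs)
            la |= follow[lhs]
        return la

    table = {}
    for lhs, rhs in grammar:
        for a in lookahead(lhs, rhs):
            if (lhs, a) in table:               # two entries in M[A, a]
                raise ValueError(f"not LL(1): conflict at M[{lhs},{a}]")
            table[lhs, a] = rhs
    return table
```

Run on the Example 3 grammar with the FIRST and FOLLOW sets above, it reproduces the parse table shown next, with no conflicts.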

Example 3 - again

       id          num         +           -           *          /          $
S      S ::= E     S ::= E
E      E ::= TE'   E ::= TE'
E'                             E' ::= +E   E' ::= -E                         E' ::= ε
T      T ::= FT'   T ::= FT'
T'                             T' ::= ε    T' ::= ε    T' ::= *T  T' ::= /T  T' ::= ε
F      F ::= id    F ::= num

Building the Abstract Syntax Tree

Again, we insert code at the right points:

tos = 0
Stack[tos] = EOF
Stack[++tos] = root node
Stack[++tos] = Start Symbol
token = next_token()
repeat
    X = Stack[tos]
    if X is a terminal or EOF then
        if X == token then
            pop X
            token = next_token()
            pop and fill in node
        else error()
    else  /* X is a non-terminal */
        if M[X, token] == X ::= Y1 Y2 ... Yk then
            pop X
            pop node for X
            build a node for each child and make it a child of the
                node for X
            push {nk, Yk, nk-1, Yk-1, ..., n1, Y1}
        else error()
until X == EOF

Some facts about LL(1) grammars

Provable facts about LL(1) grammars:

1. No left-recursive grammar is LL(1).
2. No ambiguous grammar is LL(1).
3. Some languages have no LL(1) grammar.
4. An ε-free grammar where each alternative expansion for A begins with
   a distinct terminal is a simple LL(1) grammar.

Example: S ::= aS | a is not LL(1) because

LOOKAHEAD(S ::= aS) = LOOKAHEAD(S ::= a) = {a}

whereas

S  ::= aS'
S' ::= aS' | ε

accepts the same language and is LL(1).

Error Recovery

Key notion: for each non-terminal, construct a set of terminals on which
the parser can synchronize. When an error occurs looking for A, scan
until an element of SYNCH(A) is found.

Building SYNCH(A):

1. a ∈ FOLLOW(A) ⇒ a ∈ SYNCH(A).
2. Place keywords that start statements in SYNCH(A).
3. Add symbols in FIRST(A) to SYNCH(A).

If we can't match a terminal a on top of the stack:

1. pop the terminal,
2. print a message saying the terminal was inserted, and
3. continue the parse

(i.e., SYNCH(a) = Vt - {a}).
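Panic-mode recovery following the three SYNCH rules can be sketched in a few lines. Which keywords start statements is grammar-specific, so they are a parameter here; the FIRST and FOLLOW sets are assumed to be plain Python sets keyed by non-terminal:

```python
def synch(a, first, follow, statement_keywords=()):
    """SYNCH(A) per rules 1-3: FOLLOW(A), plus keywords that start
    statements, plus the symbols in FIRST(A)."""
    return follow[a] | set(statement_keywords) | first[a]

def recover(tokens, pos, synch_set):
    """On an error while looking for A, scan the input until an element
    of SYNCH(A) (or the end of input) is found, and resume there."""
    while pos < len(tokens) and tokens[pos] not in synch_set:
        pos += 1
    return pos
```

For instance, with FIRST(E) = {num, id} and FOLLOW(E) = {$}, an error while looking for E skips tokens until a num, id, statement keyword, or $ appears.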

A grammar that is not LL(1)

<stmt> ::= if <expr> then <stmt>
        |  if <expr> then <stmt> else <stmt>
        |  ...

Left-factored:

<stmt>  ::= if <expr> then <stmt> <stmt'>
         |  ...
<stmt'> ::= else <stmt>
         |  ε

FIRST(<stmt'>) = {ε, else}
FOLLOW(<stmt'>) = {else, $}
LOOKAHEAD(<stmt'> ::= else <stmt>) = {else}
LOOKAHEAD(<stmt'> ::= ε) = {else, $}

A grammar that is not LL(1) [2]

On seeing else, there is a conflict between choosing
<stmt'> ::= else <stmt> and <stmt'> ::= ε, so the grammar is not LL(1)!

The fix: put priority on <stmt'> ::= else <stmt> to associate each else
with the closest previous then.