Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Fall 2014-2015 Compiler Principles Lecture 3: Parsing part 2 Roman Manevich Ben-Gurion University

Tentative syllabus
  Front End: Scanning; Top-down Parsing (LL); Bottom-up Parsing (LR); Attribute Grammars
  Intermediate Representation: Lowering
  Optimizations: Local Optimizations; Dataflow Analysis; Loop Optimizations
  Code Generation: Register Allocation; Instruction Selection
  mid-term exam

Previously
  Role of syntax analysis
  Context-free grammars refresher
  Top-down (predictive) parsing
  Recursive descent

Functions for nonterminals
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor
  E() {
    if (current ∈ {TRUE, FALSE}) LIT();
    else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
    else if (current == NOT) { match(NOT); E(); }
    else error;
  }
  LIT() {
    if (current == TRUE) match(TRUE);
    else if (current == FALSE) match(FALSE);
    else error;
  }
  OP() {
    if (current == AND) match(AND);
    else if (current == OR) match(OR);
    else if (current == XOR) match(XOR);
    else error;
  }
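The functions above can be sketched in runnable form as follows. This is a minimal Python illustration, not the lecture's code: the names `parse`, `ParseError` and `current`, the use of plain strings as tokens, and the `$` end marker are assumptions made for the example.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the grammar."""


def parse(tokens):
    """Recursive descent for E -> LIT | ( E OP E ) | not E."""
    pos = 0

    def current():
        # "$" stands for end of input (an assumption of this sketch).
        return tokens[pos] if pos < len(tokens) else "$"

    def match(expected):
        nonlocal pos
        if current() != expected:
            raise ParseError(f"expected {expected!r}, got {current()!r}")
        pos += 1

    def E():
        if current() in ("true", "false"):
            LIT()
        elif current() == "(":
            match("(")
            E()
            OP()
            E()
            match(")")
        elif current() == "not":
            match("not")
            E()
        else:
            raise ParseError(f"unexpected token {current()!r} in E")

    def LIT():
        if current() == "true":
            match("true")
        elif current() == "false":
            match("false")
        else:
            raise ParseError(f"unexpected token {current()!r} in LIT")

    def OP():
        if current() in ("and", "or", "xor"):
            match(current())
        else:
            raise ParseError(f"unexpected token {current()!r} in OP")

    E()
    if current() != "$":
        raise ParseError("trailing input after a complete E")
    return True
```

For instance, `parse("( not true xor false )".split())` succeeds, while `parse(["(", "true", ")"])` raises `ParseError` because `OP` finds `)` where an operator is required.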

Technical challenges with recursive descent

Recursive descent: problem 1
  term → ID | indexed_elem
  indexed_elem → ID [ expr ]
  With lookahead 1, the function for indexed_elem will never be tried
  What happens for input of the form ID[expr]?

Recursive descent: problem 2
  S → A a b
  A → a | ε
  int S() { return A() && match(token('a')) && match(token('b')); }
  int A() { return match(token('a')) || 1; }
  What happens for input ab?
  What happens if you flip the order of alternatives and try aab?
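The two questions can be checked with a small experiment. The sketch below mirrors the C-style functions in Python; the `a_first` flag, which selects the order in which the alternatives of A are tried, is an assumption added for the example.

```python
def parse(text, a_first=True):
    """Recursive descent without backtracking for S -> A a b, A -> a | epsilon."""
    pos = 0

    def match(tok):
        nonlocal pos
        if pos < len(text) and text[pos] == tok:
            pos += 1
            return True
        return False

    def A():
        if a_first:
            # Try A -> a first, fall back to epsilon (the "|| 1" in the slide).
            return match("a") or True
        # Alternatives flipped: epsilon always succeeds, so A -> a is never tried.
        return True

    def S():
        return A() and match("a") and match("b")

    return S() and pos == len(text)
```

With the original order, `parse("ab")` is rejected because A consumes the only `a`. With the flipped order, `parse("aab", a_first=False)` is rejected because A never consumes anything. Both strings are in the language, so neither ordering is correct without backtracking.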

Recursive descent: problem 3 (p. 127)
  E → E - term | term
  int E() { return E() && match(token('-')) && term(); }
  What happens when we execute this procedure?
  Recursive descent parsers cannot handle left-recursive grammars

Agenda
  Predicting productions via FIRST/FOLLOW/NULLABLE sets
  Handling conflicts
  LL(k) via pushdown automata

How do we predict?
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor
  How can we decide which production of E to take?

FIRST sets
  For a nonterminal A, FIRST(A) is the set of terminals that can start a sentence derived from A
    Formally: FIRST(A) = { t | A ⇒* t ω }
  For a sentential form α, FIRST(α) is the set of terminals that can start a sentence derived from α
    Formally: FIRST(α) = { t | α ⇒* t ω }

FIRST sets example
  E → LIT | ( E OP E ) | not E
  LIT → true | false
  OP → and | or | xor
  FIRST(E) = ? FIRST(LIT) = ? FIRST(OP) = ?
  Answer:
    FIRST(E) = FIRST(LIT) ∪ FIRST(( E OP E )) ∪ FIRST(not E)
    FIRST(LIT) = { true, false }
    FIRST(OP) = { and, or, xor }
  A set of recursive equations. How do we solve them?

Computing FIRST sets
  Assume no null productions (A → ε)
  1. Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
  2. Repeat the following until no changes occur:
       for each nonterminal A
         for each production A → α1 | … | αk
           FIRST(A) = FIRST(α1) ∪ … ∪ FIRST(αk)
  This is known as a fixed-point algorithm
  We will see such iterative methods later in the course and learn to reason about them

Exercise: compute FIRST
  STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
  EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
  TERM → id | constant
  (The "->" inside the first EXPR alternative is a token of the language, not the production arrow.)
  1. Initialization
       STMT: if, while
       EXPR: zero?, not, ++, --
       TERM: id, constant
  2. Iterate 1
       STMT: if, while, zero?, not, ++, --
       EXPR: zero?, not, ++, --
       TERM: id, constant
  2. Iterate 2
       STMT: if, while, zero?, not, ++, --
       EXPR: zero?, not, ++, --, id, constant
       TERM: id, constant
  2. Iterate 3 (fixed point)
       STMT: if, while, zero?, not, ++, --, id, constant
       EXPR: zero?, not, ++, --, id, constant
       TERM: id, constant
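The fixed point of the exercise can be reproduced mechanically. The sketch below is one possible Python rendering of the iteration for grammars without ε-productions; the dictionary encoding of the grammar and the names `GRAMMAR` and `first_sets` are assumptions of the example. It starts from empty sets, which folds the initialization step into the loop, so the intermediate rounds need not coincide with the ones listed above.

```python
# Each nonterminal maps to its alternatives; each alternative is a list of symbols.
# Any symbol that is not a key of the dictionary is treated as a terminal.
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"], ["not", "EXPR"],
             ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    """FIRST sets for a grammar with no epsilon productions."""
    first = {n: set() for n in grammar}
    changed = True
    while changed:                      # iterate until nothing changes
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                head = alt[0]           # only the first symbol matters here
                new = first[head] if head in grammar else {head}
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


if __name__ == "__main__":
    for name, terminals in first_sets(GRAMMAR).items():
        print(name, sorted(terminals))
```

The loop stops once a full pass adds nothing, which must happen because the sets only grow and are bounded by the finite set of terminals.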

Reasoning about the algorithm
  Assume no null productions (A → ε)
  1. Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
  2. Repeat the following until no changes occur:
       for each nonterminal A
         for each production A → α1 | … | αk
           FIRST(A) = FIRST(α1) ∪ … ∪ FIRST(αk)
  Is the algorithm correct?
  Does it terminate? (complexity)

Reasoning about the algorithm
  Termination:
  Correctness:

LL(1) Parsing of grammars without epsilon productions

Using FIRST sets
  Assume G has no epsilon productions and, for every non-terminal X and every pair of productions X → α and X → β, we have that FIRST(α) ∩ FIRST(β) = {}
  No intersection between FIRST sets => can always pick a single rule

Using FIRST sets
  In our Boolean expressions example
    FIRST( LIT ) = { true, false }
    FIRST( ( E OP E ) ) = { ( }
    FIRST( not E ) = { not }
  If the FIRST sets intersect, we may need a longer lookahead
    LL(k) = class of grammars in which the production rule can be determined using a lookahead of k tokens
    LL(1) is an important and useful class
  What if there are epsilon productions?

Extending LL(1) Parsing for epsilon productions

FIRST, FOLLOW, NULLABLE sets
  For each non-terminal X
    FIRST(X) = set of terminals that can start a sentence derived from X
      FIRST(X) = { t | X ⇒* t ω }
    NULLABLE(X) if X ⇒* ε
    FOLLOW(X) = set of terminals that can follow X in some derivation
      FOLLOW(X) = { t | S ⇒* α X t β }

Computing the NULLABLE set
  Lemma: NULLABLE(α1 … αk) = NULLABLE(α1) ∧ … ∧ NULLABLE(αk)
  1. Initially NULLABLE(X) = false
  2. For each non-terminal X
       if there exists a production X → ε then NULLABLE(X) = true
  3. Repeat
       for each production Y → α1 … αk
         if NULLABLE(α1 … αk) then NULLABLE(Y) = true
     until NULLABLE stabilizes

Exercise: compute NULLABLE
  S → A a b
  A → a | ε
  B → A B C
  C → b | ε
  NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(a) ∧ NULLABLE(b)
  NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
  NULLABLE(B) = NULLABLE(A) ∧ NULLABLE(B) ∧ NULLABLE(C)
  NULLABLE(C) = NULLABLE(b) ∨ NULLABLE(ε)
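The equations of this exercise can be solved by the iteration described for NULLABLE. The following Python sketch is one way to write it; the grammar encoding (an empty list stands for an ε alternative) and the names `GRAMMAR` and `nullable_set` are assumptions of the example.

```python
# An empty alternative [] encodes an epsilon production.
GRAMMAR = {
    "S": [["A", "a", "b"]],
    "A": [["a"], []],
    "B": [["A", "B", "C"]],
    "C": [["b"], []],
}


def nullable_set(grammar):
    """Return the set of nullable nonterminals (least fixed point)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            if n in nullable:
                continue
            # An alternative is nullable when every symbol in it is nullable.
            # Terminals are never in the set, and all([]) is True for epsilon.
            if any(all(sym in nullable for sym in alt) for alt in alts):
                nullable.add(n)
                changed = True
    return nullable


if __name__ == "__main__":
    print(sorted(nullable_set(GRAMMAR)))  # prints ['A', 'C']
```

B stays non-nullable: its only production mentions B itself, so starting from "false" there is never a reason to flip it.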

FIRST with epsilon productions
  How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
  FIRST(α1 … αk) =
    if not NULLABLE(α1) then FIRST(α1)
    else FIRST(α1) ∪ FIRST(α2 … αk)

Exercise: compute FIRST
  S → A c b
  A → a | ε
  NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(c) ∧ NULLABLE(b)
  NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
  FIRST(S) = FIRST(A) ∪ FIRST(cb)
  FIRST(A) = FIRST(a) ∪ FIRST(ε)
  Simplified:
    FIRST(S) = FIRST(A) ∪ {c}
    FIRST(A) = FIRST(a)
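The exercise can be checked with a combined fixed-point computation of NULLABLE and FIRST. The sketch below is an illustration under assumptions: the grammar encoding (empty list for ε) and the names `GRAMMAR` and `nullable_and_first` are not from the lecture.

```python
GRAMMAR = {
    "S": [["A", "c", "b"]],
    "A": [["a"], []],       # [] encodes the epsilon alternative
}


def nullable_and_first(grammar):
    """Compute NULLABLE and FIRST together for a grammar with epsilon productions."""
    nullable = set()
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                for sym in alt:
                    # FIRST of a terminal is the terminal itself.
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[n]:
                        first[n] |= new
                        changed = True
                    if sym not in nullable:
                        break           # later symbols cannot contribute
                else:
                    # Every symbol was nullable (or alt is epsilon).
                    if n not in nullable:
                        nullable.add(n)
                        changed = True
    return nullable, first


if __name__ == "__main__":
    nullable, first = nullable_and_first(GRAMMAR)
    print(sorted(nullable), {n: sorted(ts) for n, ts in first.items()})
```

For this grammar the computation yields NULLABLE = {A}, FIRST(A) = {a} and FIRST(S) = {a, c}, in agreement with the simplified equations.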

FOLLOW sets (p. 189)
  if X → α Y β then FOLLOW(Y) ⊇ FIRST(β)
  if NULLABLE(β) or β = ε then FOLLOW(Y) ⊇ FOLLOW(X)
  Allows predicting epsilon productions: X → ε when the lookahead token is in FOLLOW(X)
  S → A c b
  A → a | ε
  What should we predict for input cb?
  What should we predict for input acb?
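The two inclusion rules for FOLLOW can be iterated to a fixed point in the same style as FIRST. In the sketch below, NULLABLE and FIRST are supplied by hand for the example grammar S → A c b, A → a | ε; the encoding and the name `follow_sets` are assumptions, and no end marker is added, so FOLLOW(S) stays empty.

```python
GRAMMAR = {
    "S": [["A", "c", "b"]],
    "A": [["a"], []],       # [] encodes the epsilon alternative
}
NULLABLE = {"A"}                              # supplied by hand for this sketch
FIRST = {"S": {"a", "c"}, "A": {"a"}}         # supplied by hand for this sketch


def follow_sets(grammar, nullable, first):
    """Compute FOLLOW for every nonterminal by fixed-point iteration."""
    follow = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for x, alts in grammar.items():
            for alt in alts:
                for i, y in enumerate(alt):
                    if y not in grammar:
                        continue              # FOLLOW is only defined for nonterminals
                    new = set()
                    rest_nullable = True
                    for sym in alt[i + 1:]:   # beta = what comes after Y
                        new |= first[sym] if sym in grammar else {sym}
                        if sym not in nullable:
                            rest_nullable = False
                            break
                    if rest_nullable:         # beta is nullable or empty
                        new |= follow[x]
                    if not new <= follow[y]:
                        follow[y] |= new
                        changed = True
    return follow
```

Here FOLLOW(A) = {c}. On input `cb` the lookahead `c` is in FOLLOW(A), so A → ε is predicted; on `acb` the lookahead `a` is in FIRST(a), so A → a is predicted.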

LL(k) grammars

Conflicts
  FIRST-FIRST conflict
    X → α and X → β, and FIRST(α) ∩ FIRST(β) ≠ {}
  FIRST-FOLLOW conflict
    NULLABLE(X), and FIRST(X) ∩ FOLLOW(X) ≠ {}

LL(1) grammars
  A grammar is in the class LL(1) when its sentences can be derived via:
    Top-down derivation
    Scanning the input from left to right (L)
    Producing the leftmost derivation (L)
    With lookahead of one token
  For every two productions A → α and A → β we have FIRST(α) ∩ FIRST(β) = {}, and if NULLABLE(A) then FIRST(A) ∩ FOLLOW(A) = {}
  A language is said to be LL(1) when it has an LL(1) grammar
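The two LL(1) conditions translate directly into a conflict check. The sketch below assumes NULLABLE, FIRST and FOLLOW have already been computed and are passed in; the function names `first_of` and `find_conflicts`, and the hand-written sets for the two recursive-descent problem grammars, are assumptions of the example.

```python
from itertools import combinations


def first_of(seq, grammar, nullable, first):
    """FIRST of a sequence of symbols (one right-hand side)."""
    result = set()
    for sym in seq:
        result |= first[sym] if sym in grammar else {sym}
        if sym not in nullable:
            break
    return result


def find_conflicts(grammar, nullable, first, follow):
    """List FIRST-FIRST and FIRST-FOLLOW conflicts as (kind, nonterminal) pairs."""
    conflicts = []
    for x, alts in grammar.items():
        for alpha, beta in combinations(alts, 2):
            if first_of(alpha, grammar, nullable, first) & \
               first_of(beta, grammar, nullable, first):
                conflicts.append(("FIRST-FIRST", x))
        if x in nullable and first[x] & follow[x]:
            conflicts.append(("FIRST-FOLLOW", x))
    return conflicts


# Problem 2: S -> A a b, A -> a | epsilon (sets written out by hand).
PROBLEM2 = {"S": [["A", "a", "b"]], "A": [["a"], []]}
print(find_conflicts(PROBLEM2, {"A"},
                     {"S": {"a"}, "A": {"a"}},
                     {"S": set(), "A": {"a"}}))
# prints [('FIRST-FOLLOW', 'A')]
```

A grammar is LL(1) exactly when the returned list is empty, so the same function doubles as an LL(1) test.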

LL(k) grammars
  Generalizes LL(1) for k lookahead tokens
  Need to generalize FIRST and FOLLOW for k lookahead tokens

Handling conflicts

Back to problem 1
  term → ID | indexed_elem
  indexed_elem → ID [ expr ]
  FIRST(term) = { ID }
  FIRST(indexed_elem) = { ID }
  FIRST-FIRST conflict

Solution: left factoring
  Rewrite the grammar to be in LL(1)
    term → ID | indexed_elem
    indexed_elem → ID [ expr ]
  becomes
    term → ID after_ID
    after_ID → [ expr ] | ε
  The new grammar is more complex: it has an epsilon production
  Intuition: just like factoring in algebra: x*y + x*z into x*(y+z)

Exercise: apply left factoring
  S → if E then S else S | if E then S | T
  Solution:
    S → if E then S S' | T
    S' → else S | ε
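Left factoring of a single nonterminal can be automated by grouping alternatives on their first symbol and pulling out the longest common prefix. The sketch below performs one level of factoring only; the function name `left_factor`, the list encoding (empty list for ε) and the primed names for fresh nonterminals are assumptions of the example.

```python
def left_factor(name, alts):
    """Factor out the longest common prefix of alternatives sharing a first symbol."""
    groups = {}
    for alt in alts:
        groups.setdefault(alt[0] if alt else None, []).append(alt)

    result = {name: []}
    for head, group in groups.items():
        if head is None or len(group) == 1:
            result[name].extend(group)      # nothing to factor
            continue
        # Longest common prefix of all alternatives in the group.
        prefix = group[0]
        for alt in group[1:]:
            i = 0
            while i < min(len(prefix), len(alt)) and prefix[i] == alt[i]:
                i += 1
            prefix = prefix[:i]
        new_name = name + "'" * len(result)  # fresh name: S', S'', ...
        result[name].append(prefix + [new_name])
        # The remainders become the alternatives of the new nonterminal;
        # an empty remainder is an epsilon alternative.
        result[new_name] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor("S", [
    ["if", "E", "then", "S", "else", "S"],
    ["if", "E", "then", "S"],
    ["T"],
])
```

For the exercise grammar, `factored` holds S → if E then S S' | T and S' → else S | ε, matching the solution above.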

Back to problem 2
  S → A a b
  A → a | ε
  FIRST(S) = { a }    FOLLOW(S) = { }
  FIRST(A) = { a }    FOLLOW(A) = { a }
  FIRST-FOLLOW conflict

Solution: substitution
  S → A a b
  A → a | ε
  Substitute A in S:
    S → a a b | a b
  Left factoring:
    S → a after_a
    after_a → a b | b

Back to problem 3
  E → E - term | term
  Left recursion cannot be handled with a bounded lookahead
  What can we do?

Left recursion removal (p. 130)
  G1: N → N α | β
  G2: N → β N'
      N' → α N' | ε
  L(G1) = β, βα, βαα, βααα, …
  L(G2) = same
  Can be done algorithmically
    Problem 1: grammar becomes mangled beyond recognition
    Problem 2: grammar may not be LL(1)
  For our 3rd example:
    E → E - term | term
  becomes
    E → term TE | term
    TE → - term TE | ε
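The G1-to-G2 rewrite for immediate left recursion is mechanical. The sketch below applies exactly the scheme N → β N', N' → α N' | ε to one nonterminal; the function name `remove_left_recursion`, the list encoding (empty list for ε) and the primed name for the new nonterminal are assumptions of the example, and indirect left recursion is not handled.

```python
def remove_left_recursion(name, alts):
    """Rewrite N -> N alpha | beta as N -> beta N', N' -> alpha N' | epsilon."""
    # alpha parts: what follows the leading N in left-recursive alternatives.
    alphas = [alt[1:] for alt in alts if alt and alt[0] == name]
    # beta parts: alternatives that do not start with N.
    betas = [alt for alt in alts if not alt or alt[0] != name]
    if not alphas:
        return {name: alts}             # nothing to do
    tail = name + "'"
    return {
        name: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],   # [] is epsilon
    }


rewritten = remove_left_recursion("E", [["E", "-", "term"], ["term"]])
```

For E → E - term | term this produces E → term E' and E' → - term E' | ε, where E' plays the role of TE.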

Recap
  Given a grammar
  Compute for each non-terminal
    NULLABLE
    FIRST using NULLABLE
    FOLLOW using FIRST and NULLABLE
  Compute FIRST for each sentential form appearing on the right-hand side of a production
  Check for conflicts
    If any exist: attempt to remove conflicts by rewriting the grammar

LL(1) parsing: the automata approach
  [Figure: image by MG (own work), GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/), via Wikimedia Commons]

Marking end-of-file
  Sometimes it will be useful to transform a grammar G with start non-terminal S into a grammar G' with a new start non-terminal S' and a new production rule
    S' → S $
  where $ is not part of the set of tokens
  To parse an input α with G' we change it into α $
  Simplifies top-down parsing with null productions and LR parsing

Another convention
  We will assume that all productions have been consecutively numbered
    (1) S → E $
    (2) E → T
    (3) E → E + T
    (4) T → id
    (5) T → ( E )

LL(1) Parsers
  Recursive descent
    Manual construction (parsing combinators make this easier, but …)
    Uses recursion
  Wanted
    A parser that can be generated automatically
    Does not use recursion

LL(1) parsing via pushdown automata
  Pushdown automaton uses
    Input stream
    Prediction stack
    Parsing table: nonterminal × token → production rule
      The entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when the current input starts with t
  Essentially, the classic conversion from CFG to PDA
  The only difference is that we replace nondeterministic choice with the parsing table

Model of non-recursive predictive parser
  [Figure: input buffer (a + b $), stack (X Y Z $), predictive parsing program, parsing table, output]

LL(1) parsing algorithm
  Set stack = S$
  While true
    Prediction: when the top of the stack is a nonterminal N and the next input token is t
      Pop N and look up table[N,t]
      If table[N,t] is not empty, push table[N,t] on the prediction stack
      Otherwise: return syntax error
    Match: when the top of the prediction stack is a terminal t, it must be equal to the next input token t'
      If t = t', pop t and consume t'
      If t ≠ t': return syntax error
    End: when the prediction stack is empty
      If the input is empty at that point: return success
      Otherwise: return syntax error

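Example transition table
  (1) E → LIT
  (2) E → ( E OP E )
  (3) E → not E
  (4) LIT → true
  (5) LIT → false
  (6) OP → and
  (7) OP → or
  (8) OP → xor
  Rows are nonterminals, columns are input tokens; each entry says which rule should be used.
        |  (  |  )  | not | true | false | and | or  | xor |  $
  E     |  2  |     |  3  |  1   |   1   |     |     |     |
  LIT   |     |     |     |  4   |   5   |     |     |     |
  OP    |     |     |     |      |       |  6  |  7  |  8  |
  Note: ( ∈ FIRST(E)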

Running parser example
  Input: aacbb$
  Grammar: A → aAb | c
  Input suffix | Stack content | Move
  aacbb$       | A$            | predict(A,a) = A → aAb
  aacbb$       | aAb$          | match(a,a)
  acbb$        | Ab$           | predict(A,a) = A → aAb
  acbb$        | aAbb$         | match(a,a)
  cbb$         | Abb$          | predict(A,c) = A → c
  cbb$         | cbb$          | match(c,c)
  bb$          | bb$           | match(b,b)
  b$           | b$            | match(b,b)
  $            | $             | match($,$) success
  Parsing table
      |    a    | b |   c
  A   | A → aAb |   | A → c

Illegal input example
  Input: abcbb$
  Grammar: A → aAb | c
  Input suffix | Stack content | Move
  abcbb$       | A$            | predict(A,a) = A → aAb
  abcbb$       | aAb$          | match(a,a)
  bcbb$        | Ab$           | predict(A,b) = ERROR
  Parsing table
      |    a    | b |   c
  A   | A → aAb |   | A → c
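Both traces can be replayed with a direct implementation of the stack-based algorithm. The sketch below is a minimal Python version; the function name `ll1_parse`, the dictionary encoding of the table, and returning a boolean instead of reporting an error position are assumptions of the example.

```python
def ll1_parse(tokens, table, start):
    """Table-driven LL(1) recognizer; returns True on success, False on syntax error."""
    nonterminals = {n for n, _ in table}
    remaining = list(tokens) + ["$"]    # mark end of input
    stack = ["$", start]                # top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = remaining[pos] if pos < len(remaining) else None
        if top in nonterminals:
            rhs = table.get((top, look))
            if rhs is None:
                return False            # empty table entry: syntax error
            stack.extend(reversed(rhs)) # push the right-hand side in reverse order
        elif top == look:
            pos += 1                    # match: consume the token
        else:
            return False                # terminal mismatch
    return pos == len(remaining)        # success only if all input was consumed


# Table for A -> aAb | c, as in the examples above.
TABLE = {
    ("A", "a"): ["a", "A", "b"],
    ("A", "c"): ["c"],
}
```

With this table, `ll1_parse("aacbb", TABLE, "A")` succeeds, while `ll1_parse("abcbb", TABLE, "A")` fails at the lookup for (A, b), as in the illegal-input trace.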

Creating the prediction table
  Let G be an LL(1) grammar
  Compute FIRST/NULLABLE/FOLLOW
  Check for conflicts
  For non-terminal N and token t predict:
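The prediction rule itself did not survive in the transcript, so the sketch below uses the standard textbook construction, which is an assumption here rather than a quote from the slide: predict N → α on token t when t ∈ FIRST(α), or when α is nullable and t ∈ FOLLOW(N). The name `build_table`, the grammar encoding and the hand-written sets are also assumptions of the example.

```python
def build_table(grammar, nullable, first, follow):
    """Build an LL(1) prediction table mapping (nonterminal, token) to a right-hand side."""
    table = {}
    for n, alts in grammar.items():
        for alt in alts:
            lookaheads = set()
            alt_nullable = True
            for sym in alt:                     # FIRST of the right-hand side
                lookaheads |= first[sym] if sym in grammar else {sym}
                if sym not in nullable:
                    alt_nullable = False
                    break
            if alt_nullable:                    # alt can derive epsilon
                lookaheads |= follow[n]
            for t in lookaheads:
                if (n, t) in table:
                    raise ValueError(f"conflict at ({n}, {t}): grammar is not LL(1)")
                table[(n, t)] = alt
    return table


# S -> A c b, A -> a | epsilon, with sets written out by hand.
GRAMMAR = {"S": [["A", "c", "b"]], "A": [["a"], []]}
TABLE = build_table(GRAMMAR, {"A"},
                    {"S": {"a", "c"}, "A": {"a"}},
                    {"S": set(), "A": {"c"}})
```

In the resulting table the entry for (A, c) is the empty right-hand side, i.e. A → ε is predicted when the lookahead is `c`, and a duplicate entry signals a FIRST-FIRST or FIRST-FOLLOW conflict.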

Top-down parsing summary
  Recursive descent
  LL(k) grammars
  LL(k) parsing with pushdown automata
  Cannot deal with left recursion
  Left-recursion removal might result in a complicated grammar

Next lecture: Bottom-up parsing