Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Fall 2016-2017 Compiler Principles, Lecture 2: LL parsing. Roman Manevich, Ben-Gurion University of the Negev

Books
- Compilers: Principles, Techniques, and Tools. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman
- Modern Compiler Implementation in Java. Andrew W. Appel
- Modern Compiler Design. D. Grune, H. Bal, C. Jacobs, K. Langendoen
- Advanced Compiler Design and Implementation. Steven Muchnick

Tentative syllabus
- Front End: Scanning; Top-down Parsing (LL); Bottom-up Parsing (LR)
- Intermediate Representation: Operational Semantics; Lowering
- Optimizations: Dataflow Analysis; Loop Optimizations
- Code Generation: Register Allocation; Energy Optimization; Instruction Selection
- Mid-term exam

Parsing background
- Context-free grammars: terminals, nonterminals, start nonterminal, productions (rules)
- Context-free languages
- Derivations (leftmost, rightmost)
- Derivation tree (also called parse tree)
- Ambiguous grammars

Agenda
- Understand the role of syntax analysis
- Parsing strategies
- LL parsing:
  - Building a prediction table via FIRST/FOLLOW/NULLABLE sets
  - Pushdown automata algorithm
  - Handling conflicts

Role of syntax analysis
[Figure (scheme): High-level Language → Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol Table etc. → Intermediate Representation (IR) → Code Generation → Executable Code]
- Recover structure from a stream of tokens: parse tree / abstract syntax tree
- Error reporting (recovery)
- Other possible tasks:
  - Syntax-directed translation (one-pass compilers)
  - Create the symbol table
  - Create a pretty-printed version of the program, e.g., the Auto Formatting function in an IDE

From tokens to abstract syntax trees
[Figure: program text 59 + (1257 * xposition) → Lexical Analyzer (regular expressions, finite automata; reports lexical errors) → valid token stream num + ( num * id ) → Parser (context-free grammars, push-down automata; reports syntax errors) → Abstract Syntax Tree: + with children num and (* with children num and x)]
Grammar: E → id | num | E + E | E * E | ( E )

Marking end-of-file
- Sometimes it will be useful to transform a grammar G with start nonterminal S into a grammar G' with a new start nonterminal S' and a new production rule S' → S $
- $ is not part of the set of tokens; it is a special End-Of-File (EOF) token
- To parse α with G' we change it into α $
- Simplifies parsing grammars with null productions
- Also simplifies parsing LR grammars

Another convention
We will assume that all productions have been consecutively numbered:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Parsing strategies

Broad kinds of parsers
- Parsers for arbitrary grammars
  - Cocke-Younger-Kasami ['65] method: O(n³)
  - Earley's method (implemented by NLTK): O(n³), but lower for restricted classes
  - Not commonly used by compilers
- Parsers for restricted classes of grammars
  - Top-down: with/without backtracking
  - Bottom-up

Top-down parsing
- Constructs the parse tree in a top-down manner
- Finds a leftmost derivation
- Predictive: for every nonterminal and k tokens, predict the next production: LL(k)
- Challenge: beginning with the start symbol, try to guess the productions to apply to end up at the user's program
[Figure credit: By Fidelio (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons]

Predictive parsing

Exercise: show leftmost derivation
Grammar:
(1) E → LIT
(2) E → ( E OP E )
(3) E → not E
(4) LIT → true
(5) LIT → false
(6) OP → and
(7) OP → or
(8) OP → xor
E ⇒ not E ⇒ not ( E OP E ) ⇒ not ( not E OP E ) ⇒ not ( not LIT OP E ) ⇒ not ( not true OP E ) ⇒ not ( not true or E ) ⇒ not ( not true or LIT ) ⇒ not ( not true or false )
[Figure: the corresponding parse tree]
How did we decide which production of E to take?

Predictive parsing
- Given a grammar G, attempt to derive a word ω
- Idea:
  - Scan the input from left to right
  - Apply a production to the leftmost nonterminal
  - Pick the production rule based on the next input token
- Problem: there may be more than one production for the next token
- Solution: restrict grammars to LL(1)
  - The parser correctly predicts which production to apply
  - If the grammar is not LL(1), the parser construction algorithm will detect it
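As a minimal sketch of this idea, assuming the Boolean-expression grammar of the exercise above and an already-tokenized input, here is a hand-written predictive (recursive-descent) parser in Python. Each parse function looks at one token of lookahead to choose a production, and the parser records the sentential forms of the leftmost derivation it builds. The names `BoolParser`, `ParseError`, and `leftmost_derivation` are hypothetical, chosen for illustration only.

```python
NONTERMINALS = {"E", "LIT", "OP"}


class ParseError(Exception):
    pass


class BoolParser:
    """Predictive parser for E -> LIT | ( E OP E ) | not E,
    LIT -> true | false, OP -> and | or | xor (hypothetical sketch)."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # append the end-of-file marker
        self.pos = 0
        self.form = ["E"]                   # current sentential form
        self.derivation = ["E"]

    def peek(self):
        return self.tokens[self.pos]

    def expand(self, nt, rhs):
        # Recursive descent always rewrites the leftmost nonterminal.
        i = next(k for k, s in enumerate(self.form) if s in NONTERMINALS)
        assert self.form[i] == nt
        self.form[i:i + 1] = rhs
        self.derivation.append(" ".join(self.form))

    def match(self, t):
        if self.peek() != t:
            raise ParseError(f"expected {t!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_E(self):
        t = self.peek()                     # one token of lookahead decides
        if t in ("true", "false"):          # (1) E -> LIT
            self.expand("E", ["LIT"])
            self.parse_LIT()
        elif t == "(":                      # (2) E -> ( E OP E )
            self.expand("E", ["(", "E", "OP", "E", ")"])
            self.match("(")
            self.parse_E()
            self.parse_OP()
            self.parse_E()
            self.match(")")
        elif t == "not":                    # (3) E -> not E
            self.expand("E", ["not", "E"])
            self.match("not")
            self.parse_E()
        else:
            raise ParseError(f"no production of E starts with {t!r}")

    def parse_LIT(self):                    # (4) LIT -> true, (5) LIT -> false
        t = self.peek()
        if t not in ("true", "false"):
            raise ParseError(f"expected a literal, got {t!r}")
        self.expand("LIT", [t])
        self.match(t)

    def parse_OP(self):                     # (6)-(8) OP -> and | or | xor
        t = self.peek()
        if t not in ("and", "or", "xor"):
            raise ParseError(f"expected an operator, got {t!r}")
        self.expand("OP", [t])
        self.match(t)


def leftmost_derivation(tokens):
    parser = BoolParser(tokens)
    parser.parse_E()
    parser.match("$")                       # all input must be consumed
    return parser.derivation


print(" => ".join(leftmost_derivation("not ( not true or false )".split())))
```

On the exercise input this yields the same nine sentential forms as the derivation on the exercise slide; an input such as ( true ) is rejected because no OP follows the first operand.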

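LL(1) parsing via pushdown automata
[Figure: the parsing program reads the input stream (a + b $) one token at a time and keeps a stack of symbols holding the current sentential form (X Y Z $). For the nonterminal on top of the stack and the current token it looks up a production in the prediction table. Output: derivation tree or error.]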

LL(1) parsing algorithm
Set stack = S $ (S on top)
while true:
  Prediction: when the top of the stack is a nonterminal N
    1. Pop N
    2. Look up Table[N, t], where t is the next input token
    3. If Table[N, t] is not empty, push the right-hand side of Table[N, t] onto the stack (in reverse order, so that its leftmost symbol is on top); else return syntax error
  Match: when the top of the stack is a terminal t
    If t = next input token, pop t and increment the input index; else return syntax error
  End: when the stack is empty
    If the input is empty, return success; else return syntax error

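Example prediction table
Grammar: (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor
Rows are nonterminals and columns are input tokens ( ) not true false and or xor $. Table entries determine which production to take; all other cells are empty:
E: ( → 2, not → 3, true → 1, false → 1
LIT: true → 4, false → 5
OP: and → 6, or → 7, xor → 8
For example, Table[E, (] = 2 because ( ∈ FIRST( ( E OP E ) ).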
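To see the LL(1) parsing algorithm above driven by this table, here is a minimal Python sketch, assuming pre-split tokens and with the table copied from the slide into a dictionary. `PRODUCTIONS`, `TABLE`, `ll1_parse`, and `ParseError` are hypothetical names for illustration.

```python
class ParseError(Exception):
    pass


# Productions numbered as on the slide: number -> (left-hand side, right-hand side)
PRODUCTIONS = {
    1: ("E", ["LIT"]),
    2: ("E", ["(", "E", "OP", "E", ")"]),
    3: ("E", ["not", "E"]),
    4: ("LIT", ["true"]),
    5: ("LIT", ["false"]),
    6: ("OP", ["and"]),
    7: ("OP", ["or"]),
    8: ("OP", ["xor"]),
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}

# Prediction table: (nonterminal, lookahead token) -> production number.
# Absent keys are the empty (syntax error) cells.
TABLE = {
    ("E", "("): 2, ("E", "not"): 3, ("E", "true"): 1, ("E", "false"): 1,
    ("LIT", "true"): 4, ("LIT", "false"): 5,
    ("OP", "and"): 6, ("OP", "or"): 7, ("OP", "xor"): 8,
}


def ll1_parse(tokens, start="E"):
    """Table-driven LL(1) parse; returns the production numbers applied."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]               # the end of the list is the top
    i = 0
    applied = []
    while stack:
        top = stack.pop()
        t = tokens[i]
        if top in NONTERMINALS:                      # prediction
            prod = TABLE.get((top, t))
            if prod is None:
                raise ParseError(f"empty cell Table[{top}, {t}]")
            applied.append(prod)
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(reversed(PRODUCTIONS[prod][1]))
        elif top == t:                               # match
            i += 1
        else:
            raise ParseError(f"expected {top!r}, got {t!r}")
    # $ sits at the bottom of the stack, so the stack empties only after
    # the final $ has been matched, i.e. after all input was consumed.
    return applied


print(ll1_parse("not ( not true or false )".split()))
```

The production sequence returned for the exercise input, 3, 2, 3, 1, 4, 7, 1, 5, is exactly the leftmost derivation from the exercise slide, read as rule numbers.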

Running parser example
Grammar: S → a S b | c. Input: aacbb$
Prediction table: Table[S, a] = S → a S b, Table[S, c] = S → c, Table[S, b] is empty
Input suffix | Stack content | Move
aacbb$ | S$ | predict(S, a) = S → aSb
aacbb$ | aSb$ | match(a, a)
acbb$ | Sb$ | predict(S, a) = S → aSb
acbb$ | aSbb$ | match(a, a)
cbb$ | Sbb$ | predict(S, c) = S → c
cbb$ | cbb$ | match(c, c)
bb$ | bb$ | match(b, b)
b$ | b$ | match(b, b)
$ | $ | match($, $), success

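Illegal input example
Grammar: S → a S b | c. Input: abcbb$
Prediction table: Table[S, a] = S → a S b, Table[S, c] = S → c, Table[S, b] is empty
Input suffix | Stack content | Move
abcbb$ | S$ | predict(S, a) = S → aSb
abcbb$ | aSb$ | match(a, a)
bcbb$ | Sb$ | predict(S, b) = ERROR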
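The two traces above can be reproduced mechanically. This is a minimal Python sketch, assuming every grammar symbol is a single character and that the input string ends in $; `trace`, `PRODUCTIONS`, and `TABLE` are hypothetical names. Each row holds the input suffix, the stack content (top first), and the move, as in the slide tables.

```python
# Grammar S -> a S b | c; every symbol is a single character (assumption).
PRODUCTIONS = {1: ("S", "aSb"), 2: ("S", "c")}
TABLE = {("S", "a"): 1, ("S", "c"): 2}     # Table[S, b] is empty
NONTERMINALS = {"S"}


def trace(text, start="S"):
    """Run the LL(1) algorithm on `text` (which must end in '$').
    Returns (rows, accepted); each row is (input suffix, stack content, move)."""
    stack = ["$", start]                   # the end of the list is the top
    i = 0
    rows = []
    while stack:
        suffix = text[i:]
        content = "".join(reversed(stack))  # print the top of the stack first
        top, t = stack[-1], text[i]
        if top in NONTERMINALS:
            p = TABLE.get((top, t))
            if p is None:
                rows.append((suffix, content, f"predict({top},{t}) = ERROR"))
                return rows, False
            lhs, rhs = PRODUCTIONS[p]
            rows.append((suffix, content, f"predict({top},{t}) = {lhs} -> {rhs}"))
            stack.pop()
            stack.extend(reversed(rhs))    # push the right-hand side in reverse
        elif top == t:
            rows.append((suffix, content, f"match({t},{t})"))
            stack.pop()
            i += 1
        else:
            rows.append((suffix, content, f"match({top},{t}) = ERROR"))
            return rows, False
    return rows, True                      # the stack empties only after matching $


for text in ("aacbb$", "abcbb$"):
    rows, ok = trace(text)
    for suffix, content, move in rows:
        print(f"{suffix:>8}  {content:>8}  {move}")
    print("success" if ok else "syntax error")
```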

Building the prediction table
- Let G be a grammar
- Compute FIRST/NULLABLE/FOLLOW
- Check for conflicts
  - No conflicts ⇒ G is an LL(1) grammar
  - Conflicts exist ⇒ G is not an LL(1) grammar; attempt to transform G into an equivalent LL(1) grammar G'

FIRST sets

FIRST sets
- Definition: for a nonterminal A, FIRST(A) is the set of terminals that can start a sentence derived from A. Formally: FIRST(A) = { t | A ⇒* t ω }
- Definition: for a sentential form α, FIRST(α) is the set of terminals that can start a sentence derived from α. Formally: FIRST(α) = { t | α ⇒* t ω }

FIRST sets example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
FIRST(E) = ?  FIRST(LIT) = ?  FIRST(OP) = ?

FIRST sets example
FIRST(E) = FIRST(LIT) ∪ FIRST( ( E OP E ) ) ∪ FIRST(not E)
FIRST(LIT) = { true, false }
FIRST(OP) = { and, or, xor }
A set of recursive equations. How do we solve them?

Computing FIRST sets
Assume no null productions (A → ε). Then FIRST(X β) = FIRST(X) for a nonterminal X, and FIRST(t β) = { t } for a terminal t.
1. Initially, for all nonterminals A, set FIRST(A) = { t | A → t ω for some ω }
2. Repeat the following until no changes occur:
   for each nonterminal A
     for each production A → α1 | … | αk
       FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk)
This is known as a fixed-point algorithm. We will see such iterative methods later in the course and learn to reason about them.
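A minimal Python sketch of this fixed-point algorithm, assuming a grammar with no null productions stored as a dictionary from each nonterminal to its list of alternatives (lists of symbols); any symbol that is not a key is treated as a terminal. The grammar is the STMT/EXPR/TERM exercise that follows, where "->" is a token of the language rather than a production arrow. `first_sets` is a hypothetical name. Because this sketch updates the sets in place, it may reach the fixed point in a different number of rounds than the iteration tables on the slides, but the final sets are the same.

```python
# Exercise grammar; each alternative is a list of symbols.
# Symbols that are not keys of GRAMMAR are terminals ("->" is a token here).
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"],
             ["zero?", "TERM"],
             ["not", "EXPR"],
             ["++", "id"],
             ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}


def first_sets(grammar):
    """FIRST sets by fixed-point iteration; assumes no epsilon productions."""
    def first_of(symbol, first):
        # FIRST(t) = {t} for a terminal; for a nonterminal use the current set.
        return first[symbol] if symbol in grammar else {symbol}

    # 1. Initialization: terminals that directly start a right-hand side.
    first = {A: {alt[0] for alt in alts if alt[0] not in grammar}
             for A, alts in grammar.items()}
    # 2. Repeat until no set changes. Without epsilon productions,
    #    FIRST(X1 X2 ... Xn) = FIRST(X1).
    changed = True
    while changed:
        changed = False
        for A, alts in grammar.items():
            new = set().union(*(first_of(alt[0], first) for alt in alts))
            if not new <= first[A]:
                first[A] |= new
                changed = True
    return first


for nt, terminals in first_sets(GRAMMAR).items():
    print(nt, sorted(terminals))
```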

Exercise: compute FIRST
Grammar (in EXPR → TERM -> id, the symbol -> is a token of the language):
STMT → if EXPR then STMT | while EXPR do STMT | EXPR ;
EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id
TERM → id | constant
Equations:
FIRST(STMT) = FIRST(if) ∪ FIRST(while) ∪ FIRST(EXPR)
FIRST(EXPR) = FIRST(TERM) ∪ FIRST(zero?) ∪ FIRST(not) ∪ FIRST(++) ∪ FIRST(--)
FIRST(TERM) = FIRST(id) ∪ FIRST(constant)

Exercise: compute FIRST
FIRST(STMT) = {if, while} ∪ FIRST(EXPR)
FIRST(EXPR) = {zero?, not, ++, --} ∪ FIRST(TERM)
FIRST(TERM) = {id, constant}

1. Initialization
STMT: if, while
EXPR: zero?, not, ++, --
TERM: id, constant

2. Iteration 1
STMT: if, while, zero?, not, ++, --
EXPR: zero?, not, ++, --
TERM: id, constant

2. Iteration 2
STMT: if, while, zero?, not, ++, --
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant

2. Iteration 3 (fixed point)
STMT: if, while, zero?, not, ++, --, id, constant
EXPR: zero?, not, ++, --, id, constant
TERM: id, constant

Reasoning about the algorithm
For the FIRST-set algorithm (no null productions; initialize, then repeat FIRST(A) := FIRST(α1) ∪ … ∪ FIRST(αk) until no changes occur):
- Is the algorithm correct?
- Does it terminate? (complexity)

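Reasoning about the algorithm
- Termination: FIRST sets only grow, and each one is a subset of the finite set of terminals T. Every iteration except the last adds at least one terminal to some set, so there are at most |N|·|T| + 1 iterations, where N is the set of nonterminals.
- Correctness: every terminal added to FIRST(A) is justified by a derivation A ⇒* t ω, so nothing wrong is added; conversely, by induction on the length of a derivation A ⇒* t ω, every such t is eventually added.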

LL(1) parsing of grammars without epsilon productions

Using FIRST sets
Assume G has no epsilon productions and, for every nonterminal X and every pair of productions X → α and X → β, we have FIRST(α) ∩ FIRST(β) = {}.
No intersection between FIRST sets ⇒ we can always pick a single rule.

Using FIRST sets
- In our Boolean expressions example: FIRST(LIT) = { true, false }, FIRST( ( E OP E ) ) = { ( }, FIRST(not E) = { not }
- If the FIRST sets intersect, we may need a longer lookahead
  - LL(k) = the class of grammars in which the production rule can be determined using a lookahead of k tokens
  - LL(1) is an important and useful class
- What if there are epsilon productions?

Extending LL(1) parsing for epsilon productions

FIRST, FOLLOW, NULLABLE sets
For each nonterminal X:
- FIRST(X) = the set of terminals that can start a sentence derived from X: FIRST(X) = { t | X ⇒* t ω }
- NULLABLE(X) if X ⇒* ε
- FOLLOW(X) = the set of terminals that can follow X in some derivation: FOLLOW(X) = { t | S ⇒* α X t β }

Computing the NULLABLE set
Lemma: NULLABLE(α1 … αk) = NULLABLE(α1) ∧ … ∧ NULLABLE(αk) (a terminal is never nullable)
1. Initially NULLABLE(X) = false
2. For each nonterminal X, if there exists a production X → ε then NULLABLE(X) = true
3. Repeat: for each production Y → α1 … αk, if NULLABLE(α1 … αk) then NULLABLE(Y) = true; until NULLABLE stabilizes

Exercise: compute NULLABLE
S → A a b
A → a | ε
B → A B C
C → b | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(a) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
NULLABLE(B) = NULLABLE(A) ∧ NULLABLE(B) ∧ NULLABLE(C)
NULLABLE(C) = NULLABLE(b) ∨ NULLABLE(ε)
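A minimal Python sketch of the NULLABLE algorithm, applied to this exercise. It assumes the same hypothetical grammar encoding as before (a dictionary from nonterminals to lists of alternatives), with an empty list [] standing for an ε production; `nullable_sets` is an illustrative name.

```python
# Exercise grammar; [] is an epsilon production.
# Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "S": [["A", "a", "b"]],
    "A": [["a"], []],
    "B": [["A", "B", "C"]],
    "C": [["b"], []],
}


def nullable_sets(grammar):
    # 1. Initially NULLABLE(X) = false for every nonterminal.
    nullable = {X: False for X in grammar}
    # 2. A production X -> epsilon makes X nullable.
    for X, alts in grammar.items():
        if [] in alts:
            nullable[X] = True
    # 3. Repeat until stable. By the lemma, a right-hand side is nullable
    #    iff all of its symbols are; terminals are never nullable.
    changed = True
    while changed:
        changed = False
        for Y, alts in grammar.items():
            for alt in alts:
                if not nullable[Y] and all(nullable.get(s, False) for s in alt):
                    nullable[Y] = True
                    changed = True
    return nullable


print(nullable_sets(GRAMMAR))
```

Note that NULLABLE(B) comes out false even though the equation NULLABLE(B) = NULLABLE(A) ∧ NULLABLE(B) ∧ NULLABLE(C) would also be satisfied by true: starting from false and only switching to true when justified gives the least solution, which is the right answer because B derives no string at all.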

FIRST with epsilon productions
How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
FIRST(α1 … αk) = ?

FIRST with epsilon productions
How do we compute FIRST(α1 … αk) when epsilon productions are allowed?
FIRST(α1 … αk) = if not NULLABLE(α1) then FIRST(α1) else FIRST(α1) ∪ FIRST(α2 … αk)

Exercise: compute FIRST
S → A c b
A → a | ε
NULLABLE(S) = NULLABLE(A) ∧ NULLABLE(c) ∧ NULLABLE(b)
NULLABLE(A) = NULLABLE(a) ∨ NULLABLE(ε)
FIRST(S) = FIRST(A) ∪ FIRST(c b)
FIRST(A) = FIRST(a) ∪ FIRST(ε)
FIRST(S) = FIRST(A) ∪ {c}
FIRST(A) = {a}
So FIRST(S) = {a, c}.
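A minimal Python sketch of FIRST with epsilon productions, applied to this exercise. It assumes the same hypothetical encoding ([] is an ε production, non-keys are terminals) and computes NULLABLE first, then iterates FIRST using the sequence rule FIRST(α1 … αk) from the previous slide. `nullable_sets`, `first_of_seq`, and `first_sets` are illustrative names.

```python
# Exercise grammar; [] is an epsilon production.
GRAMMAR = {
    "S": [["A", "c", "b"]],
    "A": [["a"], []],
}


def nullable_sets(grammar):
    nullable = {X: False for X in grammar}
    changed = True
    while changed:
        changed = False
        for X, alts in grammar.items():
            if not nullable[X] and any(
                    all(nullable.get(s, False) for s in alt) for alt in alts):
                nullable[X] = True
                changed = True
    return nullable


def first_of_seq(seq, first, nullable, grammar):
    """FIRST(a1 ... ak): take FIRST(a1) and continue to a2 ... ak only
    while the symbols seen so far are nullable."""
    result = set()
    for s in seq:
        if s not in grammar:            # terminal: FIRST(t) = {t}, not nullable
            result.add(s)
            return result
        result |= first[s]
        if not nullable[s]:
            return result
    return result


def first_sets(grammar):
    nullable = nullable_sets(grammar)
    first = {X: set() for X in grammar}
    changed = True
    while changed:                      # fixed point, as before
        changed = False
        for X, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first, nullable, grammar)
                if not new <= first[X]:
                    first[X] |= new
                    changed = True
    return first


print(first_sets(GRAMMAR))
```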

FOLLOW sets (p. 189)
- If X → α Y β then FOLLOW(Y) ⊇ ?
- If NULLABLE(β) or β = ε then FOLLOW(Y) ⊇ ?

FOLLOW sets (p. 189)
- If X → α Y β then FOLLOW(Y) ⊇ FIRST(β)
- If NULLABLE(β) or β = ε then FOLLOW(Y) ⊇ FOLLOW(X)
- Allows predicting epsilon productions: X → ε when the lookahead token is in FOLLOW(X)
S → A c b
A → a | ε
What should we predict for input cb?
What should we predict for input acb?
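The sketch below puts NULLABLE, FIRST, and FOLLOW together and fills an LL(1) prediction table for the slide's grammar, predicting X → α on FIRST(α), and additionally on FOLLOW(X) when α is nullable. It is a minimal Python sketch under stated assumptions: the hypothetical grammar encoding used above ([] is ε), the end-of-file token $ placed in FOLLOW of the start symbol (in the spirit of the S' → S $ convention), and all three sets computed in one combined fixed-point loop. `analyse` and `build_table` are illustrative names; each table cell holds a list, so a cell with more than one entry would signal a conflict.

```python
# Grammar from the slide; [] is an epsilon production.
GRAMMAR = {
    "S": [["A", "c", "b"]],
    "A": [["a"], []],
}
START = "S"


def analyse(grammar, start):
    """Compute NULLABLE, FIRST and FOLLOW by one combined fixed-point loop."""
    nullable = {X: False for X in grammar}
    first = {X: set() for X in grammar}
    follow = {X: set() for X in grammar}
    follow[start].add("$")              # assumption: input ends with the EOF token $

    def first_seq(seq):
        """Return (FIRST(seq), NULLABLE(seq)) from the current approximations."""
        out = set()
        for s in seq:
            if s not in grammar:        # terminal
                return out | {s}, False
            out |= first[s]
            if not nullable[s]:
                return out, False
        return out, True                # every symbol (or no symbol) is nullable

    changed = True
    while changed:
        changed = False
        for X, alts in grammar.items():
            for alt in alts:
                f, nul = first_seq(alt)
                if nul and not nullable[X]:
                    nullable[X] = True
                    changed = True
                if not f <= first[X]:
                    first[X] |= f
                    changed = True
                # FOLLOW rules for every nonterminal Y in X -> alpha Y beta
                for i, Y in enumerate(alt):
                    if Y not in grammar:
                        continue
                    f_beta, beta_nullable = first_seq(alt[i + 1:])
                    new = f_beta | (follow[X] if beta_nullable else set())
                    if not new <= follow[Y]:
                        follow[Y] |= new
                        changed = True
    return nullable, first, follow, first_seq


def build_table(grammar, start):
    """LL(1) prediction table; a cell with more than one entry is a conflict."""
    nullable, first, follow, first_seq = analyse(grammar, start)
    table = {}
    for X, alts in grammar.items():
        for alt in alts:
            f, nul = first_seq(alt)
            # Predict X -> alt on FIRST(alt), and on FOLLOW(X) if alt is nullable.
            for t in f | (follow[X] if nul else set()):
                table.setdefault((X, t), []).append(alt)
    return table


table = build_table(GRAMMAR, START)
for (X, t), alts in sorted(table.items()):
    print(f"Table[{X}, {t}] = " + " | ".join(" ".join(a) or "ε" for a in alts))
```

With this table, input cb leads to S → A c b and then A → ε (since c ∈ FOLLOW(A)), while input acb leads to S → A c b and then A → a.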

LL(1) conflicts

Conflicts
- FIRST-FIRST conflict: X → α and X → β and FIRST(α) ∩ FIRST(β) ≠ {}
- FIRST-FOLLOW conflict: NULLABLE(X) and FIRST(X) ∩ FOLLOW(X) ≠ {}
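The two conflict conditions translate directly into set checks. This minimal Python sketch assumes the FIRST, FOLLOW, and NULLABLE information has already been computed; the sample inputs are copied by hand from the conflict examples in the "Handling conflicts" slides (term → ID | indexed_elem, and S → A a b, A → a | ε). `first_first_conflicts` and `first_follow_conflicts` are hypothetical names.

```python
from itertools import combinations


def first_first_conflicts(alt_firsts):
    """alt_firsts maps X to [FIRST(alpha1), FIRST(alpha2), ...], one set per
    alternative X -> alpha_i. Reports pairs of alternatives whose sets meet."""
    found = []
    for X, sets in alt_firsts.items():
        for (i, f1), (j, f2) in combinations(enumerate(sets), 2):
            if f1 & f2:
                found.append((X, i, j, f1 & f2))
    return found


def first_follow_conflicts(nullable, first, follow):
    """Nullable nonterminals X whose FIRST(X) and FOLLOW(X) overlap."""
    return [(X, first[X] & follow[X]) for X in first
            if nullable[X] and first[X] & follow[X]]


# term -> ID | indexed_elem, indexed_elem -> ID [ expr ]
print(first_first_conflicts({"term": [{"ID"}, {"ID"}],
                             "indexed_elem": [{"ID"}]}))

# S -> A a b, A -> a | epsilon (sets copied from the slide)
print(first_follow_conflicts(nullable={"S": False, "A": True},
                             first={"S": {"a"}, "A": {"a"}},
                             follow={"S": set(), "A": {"a"}}))
```

The Boolean-expression grammar, by contrast, reports no FIRST-FIRST conflict, since the FIRST sets of the three E alternatives are pairwise disjoint.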

LL(1) grammars
- A grammar is in the class LL(1) when its LL(1) prediction table contains no conflicts
- A language is said to be LL(1) when it has an LL(1) grammar

LL(k) grammars

LL(k) grammars
- Generalizes LL(1) to k lookahead tokens
- Need to generalize FIRST and FOLLOW to k lookahead tokens

Agenda
- LL(k) via pushdown automata
- Predicting productions via FIRST/FOLLOW/NULLABLE sets
- Handling conflicts

Handling conflicts

Problem 1: FIRST-FIRST conflict
term → ID | indexed_elem
indexed_elem → ID [ expr ]
FIRST(term) = { ID }
FIRST(indexed_elem) = { ID }
How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: left factoring
Rewrite the grammar to be in LL(1):
term → ID | indexed_elem
indexed_elem → ID [ expr ]
becomes
term → ID after_ID
after_ID → [ expr ] | ε
The new grammar is more complex: it has an epsilon production.
Intuition: just like factoring in algebra, x*y + x*z into x*(y+z)
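A minimal Python sketch of one left-factoring step, assuming the hypothetical grammar encoding used above ([] is ε) and assuming indexed_elem has already been substituted into term, giving term → ID | ID [ expr ]. The sketch factors only the first group of alternatives that share a leading symbol, which is enough for the examples in this lecture. `common_prefix` and `left_factor` are illustrative names, and the caller supplies the fresh nonterminal name.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if all(s == column[0] for s in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor(nt, alternatives, fresh):
    """One left-factoring step for `nt`: the first group of two or more
    alternatives that start with the same symbol is replaced by
    nt -> prefix fresh, and fresh -> (the remaining suffixes).
    An empty suffix is an epsilon production ([])."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    nt_alts, fresh_alts = [], None
    for key, group in groups.items():
        if key is not None and len(group) > 1 and fresh_alts is None:
            prefix = common_prefix(group)
            nt_alts.append(prefix + [fresh])
            fresh_alts = [alt[len(prefix):] for alt in group]
        else:
            nt_alts.extend(group)
    result = {nt: nt_alts}
    if fresh_alts is not None:
        result[fresh] = fresh_alts
    return result


# term -> ID | ID [ expr ]   (indexed_elem already substituted into term)
print(left_factor("term", [["ID"], ["ID", "[", "expr", "]"]], "after_ID"))
```

Applied to the if-then-else exercise that follows, the same function turns S → if E then S else S | if E then S | T into S → if E then S S' | T and S' → else S | ε.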

Exercise: apply left factoring
S → if E then S else S | if E then S | T

Exercise: apply left factoring
S → if E then S else S | if E then S | T
becomes
S → if E then S S' | T
S' → else S | ε

Problem 2: FIRST-FOLLOW conflict
S → A a b
A → a | ε
FIRST(S) = { a }, FOLLOW(S) = { }
FIRST(A) = { a }, FOLLOW(A) = { a }
How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: substitution
S → A a b
A → a | ε
Substitute A in S:
S → a a b | a b
Left factoring:
S → a after_a
after_a → a b | b
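Both rewriting steps should preserve the language. For grammars without recursion the language is finite, so this can be checked by simply enumerating it. This is a minimal Python sketch under that assumption (it would not terminate on a recursive grammar), using the same hypothetical encoding as above, where [] is an ε production; `language` is an illustrative name.

```python
from itertools import product


def language(grammar, symbol):
    """All terminal strings derivable from `symbol`.
    Only valid for non-recursive grammars, whose languages are finite."""
    if symbol not in grammar:               # terminal
        return {symbol}
    words = set()
    for alt in grammar[symbol]:
        # Concatenate one word per symbol; an empty alt (epsilon) yields "".
        for parts in product(*(language(grammar, s) for s in alt)):
            words.add("".join(parts))
    return words


original = {"S": [["A", "a", "b"]], "A": [["a"], []]}
substituted = {"S": [["a", "a", "b"], ["a", "b"]]}
factored = {"S": [["a", "after_a"]], "after_a": [["a", "b"], ["b"]]}

for g in (original, substituted, factored):
    print(sorted(language(g, "S")))         # all three print ['aab', 'ab']
```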

Problem 3: FIRST-FIRST conflict
E → E - term | term
Left recursion cannot be handled with a bounded lookahead.
How can we transform the grammar into an equivalent grammar that does not have this conflict?

Solution: left recursion removal (p. 130)
G1: N → N α | β
G2: N → β N'
    N' → α N' | ε
L(G1) = { β, βα, βαα, βααα, … }
L(G2) = same
- Can be done algorithmically
- Problem 1: the grammar becomes mangled beyond recognition
- Problem 2: the grammar may not be LL(1)
For our third example:
E → E - term | term
becomes
E → term TE
TE → - term TE | ε
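A minimal Python sketch of the G1 to G2 transformation, generalized to several α and β alternatives. It assumes the hypothetical grammar encoding used above ([] is ε), handles immediate left recursion only (recursion through other nonterminals needs the general algorithm), and assumes at least one non-recursive alternative β exists. `remove_immediate_left_recursion` is an illustrative name, and the caller supplies the fresh nonterminal.

```python
def remove_immediate_left_recursion(nt, alternatives, fresh):
    """N -> N a1 | ... | N am | b1 | ... | bn   becomes
       N  -> b1 N' | ... | bn N'
       N' -> a1 N' | ... | am N' | epsilon     (epsilon is written [])
    Handles immediate left recursion only; assumes at least one b_i exists."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:                       # not left-recursive: nothing to do
        return {nt: alternatives}
    return {
        nt: [beta + [fresh] for beta in betas],
        fresh: [alpha + [fresh] for alpha in alphas] + [[]],
    }


# Our third example: E -> E - term | term
print(remove_immediate_left_recursion("E", [["E", "-", "term"], ["term"]], "TE"))
```

The result, E → term TE and TE → - term TE | ε, matches the rewritten grammar on the slide.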

Recap
Given a grammar:
- Compute for each nonterminal:
  - NULLABLE
  - FIRST using NULLABLE
  - FOLLOW using FIRST and NULLABLE
- Compute FIRST for each sentential form appearing on the right-hand side of a production
- Check for conflicts
- If conflicts exist: attempt to remove them by rewriting the grammar

The bigger picture
Compilers include different kinds of program analyses; each further constrains the set of legal programs:
- Lexical constraints: the program consists of legal tokens
- Syntax constraints: the program is included in a given context-free language
- Semantic constraints: the program is included in a given attribute grammar (type checking, legal inheritance graph, variables initialized before use)
- Logical constraints (Verifying Compiler grand challenge): memory safety (null dereference, array-out-of-bounds access, data races), functional correctness (the program meets its specification)

Next lecture: bottom-up parsing