Compila(on (Semester A, 2013/14)

Similar documents
Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky

Fall Compiler Principles Lecture 3: Parsing part 2. Roman Manevich Ben-Gurion University

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

Fall Compiler Principles Lecture 2: LL parsing. Roman Manevich Ben-Gurion University of the Negev

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

4 (c) parsing. Parsing. Top down vs. bo5om up parsing

3. Parsing. Oscar Nierstrasz

Compilers. Predictive Parsing. Alex Aiken

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Top down vs. bottom up parsing

CA Compiler Construction

Abstract Syntax Trees & Top-Down Parsing

Administrativia. WA1 due on Thu PA2 in a week. Building a Parser III. Slides on the web site. CS164 3:30-5:00 TT 10 Evans.

Types of parsing. CMSC 430 Lecture 4, Page 1

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

CS502: Compilers & Programming Systems

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

Monday, September 13, Parsers

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012


CS2210: Compiler Construction Syntax Analysis Syntax Analysis

EDA180: Compiler Construc6on. Top- down parsing. Görel Hedin Revised: a

1 Introduction. 2 Recursive descent parsing. Predicative parsing. Computer Language Implementation Lecture Note 3 February 4, 2004

Lexical and Syntax Analysis

Lexical and Syntax Analysis. Top-Down Parsing

Wednesday, August 31, Parsers

Table-Driven Parsing

Parsing III. (Top-down parsing: recursive descent & LL(1) )

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSCI312 Principles of Programming Languages

Parsing Part II (Top-down parsing, left-recursion removal)

Syntactic Analysis. Top-Down Parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Compiler construction lecture 3

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Compiler construction in4303 lecture 3

Syntax Analysis Part I

CS 314 Principles of Programming Languages

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Topdown parsing with backtracking

Part 3. Syntax analysis. Syntax analysis 96

Compiler Design 1. Top-Down Parsing. Goutam Biswas. Lect 5

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

EDA180: Compiler Construc6on Context- free grammars. Görel Hedin Revised:

Extra Credit Question

Introduction to parsers

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

COMP 181. Prelude. Next step. Parsing. Study of parsing. Specifying syntax with a grammar

CS 406/534 Compiler Construction Parsing Part I

Introduction to Parsing. Comp 412

Chapter 3. Parsing #1

LL(1) predictive parsing

CS Parsing 1

CMSC 330: Organization of Programming Languages

Syntax Analysis, III Comp 412

Compilers: CS31003 Computer Sc & Engg: IIT Kharagpur 1. Top-Down Parsing. Lect 5. Goutam Biswas

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

A programming language requires two major definitions A simple one pass compiler

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

CMSC 330: Organization of Programming Languages

Syntax Analysis, III Comp 412

Ambiguity. Grammar E E + E E * E ( E ) int. The string int * int + int has two parse trees. * int

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

CMSC 330, Fall 2009, Practice Problem 3 Solutions

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

CS 321 Programming Languages and Compilers. VI. Parsing

COP 3402 Systems Software Syntax Analysis (Parser)

Concepts Introduced in Chapter 4

VIVA QUESTIONS WITH ANSWERS

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Parsing II Top-down parsing. Comp 412

LL parsing Nullable, FIRST, and FOLLOW

Chapter 4: Syntax Analyzer

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

CMSC 330 Practice Problem 4 Solutions

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

Plan for Today. Regular Expressions: repetition and choice. Syntax and Semantics. Context Free Grammars

CMSC 330: Organization of Programming Languages

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Context-free grammars (CFG s)

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Transcription:

Compila(on 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top- Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran Yahav 1

Admin Next week: Trubowicz 101 (Law school) Mobiles... 2

What is a Compiler? A compiler is a computer program that transforms source code wri\en in a programming language (source language) into another language (target language). The most common reason for wan(ng to transform source code is to create an executable program. - - Wikipedia 3

Conceptual Structure of a Compiler Compiler Source text txt Frontend Semantic Representation Backend Executable code exe Lexical Analysis Syntax Analysis Parsing Semantic Analysis Intermediate Representation (IR) Code Generation words sentences 4

From scanning to parsing program text ((23 + 7) * x) Lexical Analyzer token stream ( LP ( LP 23 Num + OP 7 Num ) RP * OP x Id ) RP Grammar: E... Id Id a... z syntax error Parser Op(*) valid Abstract Syntax Tree Op(+) Id(b) Num(23) Num(7) 5

Context Free Grammars V non terminals (syntac(c variables) T terminals (tokens) P deriva(on rules Each rule of the form V t (T 4 V)* S start symbol G = (V,T,P,S) 6

CFG terminology S S ; S S id := E E id E num E E + E Symbols: Terminals (tokens): ; := ( ) id num print Non- terminals: S E L Start non- terminal: S Conven(on: the non- terminal appearing in the first deriva(on rule Grammar produc=ons (rules) N μ 7

CFG terminology Deriva(on - a sequence of replacements of non- terminals using the deriva(on rules Language - the set of strings of terminals derivable from the start symbol Senten(al form - the result of a par(al deriva(on in which there may be non- terminals 8

Parse Tree S S ; S S id := E ; S id := id S ; S S id := id ; id := E id := id id := E + E ; E id := E id := id id := E + id ; id := id ; id := id + id x:= z ; y := x + z E E id := id ; id + id 9

Leomost/rightmost Deriva(on Leomost deriva(on always expand leomost non- terminal Rightmost deriva(on Always expand rightmost non- terminal Allows us to describe deriva(on by lis(ng the sequence of rules always know what a rule is applied to 10

Leomost Deriva(on x := z; y := x + z S t S;S S t id := E E t id E + E E * E ( E ) S S ; S id := E ; S id := id ; S id := id ; id := E id := id ; id := E + E id := id ; id := id + E id := id ; id := id + id x := z ; y := x + z S t S;S S t id := E E t id S t id := E E t E + E E t id E t id 11

Broad kinds of parsers Parsers for arbitrary grammars Earley s method, CYK method Usually, not used in prac(ce (though might change) Top- Down parsers Construct parse tree in a top- down ma\er Find the leomost deriva(on Linear Bo\om- Up parsers Construct parse tree in a bo\om- up manner Find the rightmost deriva(on in a reverse order 12

Intui(on: Top- Down Parsing Begin with start symbol Guess the produc(ons Check if parse tree yields user's program 13

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E Recall: Non standard precedence T T F F F 1 + 2 * 3 14

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) We need this rule to get the * E 1 + 2 * 3 15

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E E T 1 + 2 * 3 16

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T 1 + 2 * 3 17

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F 1 + 2 * 3 18

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F 1 F + 2 * 3 19

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F 1 F + 2 * 3 20

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F 1 F + 2 * 3 21

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F F F 1 + 2 * 3 22

Intui(on: Top- Down parsing Unambiguous grammar E E * T E T T T + F T F F id F num F ( E ) E T E T T F F F 1 + 2 * 3 23

Challenges in top- down parsing Top- down parsing begins with virtually no informa(on Begins with just the start symbol, which matches every program How can we know which produc(ons to apply? 24

Recursive Descent Blind exhaus(ve search Goes over all possible produc(on rules Read & parse prefixes of input Backtracks if guesses wrong Implementa(on Uses (possibly recursive) func(ons for every produc(on rule Backtracks è rewind input 25

Recursive descent bool A(){ // A à A 1 A n pos = recordcurrentposition(); } for (i = 1; i n; i++){ if (A i ()) return true; rewindcurrent(pos); } return false; bool A i (){ // A i = X 1 X 2 X k for (j=1; j k; j++) if (X j is a terminal) if (X j == current) match(current); else return false; else if (! X j ()) return false; return true; } current token stream ( LP ( LP 23 Num + OP 7 Num ) RP * OP x Id ) RP 26

Example Grammar E à T T + E T à int int * T ( E ) Input: ( 5 ) Token stream: LP int RP 27

Problem: Leo Recursion p. 127 E E - term term int E() { return E() && match(token( - )) && term(); } What happens with this procedure? Recursive descent parsers cannot handle leo- recursive grammars 28

Leo recursion removal p. 130 N Nα β N βn N αn ε G 1 G 2 L(G 1 ) = β, βα, βαα, βααα, L(G 2 ) = same Can be done algorithmically. Problem: grammar becomes mangled beyond recogni(on For our 3 rd example: E E - term term E term TE term TE - term TE ε 29

Challenges in top- down parsing Top- down parsing begins with virtually no informa(on Begins with just the start symbol, which matches every program How can we know which produc(ons to apply? Wanted: Top- Down parsing without backtracking 30

Predic(ve parsing Given a grammar G and a word w derive w using G Apply produc(on to leomost nonterminal Pick produc(on rule based on next input token General grammar More than one op(on for choosing the next produc(on based on a token Restricted grammars (LL) Know exactly which single rule to apply based on Non terminal Next (k) tokens (lookahead) 31

Boolean expressions example E LIT (E OP E) not E LIT true false OP and or xor not ( not true or false ) 32

Boolean expressions example E LIT (E OP E) not E LIT true false OP and or xor produc(on to apply known from next token not ( not true or false ) E E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E ( E OP E ) not LIT or LIT true false 33

Problem: produc(ons with common prefix term ID ID [ expr ] Cannot tell which rule to use based on lookahead (ID) 34

Solu(on: leo factoring Rewrite the grammar to be in LL(1) term ID ID [ expr ] term ID aoer_id Aoer_ID [ expr ] ε Intui(on: just like factoring x*y + x*z into x*(y+z) 35

Leo factoring another example S if E then S else S if E then S T S if E then S S T S else S ε 36

LL(k) Parsers Predic(ve parser Can be generated automa(cally Does not use recursion Efficient In contrast, recursive descent Manual construc(on Recursive Expensive 37

LL(k) parsing via pushdown automata and predic(on table Pushdown automaton uses Predic(on stack Input token stream Transi(on table nonterminals x tokens - > produc(on alterna(ve Entry indexed by nonterminal N and token t contains the alterna(ve of N that must be predicated when current input starts with t 38

Example transi(on table (1) E LIT (2) E ( E OP E ) (3) E not E (4) LIT true (5) LIT false (6) OP and (7) OP or (8) OP xor Input tokens Which rule should be used Nonterminals ( ) not true false and or xor $ E 2 3 1 1 LIT 4 5 OP 6 7 8 Table 39

Non- Recursive Predic(ve Parser a + b $ Stack X Y Predic(ve Parsing program Output Z $ Parsing Table 40

LL(k) parsing via pushdown automata and predic(on table Two possible moves Predic(on When top of stack is nonterminal N, pop N, lookup table[n,t]. If table[n,t] is not empty, push table[n,t] on predic(on stack, otherwise syntax error Match When top of predic(on stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t T) syntax error Parsing terminates when predic(on stack is empty If input is empty at that point, success. Otherwise, syntax error 41

Running parser example aacbb$ A t aab c Input suffix Stack content Move aacbb$ A$ predict(a,a) = A t aab aacbb$ aab$ match(a,a) acbb$ Ab$ predict(a,a) = A t aab acbb$ aabb$ match(a,a) cbb$ Abb$ predict(a,c) = A t c cbb$ cbb$ match(c,c) bb$ bb$ match(b,b) b$ b$ match(b,b) $ $ match($,$) success a b c A A t aab A t c 42

FIRST sets FIRST(α) = { t α à * t β} {X à * ℇ } FIRST(α) = all terminals that α can appear as first in some deriva(on for α + ℇ if can be derived from X Example: FIRST( LIT ) = { true, false } FIRST( ( E OP E ) ) = { ( } FIRST( not E ) = { not } 43

Compu(ng FIRST sets FIRST (t) = { t } // t non terminal ℇ FIRST(X) if X à ℇ or X à A 1.. A k and ℇ FIRST(A i ) i=1 k FIRST(α) FIRST(X) if Xà A 1.. A k α and ℇ FIRST(A i ) i=1 k 44

FIRST sets computa(on example STMT if EXPR then STMT while EXPR do STMT EXPR ; EXPR TERM zero? TERM not EXPR ++ id - - id TERM id constant STMT EXPR TERM 45

1. Ini(aliza(on STMT if EXPR then STMT while EXPR do STMT EXPR ; EXPR TERM zero? TERM not EXPR ++ id - - id TERM id constant STMT if while EXPR zero? Not ++ - - TERM id constant 46

2. Iterate 1 STMT if EXPR then STMT while EXPR do STMT EXPR ; EXPR TERM zero? TERM not EXPR ++ id - - id TERM id constant STMT if while zero? Not ++ - - EXPR zero? Not ++ - - TERM id constant 47

2. Iterate 2 STMT if EXPR then STMT while EXPR do STMT EXPR ; EXPR TERM - > id zero? TERM not EXPR ++ id - - id TERM id constant STMT if while zero? Not ++ - - EXPR zero? Not ++ - - id constant TERM id constant 48

2. Iterate 3 fixed- point STMT if EXPR then STMT while EXPR do STMT EXPR ; EXPR TERM zero? TERM not EXPR ++ id - - id TERM id constant STMT if while zero? Not ++ - - EXPR zero? Not ++ - - id constant TERM id constant id constant 49

FOLLOW sets p. 189 FOLLOW(X) = { t Sà * α X t β} FOLLOW(X) = set of tokens that can immediately follow X in some senten(al form NOT related to what can be derived from X Intui(on: X à A B FIRST(B) FOLLOW(A) FOLLOW(B) FOLLOW(X) If Bà * ε then FOLLOW(A) FOLLOW(X) 50

FOLLOW sets: Constraints $ FOLLOW(S) FIRST(β) {ℇ} FOLLOW(X) For each A à α X β FOLLOW(A) FOLLOW(X) For each A à α X β and ℇ FIRST(β) 51

Example: FOLLOW sets Eà TX Tà (E) int Y Xà + E ℇ Y à * T ℇ Terminal + ( * ) int FOLLOW int, ( int, ( int, ( _, ), $ *, ), +, $ Non. Term. E T X Y FOLLOW ), $ +, ), $ $, ) _, ), $ 52

Predic(on Table A à α T[A,t] = α if t FIRST(α) T[A,t] = α if ℇ FIRST(α) and t FOLLOW(A) t can also be $ T is not well defined è the grammar is not LL(1) 53

Problem: Non LL Grammars S A a b A a ε bool S() { return A() && match(token( a )) && match(token( b )); } bool A() { return match(token( a )) true; } What happens for input ab? What happens if you flip order of alterna(ves and try aab? 54

Problem: Non LL Grammars S A a b A a ε FIRST(S) = { a } FOLLOW(S) = { $ } FIRST(A) = { a ε } FOLLOW(A) = { a } FIRST/FOLLOW conflict 55

Solu(on: subs(tu(on S A a b A a ε S a a b a b S a aoer_a aoer_a a b b Subs(tute A in S Leo factoring 56

LL(k) grammars A grammar is LL(k) if it can be derived via: Top- down deriva(on Scanning the input from leo to right (L) Producing the leomost deriva(on (L) With lookahead of k tokens (k) T is well defined A language is said to be LL(k) when it has an LL(k) grammar 57

LL(k) grammars A grammar is not LL(k) if it is Ambiguous Leo recursive Not leo factored 58

Earley Parsing Invented by Earley [PhD. 1968] Handles arbitrary CFG Can handle ambiguous grammars Complexity O(N 3 ) when N = input Uses dynamic programming Compactly encodes ambiguity 59

Dynamic programming Break a problem P into subproblems P 1 P k Solve P by combining solu(ons for P 1 P k Memo- ize (store) solu(ons to subproblems instead of re- computa(on Bellman- Ford shortest path algorithm Sol(x,y,i) = minimum of Sol(x,y,i- 1) Sol(t,y,i- 1) + weight(x,t) for edges (x,t) 60

Earley Parsing Dynamic programming implementa(on of a recursive descent parser S[N+1] Sequence of sets of Earley states N = INPUT Earley state (item) s is a senten(al form + aux info S[i] All parse tree that can be produced (by a RDP) aoer reading the first i tokens S[i+1] built using S[0] S[i] 61

Earley States s = < cons(tuent, back > cons(tuent (do\ed rule) for Aà αβ Aà αβ predicated cons(tuents Aà α β in- progress cons(tuents Aà αβ completed cons(tuents back previous Early state in deriva(on 62

Earley Parser Input = x[1 N] S[0] = <E à E, 0>; S[1] = S[N] = {} for i = 0... N do un(l S[i] does not change do foreach s S[i] if s = <Aà a, b> and a=x[i+1] then S[i+1] = S[i+1] {<Aà a, b> } // scan if s = <Aà X, b> and Xà α then S[i] = S[i] {<Xà α, i > } // predict if s = < Aà, b> and <Xà A, k> S[b] then S[i] = S[i] {<Xà A, k> } // complete 63

Example S 0 S E,0 E E + E,0 E n,0 n S 1 E n, 0 S E, 0 E E +E,0 + S 2 E E + E,0 E E + E,2 E n,2 n S 3 E n, 2 E E + E, 0 E E +E,2 S E, 0 FIGURE 1. Earley sets for the grammar E E + E n and the input n + n.itemsinboldareoneswhichcorrespondtothe input s derivation. 64

65