Compilation Lecture 3: Syntax Analysis: Top-Down parsing. Noam Rinetzky


Compilation 0368-3133, Lecture 3: Syntax Analysis: Top-Down Parsing. Noam Rinetzky

Recursive descent parsing
Define a function for every nonterminal.
Every function works as follows:
  Find the applicable production rule.
  A terminal function checks for a match with the next input token.
  A nonterminal function calls (recursively) other functions.
If there are several applicable productions for a nonterminal, use lookahead to choose among them.
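As a rough illustration of this scheme, the following Python sketch writes one method per nonterminal for the Boolean-expression grammar E → LIT | ( E OP E ) | not E, LIT → true | false, OP → and | or | xor. The class name `Parser`, the helper `accepts`, and the representation of tokens as plain strings are assumptions made for the sketch, not part of the lecture.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        # "$" marks the end of the input.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Terminal check: the next input token must equal the expected one.
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def E(self):
        # One token of lookahead selects the production for E.
        if self.current in ("true", "false"):   # E -> LIT
            self.LIT()
        elif self.current == "(":               # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":             # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected token {self.current!r}")

    def LIT(self):
        if self.current in ("true", "false"):
            self.match(self.current)
        else:
            raise ParseError(f"expected a literal, found {self.current!r}")

    def OP(self):
        if self.current in ("and", "or", "xor"):
            self.match(self.current)
        else:
            raise ParseError(f"expected an operator, found {self.current!r}")


def accepts(tokens):
    """Return True if the token list is a complete E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts("( true and not false )".split()))
```

Because every alternative of E starts with a different token, a single token of lookahead is enough here and no backtracking is needed.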

FIRST sets
FIRST(X) = { t | X →* t β } ∪ { ε | X →* ε }
FIRST(X) contains all terminals that can appear first in some derivation from X, plus ε if ε can be derived from X.
Example:
  FIRST( LIT ) = { true, false }
  FIRST( ( E OP E ) ) = { ( }
  FIRST( not E ) = { not }
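The definition can be turned into a fixed-point computation. The sketch below is one possible way to do it in Python; the grammar encoding (a dict from nonterminal to a list of alternatives, each a list of symbols, with the empty list standing for ε) and the function names are assumptions of this sketch.

```python
EPS = "ε"

# Boolean-expression grammar used in the FIRST examples.
GRAMMAR = {
    "E": [["LIT"], ["(", "E", "OP", "E", ")"], ["not", "E"]],
    "LIT": [["true"], ["false"]],
    "OP": [["and"], ["or"], ["xor"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        if sym not in first:          # a terminal starts the sequence
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:     # sym cannot vanish, so stop here
            return result
    result.add(EPS)                   # every symbol (or none) can derive ε
    return result


def first_sets(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
print(sorted(FIRST["LIT"]))
```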

FOLLOW sets (p. 189)
What do we do with nullable (ε) productions?
  A → B C D
  B → ε
  C → ε
Use what comes afterwards to predict the right production.
For every production rule A → α:
  FOLLOW(A) = the set of tokens that can immediately follow A.
We can predict the alternative A_k for a nonterminal N when the lookahead token is in FIRST(A_k) or, if A_k is nullable, when it is in FOLLOW(N).

FOLLOW sets: Constraints
  $ ∈ FOLLOW(S)
  FIRST(β) − {ε} ⊆ FOLLOW(X), for each A → α X β
  FOLLOW(A) ⊆ FOLLOW(X), for each A → α X β with ε ∈ FIRST(β)

Example: FOLLOW sets
  E → T X
  T → ( E ) | int Y
  X → + E | ε
  Y → * T | ε

  Terminal     FOLLOW
  +            int, (
  (            int, (
  *            int, (
  )            +, ), $
  int          *, ), +, $

  Nonterminal  FOLLOW
  E            ), $
  T            +, ), $
  X            $, )
  Y            +, ), $
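The FOLLOW constraints can be solved by iterating until nothing changes. The following Python sketch does this for the example grammar; the grammar encoding (empty list for ε) and all function names are assumptions of the sketch. It computes FOLLOW for nonterminals only.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], []],
    "Y": [["*", "T"], []],
}


def first_of(symbols, first):
    """FIRST of a symbol sequence (anything not in `first` is a terminal)."""
    out = set()
    for s in symbols:
        if s not in first:
            return out | {s}
        out |= first[s] - {EPS}
        if EPS not in first[s]:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {n: set() for n in grammar}
    follow[start].add(END)                       # $ ∈ FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, x in enumerate(alt):
                    if x not in grammar:         # only nonterminals get FOLLOW here
                        continue
                    first_beta = first_of(alt[i + 1:], first)
                    new = first_beta - {EPS}     # FIRST(β) − {ε} ⊆ FOLLOW(X)
                    if EPS in first_beta:        # β nullable: FOLLOW(A) ⊆ FOLLOW(X)
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
for nonterminal in GRAMMAR:
    print(nonterminal, sorted(FOLLOW[nonterminal]))
```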

Prediction Table
For each production A → α:
  T[A,t] = α if t ∈ FIRST(α)
  T[A,t] = α if ε ∈ FIRST(α) and t ∈ FOLLOW(A) (t can also be $)
If T is not well defined (some entry would receive more than one alternative), the grammar is not LL(1).
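The two rules translate almost directly into code. In the sketch below the FIRST set of each right-hand side and the FOLLOW sets are supplied by hand, which keeps the example short; the function name `build_table` and the tuple encoding of productions are assumptions of this sketch.

```python
EPS = "ε"


def build_table(productions, follow):
    """Fill T[A, t] from (A, alpha, FIRST(alpha)) triples.

    Returns the table and the list of (A, t) cells that would need more
    than one alternative; a non-empty list means the grammar is not LL(1).
    """
    table, conflicts = {}, []
    for a, alpha, first_alpha in productions:
        lookaheads = set(first_alpha) - {EPS}
        if EPS in first_alpha:
            lookaheads |= follow[a]
        for t in sorted(lookaheads):
            if (a, t) in table and table[(a, t)] != alpha:
                conflicts.append((a, t))
            else:
                table[(a, t)] = alpha
    return table, conflicts


# A -> a A b | c : every cell gets at most one alternative.
ok_table, ok_conflicts = build_table(
    [("A", ("a", "A", "b"), {"a"}), ("A", ("c",), {"c"})],
    {"A": {"b", "$"}},
)

# S -> A a b, A -> a | ε : cell (A, a) is claimed by both alternatives of A.
bad_table, bad_conflicts = build_table(
    [("S", ("A", "a", "b"), {"a"}), ("A", ("a",), {"a"}), ("A", (), {EPS})],
    {"S": {"$"}, "A": {"a"}},
)

print(ok_conflicts, bad_conflicts)
```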

LL(k) grammars
A grammar is in the class LL(k) when its sentences can be derived via:
  Top-down derivation
  Scanning the input from left to right (L)
  Producing the leftmost derivation (L)
  With a lookahead of k tokens (k)
A language is said to be LL(k) when it has an LL(k) grammar.

LL(1) grammars
A grammar is in the class LL(1) iff for every two productions A → α and A → β we have:
  FIRST(α) ∩ FIRST(β) = {} (including ε)
  If ε ∈ FIRST(α) then FIRST(β) ∩ FOLLOW(A) = {}
  If ε ∈ FIRST(β) then FIRST(α) ∩ FOLLOW(A) = {}

Problem: Non-LL Grammars

Problem 1: productions with common prefix
  term → ID | indexed_elem
  indexed_elem → ID [ expr ]
The function for indexed_elem will never be tried.
What happens for input of the form ID[expr]?

Problem 2: null productions
  S → A a b
  A → a | ε

  S() { A(); match(token('a')); match(token('b')); }
  A() { match(token('a')) or skip }

What happens for input ab?
What happens if you flip the order of alternatives and try aab?

Problem 3: left recursion (p. 127)
  E → E - term | term

  E() { E(); match(token('-')); term()  or  term() }

What happens with this procedure?
Recursive descent parsers cannot handle left-recursive grammars.

What to do?

Problem: Non-LL Grammars
  S → A a b
  A → a | ε

  bool S() { return A() && match(token('a')) && match(token('b')); }
  bool A() { return match(token('a')) || true; }

What happens for input ab?
What happens if you flip the order of alternatives and try aab?
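To see both failures concretely, here is a small Python sketch of the same two functions without backtracking. The function name `parse` and the flag `try_a_first`, which selects the order in which the alternatives of A are tried, are assumptions of this sketch.

```python
def parse(tokens, try_a_first):
    """Recognize S -> A a b, A -> a | ε, committing to the first alternative of A that succeeds."""
    pos = 0

    def match(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
            return True
        return False

    def A():
        if try_a_first:
            return match("a") or True   # A -> a, falling back to A -> ε
        return True                     # A -> ε always succeeds, so A -> a is never tried

    def S():
        return A() and match("a") and match("b") and pos == len(tokens)

    return S()


# Both strings belong to the language, yet each ordering rejects one of them.
print(parse("ab", try_a_first=True), parse("aab", try_a_first=True))
print(parse("ab", try_a_first=False), parse("aab", try_a_first=False))
```

With A → a tried first, the parser consumes the only `a` of `ab` inside A and then fails to match the `a` required by S. With A → ε tried first, `aab` fails because the second `a` is met where `b` is expected.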

Problem: Non-LL Grammars
  S → A a b
  A → a | ε
  FIRST(S) = { a }       FOLLOW(S) = { $ }
  FIRST(A) = { a, ε }    FOLLOW(A) = { a }
FIRST/FOLLOW conflict

Back to problem 1
  term → ID | indexed_elem
  indexed_elem → ID [ expr ]
  FIRST(term) = { ID }
  FIRST(indexed_elem) = { ID }
FIRST/FIRST conflict

Solution: left factoring
Rewrite the grammar to be in LL(1):
  term → ID | indexed_elem
  indexed_elem → ID [ expr ]
becomes
  term → ID after_id
  after_id → [ expr ] | ε
Intuition: just like factoring x*y + x*z into x*(y+z).
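One round of left factoring can be sketched in Python as follows. The function names and the naming scheme for the fresh nonterminal are assumptions of this sketch; the input is the term rule after indexed_elem has been substituted into it.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def left_factor(nonterminal, alternatives):
    """Factor out the common prefix of alternatives that start with the same symbol."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None            # None groups the ε alternative
        groups.setdefault(key, []).append(alt)

    rules = {nonterminal: []}
    for key, group in groups.items():
        if key is None or len(group) == 1:       # nothing to factor
            rules[nonterminal].extend(group)
            continue
        prefix = common_prefix(group)
        fresh = f"{nonterminal}_after_{key}"     # hypothetical name for the new nonterminal
        rules[nonterminal].append(prefix + [fresh])
        rules[fresh] = [alt[len(prefix):] for alt in group]
    return rules


# term -> ID | ID [ expr ]
factored = left_factor("term", [["ID"], ["ID", "[", "expr", "]"]])
for head, alts in factored.items():
    print(head, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```

A single round is not always enough: the remainders placed under the fresh nonterminal may themselves share a prefix, in which case the function would have to be applied again to that rule.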

Left factoring: another example
  S → if E then S else S | if E then S | T
becomes
  S → if E then S S' | T
  S' → else S | ε

Back to problem 2
  S → A a b
  A → a | ε
  FIRST(S) = { a }       FOLLOW(S) = { $ }
  FIRST(A) = { a, ε }    FOLLOW(A) = { a }
FIRST/FOLLOW conflict

Solution: substitution
  S → A a b
  A → a | ε
Substitute A in S:
  S → a a b | a b
Left factoring:
  S → a after_a
  after_a → a b | b

Back to problem 3
  E → E - term | term
Left recursion cannot be handled with a bounded lookahead.
What can we do?

Left recursion removal (p. 130)
  G1:  N → N α | β
  G2:  N → β N'
       N' → α N' | ε
  L(G1) = β, βα, βαα, βααα, …
  L(G2) = same
Can be done algorithmically.
Problem: the grammar becomes mangled beyond recognition.
For our 3rd example:
  E → E - term | term
becomes
  E → term TE
  TE → - term TE | ε
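The transformation from G1 to G2 is mechanical for immediate left recursion. A minimal Python sketch, assuming alternatives are lists of symbols, the empty list stands for ε, and the caller supplies the name of the new nonterminal (all of which are choices of this sketch):

```python
def remove_immediate_left_recursion(n, alternatives, tail_name=None):
    """Rewrite N -> N α | β as N -> β N', N' -> α N' | ε."""
    tail = tail_name or n + "'"
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == n]
    betas = [alt for alt in alternatives if not alt or alt[0] != n]
    if not alphas:                      # no left recursion: leave the rule alone
        return {n: alternatives}
    return {
        n: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],
    }


# E -> E - term | term, with the new nonterminal called TE as in the example.
rewritten = remove_immediate_left_recursion("E", [["E", "-", "term"], ["term"]], "TE")
for head, alts in rewritten.items():
    print(head, "->", " | ".join(" ".join(a) or "ε" for a in alts))
```

This handles only immediate left recursion (N appearing first in its own alternative); indirect left recursion through other nonterminals needs additional substitution steps that are not shown here.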

LL(k) Parsers
Recursive descent:
  Manual construction
  Uses recursion
Wanted: a parser that
  Can be generated automatically
  Does not use recursion

Pushdown Automata (PDA)

Intuition: PDA
An ε-NFA with the additional power to manipulate one stack.
[Figure: a control unit (ε-NFA) attached to a stack holding, from the top, X, Y, IF, $.]

Intuition: PDA
Think of an ε-NFA with the additional power that it can manipulate a stack.
PDA moves are determined by:
  The current state (of its ε-NFA)
  The current input symbol (or ε)
  The current symbol on top of its stack

Intuition: PDA
Current input: if (oops) then stat := blah else abort
[Figure: the control unit (ε-NFA) reading the input, attached to a stack holding, from the top, X, Y, IF, $.]

PDA Formalism
PDA = (Q, Σ, Γ, δ, q0, $, F):
  Q: finite set of states
  Σ: input symbols alphabet (tokens)
  Γ: stack symbols alphabet (nonterminals)
  δ: transition function
  q0: start state
  $: start symbol
  F: set of final states

LL(k) parsing via pushdown automata
The pushdown automaton uses:
  A prediction stack
  An input stream
  A transition table: nonterminals × tokens → production alternative
The entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when the current input starts with t.

LL(k) parsing via pushdown automata
Two possible moves:
  Prediction: when the top of the prediction stack is a nonterminal N, pop N and look up table[N,t]. If table[N,t] is not empty, push table[N,t] on the prediction stack; otherwise, syntax error.
  Match: when the top of the prediction stack is a terminal T, it must be equal to the next input token t. If t == T, pop T and consume t. If t ≠ T, syntax error.
Parsing terminates when the prediction stack is empty. If the input is empty at that point, success. Otherwise, syntax error.

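Example transition table
  (1) E → LIT
  (2) E → ( E OP E )
  (3) E → not E
  (4) LIT → true
  (5) LIT → false
  (6) OP → and
  (7) OP → or
  (8) OP → xor

Rows are nonterminals, columns are input tokens, and each entry is the number of the rule to use:

  Nonterminal | (  | )  | not | true | false | and | or | xor | $
  E           | 2  |    | 3   | 1    | 1     |     |    |     |
  LIT         |    |    |     | 4    | 5     |     |    |     |
  OP          |    |    |     |      |       | 6   | 7  | 8   |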

Model of non-recursive predictive parser
[Figure: the predictive parsing program reads the input a + b $, manipulates a stack holding X, Y, Z, $ (top first), consults the parsing table, and produces the output.]

Running parser example
Input: aacbb$    Grammar: A → aAb | c

  Input suffix   Stack content   Move
  aacbb$         A$              predict(A,a) = A → aAb
  aacbb$         aAb$            match(a,a)
  acbb$          Ab$             predict(A,a) = A → aAb
  acbb$          aAbb$           match(a,a)
  cbb$           Abb$            predict(A,c) = A → c
  cbb$           cbb$            match(c,c)
  bb$            bb$             match(b,b)
  b$             b$              match(b,b)
  $              $               match($,$) success

Prediction table:
       a          b     c
  A    A → aAb          A → c
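The predict and match moves can be sketched as a short loop over an explicit stack. The Python below is an illustrative sketch, not the lecture's implementation: the function name `ll1_parse`, the dict encoding of the prediction table, and the textual move log are all assumptions.

```python
def ll1_parse(table, start, tokens):
    """Run the predict/match loop; return (accepted, list of moves)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                  # the top of the stack is the end of the list
    nonterminals = {a for a, _ in table}
    pos, moves = 0, []
    while stack:
        top, t = stack.pop(), tokens[pos]
        if top in nonterminals:           # prediction move
            alt = table.get((top, t))
            if alt is None:
                moves.append(f"predict({top},{t}) = ERROR")
                return False, moves
            moves.append(f"predict({top},{t}) = {top} -> {' '.join(alt) or 'ε'}")
            stack.extend(reversed(alt))   # push the right-hand side in reverse order
        elif top == t:                    # match move
            moves.append(f"match({top},{t})")
            pos += 1
        else:
            moves.append(f"match({top},{t}) = ERROR")
            return False, moves
    return pos == len(tokens), moves


# Prediction table for A -> a A b | c.
TABLE = {("A", "a"): ["a", "A", "b"], ("A", "c"): ["c"]}

accepted, trace = ll1_parse(TABLE, "A", "aacbb")
for move in trace:
    print(move)
```

Pushing the alternative in reverse order leaves its first symbol on top of the stack, so symbols are processed left to right.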

Errors

Handling Syntax Errors
  Report and locate the error
  Diagnose the error
  Correct the error
  Recover from the error in order to discover more errors, without reporting too many "strange" errors

Error Diagnosis
  Line number (may be far from the actual error)
  The current token
  The expected tokens
  Parser configuration

Error Recovery
Becomes less important in interactive environments.
Example heuristics:
  Search for a semicolon and ignore the statement
  Try to replace tokens for common errors
  Refrain from reporting 3 subsequent errors
Globally optimal solutions:
  For every input w, find a valid program w' with a minimal distance from w

Illegal input example
Input: abcbb$    Grammar: A → aAb | c

  Input suffix   Stack content   Move
  abcbb$         A$              predict(A,a) = A → aAb
  abcbb$         aAb$            match(a,a)
  bcbb$          Ab$             predict(A,b) = ERROR

Prediction table:
       a          b     c
  A    A → aAb          A → c

Error handling in LL parsers
Input: c$    Grammar: S → a c | b S

  Input suffix   Stack content   Move
  c$             S$              predict(S,c) = ERROR

Now what? Predict b S anyway and report "missing token b inserted in line XXX".

Prediction table:
       a          b          c
  S    S → a c    S → b S

Error handling in LL parsers
Input: c$    Grammar: S → a c | b S

  Input suffix   Stack content   Move
  bc$            S$              predict(S,b) = S → bS
  bc$            bS$             match(b,b)
  c$             S$              Looks familiar?

Result: infinite loop.

Prediction table:
       a          b          c
  S    S → a c    S → b S

Error handling and recovery
  x = a * (p+q * ( -b * (r-s);
Where should we report the error?
The valid prefix property.

The Valid Prefix Property
For every prefix of tokens t_1, t_2, …, t_i that the parser identifies as legal, there exist tokens t_i+1, t_i+2, …, t_n such that t_1, t_2, …, t_n is a syntactically valid program.
If every token is considered a single character: for every prefix word u that the parser identifies as legal, there exists a word w such that u.w is a valid program.


Recovery is tricky
Heuristics for dropping tokens, skipping to a semicolon, etc.

Building the Parse Tree

Adding semantic actions
We can add an action to perform on each production rule.
We can build the parse tree:
  Every function returns an object of type Node
  Every Node maintains a list of children
  Function calls can add new children

Building the parse tree

  Node E() {
      result = new Node();
      result.name = "E";
      if (current ∈ {TRUE, FALSE}) {          // E → LIT
          result.addchild(LIT());
      } else if (current == LPAREN) {         // E → ( E OP E )
          result.addchild(match(LPAREN));
          result.addchild(E());
          result.addchild(OP());
          result.addchild(E());
          result.addchild(match(RPAREN));
      } else if (current == NOT) {            // E → not E
          result.addchild(match(NOT));
          result.addchild(E());
      } else {
          error;
      }
      return result;
  }

Parser for Fully Parenthesized Expressions

  static int Parse_Expression(Expression **expr_p) {
      Expression *expr = *expr_p = new_expression();

      /* try to parse a digit */
      if (Token.class == DIGIT) {
          expr->type = 'D';
          expr->value = Token.repr - '0';   /* convert the digit character to its value */
          get_next_token();
          return 1;
      }

      /* try to parse a parenthesized expression */
      if (Token.class == '(') {
          expr->type = 'P';
          get_next_token();
          if (!Parse_Expression(&expr->left)) Error("missing expression");
          if (!Parse_Operator(&expr->oper)) Error("missing operator");
          /* right operand: the `right` field is assumed here by analogy with `left` */
          if (!Parse_Expression(&expr->right)) Error("missing expression");
          if (Token.class != ')') Error("missing )");
          get_next_token();
          return 1;
      }

      /* neither a digit nor a parenthesized expression */
      return 0;
  }