Compiler Construction: Parsing

Similar documents
Context-free grammars

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Compiler Construction 2016/2017 Syntax Analysis

S Y N T A X A N A L Y S I S LR

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Principles of Programming Languages

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Concepts Introduced in Chapter 4

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Compilers. Bottom-up Parsing. (original slides by Sam

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay


UNIT-III BOTTOM-UP PARSING

SLR parsers. LR(0) items


Lexical and Syntax Analysis. Bottom-Up Parsing

Monday, September 13, Parsers

LR Parsing Techniques

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Syntax Analyzer --- Parser

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Wednesday, August 31, Parsers

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

LALR Parsing. What Yacc and most compilers employ.

LR Parsers. Aditi Raste, CCOEW

4. Lexical and Syntax Analysis

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

4. Lexical and Syntax Analysis

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Bottom-Up Parsing. Lecture 11-12

UNIT III & IV. Bottom up parsing

In One Slide. Outline. LR Parsing. Table Construction

Using an LALR(1) Parser Generator

3. Parsing. Oscar Nierstrasz

CA Compiler Construction

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

Lecture 7: Deterministic Bottom-Up Parsing

LR Parsing LALR Parser Generators

Lecture 8: Deterministic Bottom-Up Parsing

Chapter 4. Lexical and Syntax Analysis

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Chapter 4: LR Parsing

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

VIVA QUESTIONS WITH ANSWERS

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

CS 4120 Introduction to Compilers

Syn S t yn a t x a Ana x lysi y s si 1

LR Parsing Techniques

Bottom-Up Parsing. Lecture 11-12

How do LL(1) Parsers Build Syntax Trees?

LR Parsing LALR Parser Generators

Syntactic Analysis. Top-Down Parsing

MODULE 14 SLR PARSER LR(0) ITEMS

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Lecture Bottom-Up Parsing

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Syntax Analysis Part I

Conflicts in LR Parsing and More LR Parsing Types

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

CSE302: Compiler Design

Syntax-Directed Translation

shift-reduce parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Types of parsing. CMSC 430 Lecture 4, Page 1

General Overview of Compiler

LR Parsing. Table Construction

2068 (I) Attempt all questions.

CS 314 Principles of Programming Languages

A programming language requires two major definitions A simple one pass compiler

Lexical and Syntax Analysis (2)

CS 406/534 Compiler Construction Parsing Part I

Principles of Programming Languages

CS415 Compilers. LR Parsing & Error Recovery

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Transcription:

Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33

Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol, productions Example: E E op E (E) E id op + / % NOTE: recall token vs. lexeme difference Derivation: starting from the start symbol, use productions to generate a string (sequence of tokens) Parse tree: pictorial representation of a derivation M. Mitra (ISI) Parsing 2 / 33

Ambiguity Reference: Section 4.2, 4.3. Left-most derivation: at each step, replace left-most non-terminal Ambiguous grammar: G is ambiguous if a string has > 1 left-most (or right-most) derivation ALT: G is ambiguous if > 1 parse tree can be constructed for a string Examples: 1. E E + E E E E 2. stmt if expr then stmt if expr then stmt else stmt [ SOLUTION: stmt matched unmatched matched if E then matched else matched unmatched if E then stmt if E then matched else unmatched ] M. Mitra (ISI) Parsing 3 / 33

Top-down parsing - I Reference: Section 4.4. Recursive descent parsing: Corresponds to finding a leftmost derivation for an input string Equivalent to constructing parse tree in pre-order Example: Grammar: S cad Input: cad A ab a Problems: 1. backtracking involved ( buffering of tokens required) 2. left recursion will lead to infinite looping 3. left factors may cause several backtracking steps M. Mitra (ISI) Parsing 4 / 33

Top-down parsing - I Reference: Section 4.3. Left recursion: G istb left recursive if for some non-terminal A, A + Aα Simple case I: A Aα β A βa A αa ϵ Simple case II: A Aα 1 Aα 2... Aα m β 1... β n A β 1 A... β n A A α 1 A α 2 A... α m A ϵ M. Mitra (ISI) Parsing 5 / 33

Top-down parsing - I General case: Input: G without any cycles (A + A) or ε-productions Output: equivalent non-recursive grammar Algorithm: Let the non-terminals be A 1,... A n. for i = 1 to n do for j = 1 to i 1 do replace A i A j γ by A i δ 1 γ δ 2 γ... δ k γ where A j δ 1 δ 2... δ k are the current A j productions end for Eliminate immediate left-recursion for A i. end for M. Mitra (ISI) Parsing 6 / 33

Top-down parsing - I Left factoring: Example: stmt if ( expr ) stmt Algorithm: if ( expr ) stmt else stmt while left factors exist do for each non-terminal A do Find longest prefix α common to 2 rules Replace A αβ 1... αβ n... by A αa... A β 1... β n end for end while M. Mitra (ISI) Parsing 7 / 33

Top-down parsing - II Reference: Section 4.4. Predictive parsing: recursive descent parsing without backtracking Principle: Given current input symbol and non-terminal, we should be able to determine which production is to be used Example: stmt if ( expr )... while... for... M. Mitra (ISI) Parsing 8 / 33

Top-down parsing - II Implementation: use transition diagrams (1 per non-terminal) A X 1 X 2... X n s X 1 X 2 X... n F 1. If X i is a terminal, match with next input token and advance to next state. 2. If X i is a non-terminal, go to the transition diagram for X i, and continue. On reaching the final state of that transition diagram, advance to next state of current transition diagram. Example: E E + T T T T F F F (E) id M. Mitra (ISI) Parsing 9 / 33

Top-down parsing - II Non-recursive implementation: a Input top X Stack TABLE Table: 2-d array s.t. M[A, a] specifies A-production to be used if input symbol is a Algorithm: 0. Initially: stack contains EOF S, input pointer is at start of input 1. if X = a = EOF, done 2. if X = a EOF, pop stack and advance input pointer 3. if X is non-terminal, lookup M[X, a] X UV W pop X, push W, V, U M. Mitra (ISI) Parsing 10 / 33

FIRST and FOLLOW FIRST(α): set of terminals that begin strings derived from α if α ϵ, then ϵ FIRST (α) FOLLOW (A): set of terminals that can appear immediately to the right of A in some sentential form FOLLOW (A) = {a S αaaβ} if A is the rightmost symbol in any sentential form, then EOF FOLLOW (A) M. Mitra (ISI) Parsing 11 / 33

FIRST and FOLLOW FIRST: 1. if X is a terminal, then FIRST (X) = {X} 2. if X ϵ is a production, then add ϵ to FIRST (X) 3. if X Y 1 Y 2... Y k is a production: if a FIRST (Y i ) and ϵ FIRST (Y 1 ),..., FIRST (Y i 1 ), add a to FIRST (X) if ϵ FIRST (Y i ) i, add ϵ to FIRST (X) FOLLOW: 1. Add EOF to FOLLOW (S) 2. For each production of the form A αbβ (a) add FIRST (β)\{ϵ} to FOLLOW (B) (b) if β = ϵ or ϵ FIRST (β), then add everything in FOLLOW (A) to FOLLOW (B) M. Mitra (ISI) Parsing 12 / 33

Table construction Algorithm: 1. For each production A α (a) for each terminal a FIRST (α), add A α to M[A, a] (b) if ϵ FIRST (α), add A α to M[A, b] for each terminal b FOLLOW (A) 2. Mark all other entries error Example: Grammar: E E + T T T T F F F (E) id Input: id + id * id M. Mitra (ISI) Parsing 13 / 33

LL(1) grammars If the table has no multiply defined entries, grammar istb LL(1) L - left-to-right L - leftmost 1 - lookahead If G is LL(1), then G cannot be left-recursive or ambiguous Example: S i E t S S a S e S ϵ E b M[S, e] = {S ϵ, S e S} Some non-ll(1) grammars may be transformed into equivalent LL(1) grammars M. Mitra (ISI) Parsing 14 / 33

LL(1) parsing Error recovery: 1. if X is a terminal, but X a, pop X 2. if M[X, a] is blank, skip a 3. if M[X, a] = synch, pop X, but do not advance input pointer Synch sets: use FOLLOW (A) add the FIRST set of a higher-level non-terminal to the synch set of a lower-level non-terminal M. Mitra (ISI) Parsing 15 / 33

Bottom-up parsing Reference: Section 4.5. Example: Grammar: E E + T T Input: id + id * id T T F F F (E) id Sentential form: any string α s.t. S α Handle: for a sentential form γ, handle is a production A β and a position of β in γ, s.t. β may be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ Properties: 1. string to right of handle must consist of terminals only 2. if G is unambiguous, every right-sentential form has a unique handle Advantages: 1. No backtracking 2. More powerful than LL(1) / predictive parsing M. Mitra (ISI) Parsing 16 / 33

Bottom-up parsing Reference: Section 4.7. Implementation scheme: 0. Use input buffer, stack, and parsing table. 1. Shift 0 input symbols onto stack until a handle β is on top of stack. 2. Reduce β to A (i.e. pop symbols of β and push A). 3. Stop when stack = EOF, S, and input pointer is at EOF. Stack: s o X 1 s 1... X m s m, where each s i represents a state (current situation in the parsing process) Table: used to guide steps 2 and 3 2-d array indexed by state, input symbol pairs consists of two parts (action + goto) M. Mitra (ISI) Parsing 17 / 33

Bottom-up parsing Algorithm: 1. Initially, stack = s 0 (initial state) 2. Let s - state on top of stack a - current input symbol if action[s, a] = shift s push a, s on stack, advance input pointer if action[s, a] = reduce A β pop 2 β symbols let s be the new top of stack push A, goto[s, A] on stack if action[s, a] = accept, done else error M. Mitra (ISI) Parsing 18 / 33

SLR parsing Grammar augmentation: Create new start symbol S ; add S S to productions Item (LR(0) item): production of G with a dot at some position in the RHS, representing how much of the RHS has already been seen at a given point in the parse Example: A ϵ A Closure: Let I be a set of items closure(i) I repeat until no more changes for each A α Bβ in closure(i) for each production B γ s.t. B γ closure(i) add B γ to closure(i) Example: closure(e E) M. Mitra (ISI) Parsing 19 / 33

SLR parsing Goto construction: goto(i, X) = closure( {A αx β A α Xβ I} ) Example: Let I = {E E, E E +T } goto(i, +) = closure({e E + T }) Canonical collection construction: 1. C {closure({s S})} 2. repeat until no more changes: for each I C, for each grammar symbol X if goto(i, X) is not empty and not in C add goto(i, X) to C M. Mitra (ISI) Parsing 20 / 33

SLR parsing Table construction: 1. Let C = {I 0,..., I n } be the canonical collection of LR(0) items for G. 2. Create a state s i corresponding to each I i. The set containing S S corresponds to the initial state. 3. If A α aβ I i and goto(i i, a) = I j, then action(s i, a) = shift s j. 4. If A α I i (A S ), then action(s i, a) = reduce A α for all a FOLLOW (A). 5. If S S I i, then action(s i, EOF) = accept. 6. If goto(i i, a) = I j, then goto(s i, a) = s j. 7. Mark all blank entries error. M. Mitra (ISI) Parsing 21 / 33

Conflicts 1. Shift-reduce conflict: stmt if ( expr ) stmt if ( expr ) stmt else stmt 2. Reduce-reduce conflict: stmt id ( param list ) ; Example: expr id ( expr list ) param id expr id Grammar: S L = R S R L R L id R L Canonical collection: I 0 = {S S,...} I 2 = {S L = R, R L }... Table: action(2, =) = shift... action(2, =) = reduce... NOTE: SLR(1) grammars are unambiguous, but not vice versa.. M. Mitra (ISI) Parsing 22 / 33

Canonical LR parsers Motivation: Reduction by A α not necessarily proper even if a FOLLOW (A) explicitly indicate tokens for which reduction is acceptable LR(1) item: pair of the form A α β, a, where A αβ is a production, a is a terminal or EOF Properties: 1. A α β, a - lookahead has no effect A α, a - reduce only if input symbol is a 2. {a A α, a canonical collection} FOLLOW (A) M. Mitra (ISI) Parsing 23 / 33

Canonical LR parsers Closure: Let I be a set of items closure(i) I repeat until no more changes for each item A α Bβ, a in closure(i) for each production B γ and each terminal b FIRST (βa) if B γ, b closure(i) add B γ, b to closure(i) Goto: goto(i, X) = closure({ A αx β, a A α Xβ, a I}) Canonical collection construction: 1. C {closure({ S S, EOF })} 2. /* Similar to SLR algorithm */ M. Mitra (ISI) Parsing 24 / 33

Canonical LR parsers Table construction: 1. If A α aβ, b I i and goto(i i, a) = I j then action(i, a) = shift j. 2. If A α, b I i, then action(i, b) = reduce A α (A S ). If S S, EOF I i, then action(i, EOF) = accept. M. Mitra (ISI) Parsing 25 / 33

LALR parsers Motivation: try to combine efficiency of SLR parser with power of canonical method Core: set of LR(0) items corresponding to a set of LR(1) items Method: 1. Construct canonical collection of LR(1) items, C = {I 0,..., I n }. 2. Merge all sets with the same core. Let the new collection be C = {J 0,..., J m }. 3. Construct the action table as before. 4. If J = I 1... I k, then goto(j, X) = union of all sets with the same core as goto(i 1, X). M. Mitra (ISI) Parsing 26 / 33

SLR vs LR(1) vs LALR No. of states: SLR = LALR <= LR(1) Power: SLR < LALR < LR(1) (cf. Pascal) SLR vs. LALR: LALR items can be regarded as SLR items, with the core augmented by appropriate subsets of FOLLOW (A) explicitly specified LALR vs. LR(1): 1. If there were no shift-reduce conflicts in the LR(1) table, there will be no shift-reduce conflicts in the LALR table. 2. Step 2 may generate reduce-reduce conflicts. Example: I 1 = { A α, a, B β, b } I 2 = { A α, b, B β, a } 3. Correct inputs: LALR parser mimics LR(1) parser Incorrect inputs: incorrect reductions may occur on a lookahead a parser goes back to a state I i in which A has just been recognized. But a cannot follow A in this state error M. Mitra (ISI) Parsing 27 / 33

Error recovery Reference: Section 4.8. Detection: Canonical LR - errors are immediately detected (no unnecessary shift/reduce actions) SLR/LALR - no symbols are shifted onto stack, but reductions may occur before error is detected Panic mode recovery: 1. Scan down the stack until a state s with a goto on a significant non-terminal A (e.g. expr, stmt, etc.) is found. 2. Discard input until a symbol a which can follow A is found. 3. Push A, goto(s, A) and resume parsing. Expln: s α Aaβ α γ }{{} aβ location of error M. Mitra (ISI) Parsing 28 / 33

Error recovery Phrase-level error recovery: recovery by local correction on remaining input e.g. insertion, deletion, substitution, etc. Scheme: 1. Consider each blank entry, and decide what the error is likely to be. 2. Call appropriate recovery method should consume input (to avoid infinite loop) avoid popping a significant non-terminal from stack Examples: State: E E Input: + or Action: push id, goto appropriate state Message: missing operand State: E E + T Input: ) Action: skip ) from input Message: extra ) M. Mitra (ISI) Parsing 29 / 33

Yacc Reference: Section 4.9. Usage: $ yacc myfile.y (generates y.tab.c) File format: declarations %% grammar rules (terminals, non-terminals, start symbol) %% auxiliary procedures (C functions) Semantic actions: $$ - attribute value associated with LHS non-terminal $i - attribute value associated with i-th grammar symbol on RHS specified action is executed on reduction by corresponding production default action: $$ = $1 M. Mitra (ISI) Parsing 30 / 33

M. Mitra (ISI) Parsing 31 / 33

Yacc Lexical analyzer: yylex() must be provided should return integer code for a token should set yylval to attribute value Usage with lex: % lex scanner.l % yacc parser.y % cc y.tab.c -ly -ll Add #include "lex.yy.c" to third section of file Declared tokens can be used as return values in scanner.l M. Mitra (ISI) Parsing 32 / 33

Yacc Implicit conflict resolution: 1. Shift-reduce: choose shift 2. Reduce-reduce: choose the production listed earlier Explicit conflict resolution: Precedence: tokens are assigned precedence according to the order in which they are declared (lowest first) Associativity: left, right, or nonassoc Precedence/assoc. of a production = precedence/assoc. of rightmost terminal or explicitly specified using %prec Given A α, a: if precedence of production is higher, reduce if precedence of production is same, and associativity of production is left, reduce else shift M. Mitra (ISI) Parsing 33 / 33