Context-free grammars

Similar documents
Compiler Construction: Parsing

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

Compiler Construction 2016/2017 Syntax Analysis

S Y N T A X A N A L Y S I S LR

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Principles of Programming Languages

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Compilers. Bottom-up Parsing. (original slides by Sam

Concepts Introduced in Chapter 4

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

UNIT-III BOTTOM-UP PARSING


LR Parsing Techniques

Syntax Analyzer --- Parser


SLR parsers. LR(0) items

Lexical and Syntax Analysis. Bottom-Up Parsing

Monday, September 13, Parsers

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Wednesday, August 31, Parsers

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

LALR Parsing. What Yacc and most compilers employ.

LR Parsers. Aditi Raste, CCOEW

4. Lexical and Syntax Analysis

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

4. Lexical and Syntax Analysis

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Bottom-Up Parsing. Lecture 11-12

UNIT III & IV. Bottom up parsing

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

In One Slide. Outline. LR Parsing. Table Construction

Using an LALR(1) Parser Generator

CA Compiler Construction

3. Parsing. Oscar Nierstrasz

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 3, vrijdag 22 september 2017

Syntactic Analysis. Top-Down Parsing

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

LR Parsing Techniques

Lecture 7: Deterministic Bottom-Up Parsing

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

LR Parsing LALR Parser Generators

Lecture 8: Deterministic Bottom-Up Parsing

Chapter 4. Lexical and Syntax Analysis

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Chapter 4: LR Parsing

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

VIVA QUESTIONS WITH ANSWERS

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Lecture Bottom-Up Parsing

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

CS 4120 Introduction to Compilers

Syn S t yn a t x a Ana x lysi y s si 1

Bottom-Up Parsing. Lecture 11-12

How do LL(1) Parsers Build Syntax Trees?

LR Parsing LALR Parser Generators

MODULE 14 SLR PARSER LR(0) ITEMS

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Syntactic Analysis. Chapter 4. Compiler Construction Syntactic Analysis 1

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Syntax-Directed Translation

Syntax Analysis Part I

Types of parsing. CMSC 430 Lecture 4, Page 1

Conflicts in LR Parsing and More LR Parsing Types

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

2068 (I) Attempt all questions.

Syntax Analysis. Martin Sulzmann. Martin Sulzmann Syntax Analysis 1 / 38

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

CS 406/534 Compiler Construction Parsing Part I

CSE302: Compiler Design

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018

shift-reduce parsing

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

General Overview of Compiler

Fall Compiler Principles Lecture 4: Parsing part 3. Roman Manevich Ben-Gurion University of the Negev

LR Parsing. Table Construction

A programming language requires two major definitions A simple one pass compiler

CS 314 Principles of Programming Languages

Lexical and Syntax Analysis (2)

Principles of Programming Languages

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

CS415 Compilers. LR Parsing & Error Recovery

Transcription:

Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol, productions Example: E E op E (E) E id op + / % NOTE: recall token vs. lexeme difference Derivation: starting from the start symbol, use productions to generate a string (sequence of tokens) Parse tree: pictorial representation of a derivation Compiler Construction: Parsing p. 1/31

Ambiguity Left-most derivation: at each step, replace left-most non-terminal Section 4.2, 4.3 Ambiguous grammar: G is ambiguous if a string has > 1 left-most (or right-most) derivation ALT: G is ambiguous if > 1 parse tree can be constructed for a string Examples: 1. E E + E E E E 2. stmt if expr then stmt if expr then stmt else stmt [ SOLUTION: stmt matched unmatched matched if E then matched else matched unmatched if E then stmt if E then matched else unmatched ] Compiler Construction: Parsing p. 2/31

Top-down parsing - I Section 4.4 Recursive descent parsing: corresponds to finding a leftmost derivation for an input string Equivalent to constructing parse tree in pre-order Example: Grammar: S cad A ab a Input: cad Problems: 1. backtracking involved ( buffering of tokens required) 2. left recursion will lead to infinite looping 3. left factors may cause several backtracking steps Compiler Construction: Parsing p. 3/31

Top-down parsing - I Left recursion: G istb left recursive if for some non-terminal A, A + Aα Simple case I: A Aα β A βa A αa ɛ Section 4.3 Simple case II: A Aα 1 Aα 2... Aα m β 1... β n A β 1 A... β n A A α 1 A α 2 A... α m A ɛ Compiler Construction: Parsing p. 4/31

Top-down parsing - I General case: Input: G without any cycles (A + A) or ε-productions Output: equivalent non-recursive grammar Algorithm: Let the non-terminals be A 1,... A n. for i = 1 to n do for j = 1 to i 1 do replace A i A j γ by A i δ 1 γ δ 2 γ... δ k γ where A j δ 1 δ 2... δ k are the current A j productions end for Eliminate immediate left-recursion for A i. end for Compiler Construction: Parsing p. 5/31

Top-down parsing - I Left factoring: Example: stmt if ( expr ) stmt if ( expr ) stmt else stmt Algorithm: while left factors exist do for each non-terminal A do Find longest prefix α common to 2 rules Replace A αβ 1... αβ n... by A αa... A β 1... β n end for end while Compiler Construction: Parsing p. 6/31

Top-down parsing - II Section 4.4 Predictive parsing: recursive descent parsing without backtracking Principle: Given current input symbol and non-terminal, we should be able to determine which production is to be used Example: stmt if ( expr )... while... for... Compiler Construction: Parsing p. 7/31

Top-down parsing - II Implementation: use transition diagrams (1 per non-terminal) A X 1 X 2... X n s X 1 X 2 X n... F 1. If X i is a terminal, match with next input token and advance to next state. 2. If X i is a non-terminal, go to the transition diagram for X i, and continue. On reaching the final state of that transition diagram, advance to next state of current transition diagram. Example: E E + T T T T F F F (E) id Compiler Construction: Parsing p. 8/31

Top-down parsing - II Non-recursive implementation: a Input top X Table: 2-d array s.t. TABLE Stack Algorithm: M[A, a] specifies A- production to be used if input symbol is a 0. Initially: stack contains EOF S, input pointer is at start of input 1. if X = a = EOF, done 2. if X = a EOF, pop stack and advance input pointer 3. if X is non-terminal, lookup M[X, a] X UV W pop X, push W, V, U Compiler Construction: Parsing p. 9/31

FIRST and FOLLOW FIRST(α): set of terminals that begin strings derived from α if α ɛ, then ɛ FIRST (α) FOLLOW (A): set of terminals that can appear immediately to the right of A in some sentential form FOLLOW (A) = {a S αaaβ} if A is the rightmost symbol in any sentential form, then EOF FOLLOW (A) Compiler Construction: Parsing p. 10/31

FIRST and FOLLOW FIRST: 1. if X is a terminal, then FIRST (X) = {X} 2. if X ɛ is a production, then add ɛ to FIRST (X) 3. if X Y 1 Y 2... Y k is a production: if a FIRST (Y i ) and ɛ FIRST (Y 1 ),..., FIRST (Y i 1 ), add a to FIRST (X) if ɛ FIRST (Y i ) i, add ɛ to FIRST (X) FOLLOW: 1. Add EOF to FOLLOW (S) 2. For each production of the form A αbβ (i) add FIRST (β)\{ɛ} to FOLLOW (B) (ii) if β = ɛ or ɛ FIRST (β), then add everything in FOLLOW (A) to FOLLOW (B) Compiler Construction: Parsing p. 11/31

Algorithm: Table construction 1. For each production A α (i) for each terminal a FIRST (α), add A α to M[A, a] (ii) if ɛ FIRST (α), add A α to M[A, b] for each terminal b FOLLOW (A) 2. Mark all other entries error Example: Grammar: E E + T T T T F F F (E) id Input: id + id * id Compiler Construction: Parsing p. 12/31

L L(1) grammars If the table has no multiply defined entries, grammar istb LL(1) L - left-to-right L - leftmost 1 - lookahead If G is LL(1), then G cannot be left-recursive or ambiguous Example: S i E t S S a S e S ɛ E b M[S, e] = {S ɛ, S e S} Some non-ll(1) grammars may be transformed into equivalent LL(1) grammars Compiler Construction: Parsing p. 13/31

L L(1) parsing Error recovery: 1. if X is a terminal, but X a, pop X 2. if M[X, a] is blank, skip a 3. if M[X, a] = synch, pop X, but do not advance input pointer Synch sets: use FOLLOW (A) add the FIRST set of a higher-level non-terminal to the synch set of a lower-level non-terminal Compiler Construction: Parsing p. 14/31

Bottom-up parsing Example: Grammar: E E + T T T T F F F (E) id Input: id + id * id Section 4.5 Sentential form: any string α s.t. S α Handle: for a sentential form γ, handle is a production A β and a position of β in γ, s.t. β may be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ Properties: 1. string to right of handle must consist of terminals only 2. if G is unambiguous, every right-sentential form has a unique handle Advantages: 1. No backtracking 2. More powerful than LL(1) / predictive parsing Compiler Construction: Parsing p. 15/31

Bottom-up parsing Section 4.7 Implementation scheme: 0. Use input buffer, stack, and parsing table. 1. Shift 0 input symbols onto stack until a handle β is on top of stack. 2. Reduce β to A (i.e. pop symbols of β and push A). 3. Stop when stack = EOF, S, and input pointer is at EOF. Stack: s o X 1 s 1... X m s m, where each s i represents a state (current situation in the parsing process) Table: used to guide steps 2 and 3 2-d array indexed by state, input symbol pairs consists of two parts (action + goto) Compiler Construction: Parsing p. 16/31

Bottom-up parsing Algorithm: 1. Initially, stack = s 0 (initial state) 2. Let s - state on top of stack a - current input symbol if action[s, a] = shift s push a, s on stack, advance input pointer if action[s, a] = reduce A β pop 2 β symbols let s be the new top of stack push A, goto[s, A] on stack if action[s, a] = accept, done else error Compiler Construction: Parsing p. 17/31

SLR parsing Grammar augmentation: Create new start symbol S ; add S S to productions Item (LR(0) item): production of G with a dot at some position in the RHS, representing how much of the RHS has already been seen at a given point in the parse Example: A ɛ A Closure: Let I be a set of items closure(i) I repeat until no more changes for each A α Bβ in closure(i) for each production B γ s.t. B γ closure(i) add B γ to closure(i) Example: closure(e E) Compiler Construction: Parsing p. 18/31

SLR parsing Goto construction: goto(i, X) = closure( {A αx β A α Xβ I} ) Example: Let I = {E E, E E +T } goto(i, +) = closure({e E + T }) Canonical collection construction: 1. C {closure({s S})} 2. repeat until no more changes: for each I C, for each grammar symbol X if goto(i, X) is not empty and not in C add goto(i, X) to C Compiler Construction: Parsing p. 19/31

SLR parsing Table construction: 1. Let C = {I 0,..., I n } be the canonical collection of LR(0) items for G. 2. Create a state s i corresponding to each I i. The set containing S S corresponds to the initial state. 3. If A α aβ I i and goto(i i, a) = I j, then action(s i, a) = shift s j. 4. If A α I i (A S ), then action(s i, a) = reduce A α for all a FOLLOW (A). 5. If S S I i, then action(s i, EOF) = accept. 6. If goto(i i, a) = I j, then goto(s i, a) = s j. 7. Mark all blank entries error. Compiler Construction: Parsing p. 20/31

Conflicts 1. Shift-reduce conflict: stmt if ( expr ) stmt if ( expr ) stmt else stmt 2. Reduce-reduce conflict: stmt id ( param_list ) ; expr id ( expr_list ). param id expr id Example: Grammar: S L = R S R L R L id R L Canonical collection: I 0 = {S S,...} I 2 = {S L = R, R L }... Table: action(2, =) = shift... action(2, =) = reduce... NOTE: SLR(1) grammars are unambiguous, but not vice versa. Compiler Construction: Parsing p. 21/31

Canonical LR parsers Motivation: Reduction by A α not necessarily proper even if a FOLLOW (A) explicitly indicate tokens for which reduction is acceptable LR(1) item: pair of the form A α β, a, where A αβ is a production, a is a terminal or EOF Properties: 1. A α β, a - lookahead has no effect A α, a - reduce only if input symbol is a 2. {a A α, a canonical collection} FOLLOW (A) Compiler Construction: Parsing p. 22/31

Canonical LR parsers Closure: Let I be a set of items closure(i) I repeat until no more changes for each item A α Bβ, a in closure(i) for each production B γ and each terminal b FIRST (βa) if B γ, b closure(i) add B γ, b to closure(i) Goto: goto(i, X) = closure({ A αx β, a A α Xβ, a I}) Canonical collection construction: 1. C {closure({ S S, EOF })} 2. /* Similar to SLR algorithm */ Compiler Construction: Parsing p. 23/31

Canonical LR parsers Table construction: 1. If A α aβ, b I i and goto(i i, a) = I j then action(i, a) = shift j. 2. If A α, b I i, then action(i, b) = reduce A α (A S ). If S S, EOF I i, then action(i, EOF) = accept. Compiler Construction: Parsing p. 24/31

LALR parsers Motivation: try to combine efficiency of SLR parser with power of canonical method Core: set of LR(0) items corresponding to a set of LR(1) items Method: 1. Construct canonical collection of LR(1) items, C = {I 0,..., I n }. 2. Merge all sets with the same core. Let the new collection be C = {J 0,..., J m }. 3. Construct the action table as before. 4. If J = I 1... I k, then goto(j, X) = union of all sets with the same core as goto(i 1, X). Compiler Construction: Parsing p. 25/31

SLR vs LR(1) vs LALR No. of states: SLR = LALR <= LR(1) Power: SLR < LALR < LR(1) (cf. Pascal) SLR vs. LALR: LALR items can be regarded as SLR items, with the core augmented by appropriate subsets of FOLLOW (A) explicitly specified LALR vs. LR(1): 1. If there were no shift-reduce conflicts in the LR(1) table, there will be no shift-reduce conflicts in the LALR table. 2. Step 2 may generate reduce-reduce conflicts. Example: I 1 = { A α, a, B β, b } I 2 = { A α, b, B β, a } 3. Correct inputs: LALR parser mimics LR(1) parser Incorrect inputs: incorrect reductions may occur on a lookahead a parser goes back to a state I i in which A has just been recognized. But a cannot follow A in this state error Compiler Construction: Parsing p. 26/31

Error recovery Detection: Canonical LR - errors are immediately detected (no unnecessary shift/reduce actions) SLR/LALR - no symbols are shifted onto stack, but reductions may occur before error is detected Panic mode recovery: 1. Scan down the stack until a state s with a goto on a significant non-terminal A (e.g. expr, stmt, etc.) is found. 2. Discard input until a symbol a which can follow A is found. 3. Push A, goto(s, A) and resume parsing. Expln: s α Aaβ α γ }{{} aβ location of error Compiler Construction: Parsing p. 27/31

Error recovery Phrase-level error recovery: recovery by local correction on remaining input e.g. insertion, deletion, substitution, etc. Scheme: 1. Consider each blank entry, and decide what the error is likely to be. 2. Call appropriate recovery method should consume input (to avoid infinite loop) avoid popping a significant non-terminal from stack Examples: State: E E Input: + or Action: push id, goto appropriate state Message: missing operand State: E E + T Input: ) Action: skip ) from input Message: extra ) Compiler Construction: Parsing p. 28/31

Usage: $ yacc myfile.y (generates y.tab.c) File format: declarations %% grammar rules (terminals, non-terminals, start symbol) %% auxiliary procedures (C functions) Semantic actions: $$ - attribute value associated with LHS non-terminal $i - attribute value associated with i-th grammar symbol on RHS specified action is executed on reduction by corresponding production default action: $$ = $1 Yacc Compiler Construction: Parsing p. 29/31

29-1

Yacc Lexical analyzer: yylex() must be provided should return integer code for a token should set yylval to attribute value Usage with lex: % lex scanner.l % yacc parser.y % cc y.tab.c -ly -ll Add #include "lex.yy.c" to third section of file Declared tokens can be used as return values in scanner.l Compiler Construction: Parsing p. 30/31

Yacc Implicit conflict resolution: 1. Shift-reduce: choose shift 2. Reduce-reduce: choose the production listed earlier Explicit conflict resolution: Precedence: tokens are assigned precedence according to the order in which they are declared (lowest first) Associativity: left, right, or nonassoc Precedence/assoc. of a production = precedence/assoc. of rightmost terminal or explicitly specified using %prec Given A α, a: if precedence of production is higher, reduce if precedence of production is same, and associativity of production is left, reduce else shift Compiler Construction: Parsing p. 31/31