LR Parsing Techniques

Outline
- Introduction
- Bottom-Up Parsing
- LR Parsing as Handle Pruning
- Shift-Reduce Parser
- LR(k) Parsing Model
- Parsing Table Construction: SLR, LR, LALR

Bottom-Up Parsing
A bottom-up parser attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Bottom-Up Parsing
Construct a parse tree from the leaves to the root, using a rightmost derivation in reverse.
Grammar: S → aABe, A → Abc | b, B → d
Input: abbcde
abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
[Figure: the parse tree for abbcde, built from the leaves up to the root S]

Example of Bottom-Up Parsing
Let G = S → aABe, A → Abc | b, B → d.
The sentence abbcde can be reduced to S by the following steps:
abbcde, aAbcde, aAde, aABe, S
These reductions trace out the following rightmost derivation in reverse:
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

Rightmost Derivation in Reverse
[Figure: parse tree for id1 + id2 * id3, with the E nodes numbered in the order they are created]

LR Parsing
- The L stands for scanning the input from left to right.
- The R stands for constructing a rightmost derivation in reverse.

LR Parsing
- LR parsing ≠ leftmost reduction: reducing the first reducible substring does not always lead to a successful parse.
- Handles: the reducible substrings whose reduction successfully leads back to S.
- Top-down parsing: expansion and matching.
- Bottom-up parsing: locating the next handle (how?), i.e., handle pruning.

Handles
Not every (leftmost) reduction by a production A → β leads back to the start symbol S. Only handles do: S ⇒*rm αAw ⇒rm αβw.
A handle of a right-sentential form γ consists of a production A → β and a position of γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Right-sentential forms: abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
Handles, in order: A → b, A → Abc, B → d, S → aABe

Handles
If S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw. The string w contains only terminal symbols. We say "a handle" rather than "the handle" because the grammar may be ambiguous. If the grammar is unambiguous, however, every right-sentential form has exactly one handle.

Handles
Informally, a handle of a string is a substring that matches the RHS of a production and whose reduction to the nonterminal on the LHS of that production represents one step along the reverse of a rightmost derivation. E.g., in aAbcde from the previous example, the b that follows aA matches A → b but is not a handle: reducing it gives aAAcde, from which S can no longer be reached.
Formally, a handle of a right-sentential form γ (a canonical sentential form) is a production A → β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.

Example of Handles
Let G = S → aABe, A → Abc | b, B → d. The sentence abbcde can be reduced to S by the following steps:
- abbcde is a right-sentential form whose handle is A → b at position 2.
- aAbcde is a right-sentential form whose handle is A → Abc at position 2.
- aAde is a right-sentential form whose handle is B → d at position 3.
- aABe is a right-sentential form whose handle is S → aABe at position 1.
- S
These reductions trace out the following rightmost derivation in reverse:
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
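The handle-pruning steps can be replayed mechanically. The following is a minimal Python sketch, assuming every grammar symbol is a single character so that a sentential form is just a string; `reduce_at`, `prune` and `HANDLES` are hypothetical names used only for illustration. Each step checks that the handle's right side really occurs at the stated (1-based) position before replacing it with the left side.

```python
def reduce_at(form: str, pos: int, lhs: str, rhs: str) -> str:
    """Replace rhs, found at 1-based position pos of form, by lhs."""
    i = pos - 1
    if form[i:i + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at position {pos} of {form!r}")
    return form[:i] + lhs + form[i + len(rhs):]


# (position, A, beta) for each handle A -> beta, copied from the example.
HANDLES = [(2, "A", "b"), (2, "A", "Abc"), (3, "B", "d"), (1, "S", "aABe")]


def prune(sentence: str, handles) -> list:
    """Return the right-sentential forms visited while pruning the given handles."""
    forms = [sentence]
    for pos, lhs, rhs in handles:
        forms.append(reduce_at(forms[-1], pos, lhs, rhs))
    return forms


print(prune("abbcde", HANDLES))  # ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
```

Reducing the b at position 3 of aAbcde instead gives aAAcde, which is not a right-sentential form of G; choosing the right position is exactly what identifying the handle means.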

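LR Parsing as Handle Pruning
S ⇒*rm αAw ⇒rm αβw
[Figure: parse tree with root S; the handle β is the frontier of the subtree rooted at A, with α to its left and w to its right]
- The string w to the right of the handle contains only terminals.
- A is the leftmost complete interior node with all its children in the tree.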

An Example
[Figure: the parse tree for abbcde built bottom-up, one handle reduction at a time, up to the root S]

LR Parsing as Handle Pruning
A rightmost derivation in reverse can be obtained by handle pruning. Let G = E → E+E | E*E | (E) | id (ambiguous!).

Right-sentential form   Handle   Reducing production
id1 + id2 * id3         id1      E → id
E + id2 * id3           id2      E → id
E + E * id3             id3      E → id
E + E * E               E * E    E → E * E
E + E                   E + E    E → E + E
E

LR Parsing as Handle Pruning (alternative reduction sequence)
A rightmost derivation in reverse can be obtained by handle pruning. Let G = E → E+E | E*E | (E) | id (ambiguous!).

Right-sentential form   Handle   Reducing production
id1 + id2 * id3         id1      E → id
E + id2 * id3           id2      E → id
E + E * id3             E + E    E → E + E
E * id3                 id3      E → id
E * E                   E * E    E → E * E
E
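The two tables can be checked side by side. The sketch below, in Python, represents a sentential form as a list of tokens (the subscripts on id are dropped) and applies each (position, right side) step from the tables, always reducing to E. Both sequences end in E, which is the symptom of the ambiguity. `reduce_at`, `run`, `FIRST_ORDER` and `SECOND_ORDER` are illustrative names, not from the slides.

```python
def reduce_at(form, pos, lhs, rhs):
    """Replace the tokens rhs at 1-based position pos of form by the nonterminal lhs."""
    i = pos - 1
    if form[i:i + len(rhs)] != rhs:
        raise ValueError(f"{rhs} not found at position {pos} of {form}")
    return form[:i] + [lhs] + form[i + len(rhs):]


ID = ["id"]
# (position, rhs) pairs; every production in this grammar has left side E.
FIRST_ORDER = [(1, ID), (3, ID), (5, ID), (3, ["E", "*", "E"]), (1, ["E", "+", "E"])]
SECOND_ORDER = [(1, ID), (3, ID), (1, ["E", "+", "E"]), (3, ID), (1, ["E", "*", "E"])]


def run(steps):
    """Return the sentential forms produced by applying the steps to id + id * id."""
    forms = [["id", "+", "id", "*", "id"]]
    for pos, rhs in steps:
        forms.append(reduce_at(forms[-1], pos, "E", rhs))
    return [" ".join(f) for f in forms]


for steps in (FIRST_ORDER, SECOND_ORDER):
    print(run(steps))
```

The first sequence groups id2 * id3 first (multiplication binds tighter); the second groups id1 + id2 first. An unambiguous grammar would allow only one of them.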

Bottom-Up Shift/Reduce Parsing
A bottom-up parser can be implemented as a shift-reduce parser. Input tokens are shifted onto the stack until the top of the stack contains a handle of the sentential form. The handle is then reduced by replacing it on the parse stack with the nonterminal that is its parent in the parse tree. A handle is a sequence of symbols that matches the RHS of some production and may be correctly replaced with the LHS (i.e., its reduction leads to the start symbol). It can be proved that a handle always appears on the top of the stack, never within it.

Shift-Reduce Parsing
[Figure: a shift-reduce parser: the parsing program reads the input, keeps the handle on top of a stack, consults a parsing table, and produces output]

Stack Implementation of Shift-Reduce Parsers
A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string to be parsed (a push-down machine with a tape). The parser shifts zero or more input symbols onto the stack until a handle β is on top of the stack. It then reduces β to the left side of the appropriate production. This procedure repeats until the stack contains the start symbol and the input is empty.

Stack Operations
- Shift: shift the next input symbol onto the top of the stack.
- Reduce: replace the handle at the top of the stack with the corresponding nonterminal.
- Accept: announce successful completion of the parsing.
- Error: call an error recovery routine.

Shift-Reduce Parsing
Handle pruning with a stack uses 4 actions:
- Shift the next input symbol onto the stack.
- Reduce (assuming a handle β is at the top of the stack): pop the symbols of β from the stack and push the left part A of the production A → β.
- Accept: halt and accept.
- Error: halt and declare an error.

An Example (S = shift, R = reduce, A = accept)
Action  Stack    Input
S       $        abbcde$
S       $a       bbcde$
R       $ab      bcde$
S       $aA      bcde$
S       $aAb     cde$
R       $aAbc    de$
S       $aA      de$
R       $aAd     e$
S       $aAB     e$
R       $aABe    $
A       $S       $
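The trace can be reproduced with a few lines of Python. In this sketch the reduce decisions are supplied by hand in `ACTIONS`, copied from the Action column and annotated with the production each R uses; a real LR parser would read them from a parsing table. Symbols are assumed to be single characters, and `replay` is a hypothetical helper name.

```python
def replay(sentence, actions):
    """Apply shift/reduce/accept actions; return the (stack, input) configurations seen."""
    stack, rest, trace = [], list(sentence), []
    for act in actions:
        trace.append(("$" + "".join(stack), "".join(rest) + "$"))
        if act == "shift":
            stack.append(rest.pop(0))           # move the next input symbol onto the stack
        elif act == "accept":
            if stack != ["S"] or rest:
                raise ValueError("accept requires stack $S and empty input")
            return trace
        else:
            lhs, rhs = act                      # reduce by lhs -> rhs
            if stack[-len(rhs):] != list(rhs):
                raise ValueError(f"handle {rhs!r} is not on top of the stack")
            del stack[-len(rhs):]               # pop the handle ...
            stack.append(lhs)                   # ... and push the left side
    raise ValueError("action list ended without accept")


ACTIONS = ["shift", "shift", ("A", "b"), "shift", "shift", ("A", "Abc"),
           "shift", ("B", "d"), "shift", ("S", "aABe"), "accept"]

for stack, remaining in replay("abbcde", ACTIONS):
    print(f"{stack:<8}{remaining}")
```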

Configurations of a shift-reduce parser on input id1+id2*id3
Step  Stack         Input            Action
1     $             id1+id2*id3$     shift
2     $id1          +id2*id3$        reduce by E → id
3     $E            +id2*id3$        shift
4     $E+           id2*id3$         shift
5     $E+id2        *id3$            reduce by E → id
6     $E+E          *id3$            shift
7     $E+E*         id3$             shift
8     $E+E*id3      $                reduce by E → id
9     $E+E*E        $                reduce by E → E*E
10    $E+E          $                reduce by E → E+E
11    $E            $                accept

Sources of Conflicts
When trying to reduce a substring of the current sentential form:
- Not all reducible substrings are handles.
- Ambiguity: more than one substring may qualify as a handle.
Sources of conflicts in a non-LR grammar:
- Shift-reduce conflicts
- Reduce-reduce conflicts

Shift/Reduce Conflict
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Stack: $ ... if expr then stmt    Input: else ... $
- Shift else, aiming for: if expr then stmt else stmt
- Or reduce by: stmt → if expr then stmt

Reduce/Reduce Conflict
(1) stmt → id ( para_list )        // func(a,b)
(2) stmt → expr := expr
(3) para_list → para_list , para
(4) para_list → para
(5) para → id
(6) expr → id ( expr_list )        // array(a,b)
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr
- Problem: with stack $ ... id ( id and input , id ) ... $, should the parser reduce by (5) or by (7)? The correct reduction depends on stack[sp-2], i.e., on whether the id before ( names a procedure.
- Solution: use stmt → procid ( para_list ), which needs a more complex lexical analyzer to distinguish id from procid:
  (a) $ ... id ( id , id ) ... $        reduce by (7)
  (b) $ ... procid ( id , id ) ... $    reduce by (5)

LR(k) Grammars
Only some classes of grammars, known as the LR(k) grammars, can be parsed deterministically by a shift-reduce parser. CFGs that are non-LR may need some adaptation before they can be parsed deterministically by a shift-reduce parser.
Parsing table construction: predict handles at each position (after shifts).

LR(k) Parsing
- The L stands for scanning the input from left to right.
- The R stands for constructing a rightmost derivation in reverse.
- The k stands for the number of lookahead input symbols used to make parsing decisions.

LR Parsing
- The LR parsing algorithm
- Constructing SLR(1) parsing tables
- Constructing LR(1) parsing tables
- Constructing LALR(1) parsing tables

Model of an LR Parser
[Figure: the LR parsing program reads the input, maintains a stack s0 ... Xm-1 sm-1 Xm sm (state sm on top), consults the parsing table (Action and Goto parts), and produces output]

Parsing Table for Expression Grammar
(0) E' → E
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
Follow(E) = {+, ), $}   Follow(T) = {+, *, ), $}   Follow(F) = {+, *, ), $}

        Action                          Goto
State   id   +    *    (    )    $      E   T   F
0       s5             s4                1   2   3
1            s6                  acc
2            r2   s7        r2   r2
3            r4   r4        r4   r4
4       s5             s4                8   2   3
5            r6   r6        r6   r6
6       s5             s4                    9   3
7       s5             s4                        10
8            s6             s11
9            r1   s7        r1   r1
10           r3   r3        r3   r3
11           r5   r5        r5   r5

LR Parsing Algorithm
Input: an input string ω and an LR parsing table with functions action and goto for a grammar G.
Output: if ω is in L(G), a bottom-up parse for ω; otherwise, an error indication.
Method: initially, the parser has s0 on its stack, where s0 is the initial state, and ω$ in the input buffer. It then shifts and reduces according to the parsing table, as in the program below.

LR Parsing Program
a := first input token;
while (1) do {
    s := the state on top of the stack;
    if (action[s,a] == shift s') {
        push a then s' on top of the stack;
        a := next input token;
    } else if (action[s,a] == reduce A -> β) {
        pop 2*|β| symbols off the stack;
        s' := the state now on top of the stack;
        push A then goto[s',A] on top of the stack;
        output the production A -> β;
    } else if (action[s,a] == accept) return;
    else error();
}
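The program translates almost line for line into Python. This sketch hard-codes the SLR(1) table for the expression grammar (states and production numbers as in the table shown earlier); the dictionaries `ACTION`, `GOTO`, `PRODUCTIONS` and the function `lr_parse` are illustrative names. Like the slides, it keeps states and grammar symbols interleaved on one stack, so a reduction pops 2·|β| entries.

```python
# SLR(1) table for the expression grammar, transcribed from the parsing table.
# "sN" = shift and go to state N, "rN" = reduce by production N, "acc" = accept.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}
PRODUCTIONS = {
    1: ("E", ("E", "+", "T")), 2: ("E", ("T",)),
    3: ("T", ("T", "*", "F")), 4: ("T", ("F",)),
    5: ("F", ("(", "E", ")")), 6: ("F", ("id",)),
}


def lr_parse(tokens):
    """Table-driven LR parse; returns the productions used, in reduction order."""
    stack = [0]                      # s0 X1 s1 ... Xm sm, state on top
    tokens = list(tokens) + ["$"]
    pos = 0
    a = tokens[pos]                  # current lookahead token
    output = []
    while True:
        s = stack[-1]
        act = ACTION[s].get(a)
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push a, then the new state
            stack += [a, int(act[1:])]
            pos += 1
            a = tokens[pos]
        else:                        # reduce by A -> beta
            lhs, rhs = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - 2 * len(rhs):]   # pop 2*|beta| entries
            stack += [lhs, GOTO[stack[-1]][lhs]]    # push A, then goto[s', A]
            output.append(f"{lhs} -> {' '.join(rhs)}")


print(lr_parse(["id", "*", "id", "+", "id"]))
```

Run on id * id + id, it performs the same eight reductions, in the same order, as the step-by-step trace that follows.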

LR Parsing on id1 * id2 + id3 (shift/reduce + goto)
Step  Stack           Input           Table entries         Action
(1)   0               id * id + id $  (0,id): s5            Shift
(2)   0 id 5          * id + id $     (5,*): r6; (0,F): 3   Reduce by F → id
(3)   0 F 3           * id + id $     (3,*): r4; (0,T): 2   Reduce by T → F
(4)   0 T 2           * id + id $     (2,*): s7             Shift
(5)   0 T 2 * 7       id + id $       (7,id): s5            Shift
(6)   0 T 2 * 7 id 5  + id $          (5,+): r6; (7,F): 10  Reduce by F → id
(7)   0 T 2 * 7 F 10  + id $          (10,+): r3; (0,T): 2  Reduce by T → T * F
(8)   0 T 2           + id $          (2,+): r2; (0,E): 1   Reduce by E → T
(9)   0 E 1           + id $          (1,+): s6             Shift
(10)  0 E 1 + 6       id $            (6,id): s5            Shift
(11)  0 E 1 + 6 id 5  $               (5,$): r6; (6,F): 3   Reduce by F → id
(12)  0 E 1 + 6 F 3   $               (3,$): r4; (6,T): 9   Reduce by T → F
(13)  0 E 1 + 6 T 9   $               (9,$): r1; (0,E): 1   Reduce by E → E + T
(14)  0 E 1           $               (1,$): acc            Accept

LR Parsing
Advantages:
- Efficient: non-backtracking parsing and efficient error detection (and correction); a syntax error is detected as soon as it appears during the left-to-right scan.
- Coverage: virtually all programming languages; the LR grammars form a larger class than the grammars handled by top-down predictive parsing.
Disadvantages:
- Too much work to construct by hand (use a generator such as YACC).


LR Parsing Table Construction Techniques
Parsing table construction:
- SLR(1) parser: LR(0) items and states
- LR(1) parser: shift/reduce conflict resolution; LR(1) items and states
- LALR(1) parser: LR(1) state merge; merging may introduce reduce-reduce conflicts

LR(k) Grammar
A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar.

SLR Parser
- Coverage: the weakest of the methods in terms of the number of grammars for which it succeeds.
- Easiest to construct.
- Parser: a DFA for recognizing viable prefixes.
- States: sets of LR(0) items. The items in a set can be viewed as the states of an NFA recognizing viable prefixes; grouping items into sets is equivalent to the subset construction.

Viable Prefix
The prefixes of canonical sentential forms (c.s.f.s, i.e., right-sentential forms) that can appear on the stack of a shift-reduce parser are called viable prefixes. Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.
If α is a viable prefix, then there is a terminal string w such that αw is a c.s.f.

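Item and Valid Item
An LR(0) item (item for short) is a marked production [A → β1.β2] (a dotted rule: a production with a dot in its RHS).
An item [A → β1.β2] is said to be valid for a viable prefix αβ1 iff there is a terminal string w such that S ⇒*rm αAw ⇒rm αβ1β2w.
The dot represents where we are now during parsing:
- Left of the dot: the symbols already scanned.
- Right of the dot: the symbols still to be visited.
[Figure: parse tree with root S; A's subtree covers β1 β2, with α to its left and w to its right]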

Example of Valid Item
Consider the grammar: S → 1C | D, C → 3 | 4, D → 1B, B → 2.
[Figure: the two possible derivation trees, S ⇒ 1C and S ⇒ D ⇒ 1B]
Valid items for the empty viable prefix ε: [S → .1C], [S → .D], and [D → .1B].

Example of Valid Item (cont.)
Assume the viable prefix 1, i.e., the derivation could be S ⇒ 1C (with C ⇒ 3 or C ⇒ 4) or S ⇒ D ⇒ 1B (with B ⇒ 2).
Valid items for the viable prefix 1: [S → 1.C], [C → .3], [C → .4], [D → 1.B], and [B → .2].

Example of Valid Item (cont.)
Assume S ⇒ 1C ⇒ 13.
- Valid item for the viable prefix 13: [C → 3.]
- Valid item for the viable prefix 1C: [S → 1C.]

Closure: All Valid Items Enumerable from G
Given the grammar
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
what are the valid items for the viable prefix E+?
[E → E+.T], but also [T → .F], since E' ⇒*rm E+T ⇒rm E+F.
Likewise [T → .T*F], [T → .F], [F → .(E)], and [F → .id]. This set is called the closure of [E → E+.T] (inclusive).

Computation of Closure
Given a set I of items:
- Initially, Closure(I) = I.
- Loop: for every item [A → α.Bβ] in Closure(I) and every production B → γ in P, add [B → .γ] to Closure(I).
- Repeat the loop until no new dotted rules can be added.
Initial set of items for a grammar: I0 = Closure({[S' → .S]}) (S: start symbol; S': augmented start symbol).
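The closure loop is short in Python. In this sketch an item is encoded as a pair (production index, dot position), an encoding chosen here for convenience; `GRAMMAR`, `closure` and `show` are illustrative names. A worklist replaces the "repeat until nothing changes" loop: every newly added item is examined once. Applied to [E → E+.T], it yields exactly the five items listed for the viable prefix E+.

```python
# Expression grammar; production 0 is the augmented start production E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item (p, dot) is production p with the dot before rhs[dot]."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:   # item [A -> alpha . B beta]
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))                    # add [B -> . gamma]
                    work.append((q, 0))
    return frozenset(result)


def show(item):
    """Render an item as text, e.g. 'E -> E + . T'."""
    p, dot = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in sorted(closure({(1, 2)})):   # closure of [E -> E + . T]
    print(show(item))
```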

GOTO Computation
Let I be a set of items that are valid for some viable prefix γ. Then goto(I, X), where X ∈ N ∪ Σ, is the set of items that are valid for the viable prefix γX.
So [A → α.Xβ] in I implies Closure({[A → αX.β]}) ⊆ goto(I, X), since S' ⇒*rm δAw ⇒rm δαXβw with γ = δα.
In short: goto(I, X) = Closure({[A → αX.β] | [A → α.Xβ] ∈ I}).

Sets of LR(0) Items Construction
Augment the grammar with S' → S.
Let I0 = Closure({[S' → .S]}) and C = {I0}.
while (not all elements of C are marked) {
    select an unmarked item set I of C and mark it;
    for each X ∈ N ∪ Σ: if goto(I, X) is nonempty and not already in C, add goto(I, X) to C (unmarked);
}
This is also called the Characteristic Finite State Machine (CFSM) construction algorithm.
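The marking loop can be sketched in Python as a queue over a growing list: the sets at index i and beyond are the unmarked ones. This sketch assumes the expression grammar and encodes an item as (production index, dot position); `goto`, `closure` and `canonical_collection` are illustrative names. State numbers follow discovery order, so they need not match the numbering in the figures.

```python
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure of a set of (production, dot) items."""
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, x):
    """Advance the dot over x in every item that allows it, then take the closure."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)


def canonical_collection():
    """Return (states, transitions) of the CFSM, starting from I0 = Closure({[E' -> .E]})."""
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states = [closure({(0, 0)})]
    index = {states[0]: 0}
    transitions = {}
    i = 0
    while i < len(states):              # states[i:] are still unmarked
        for x in symbols:
            target = goto(states[i], x)
            if not target:
                continue
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, x)] = index[target]
        i += 1
    return states, transitions


states, transitions = canonical_collection()
print(len(states))  # 12
```

The twelve sets correspond to I0 through I11 of the canonical LR(0) collection for G.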

SLR(1) Parsing Actions
Compute the CFSM states C = {I0, I1, ..., In}.
1. If [A → α.aβ] ∈ Ii and goto(Ii, a) = Ij, set action(i, a) = shift j (where a is a terminal).
2. If [A → α.] ∈ Ii (A ≠ S'), set action(i, a) = reduce A → α for all a in Follow(A).
3. If [S' → S.] ∈ Ii, set action(i, $) = accept.
4. All other entries action(*, *) = error.
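Rules 1 to 4 can be sketched in Python on top of the LR(0) construction. In this sketch an item is a pair (production index, dot position), the FOLLOW sets are copied from the expression-grammar slides rather than computed, and `slr_table` and `set_action` are illustrative names. Any entry that would be set twice to different values is recorded as a conflict; state numbers follow discovery order, not the figure's numbering.

```python
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FOLLOW sets as stated on the slides for this grammar.
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, x):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def slr_table():
    """Build ACTION/GOTO by rules 1-4; collect any multiply defined entries."""
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states, trans = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        for x in symbols:
            t = goto(states[i], x)
            if t:
                if t not in states:
                    states.append(t)
                trans[i, x] = states.index(t)
        i += 1
    action, goto_table, conflicts = {}, {}, []

    def set_action(key, value):
        if action.setdefault(key, value) != value:
            conflicts.append((key, action[key], value))

    for i, state in enumerate(states):
        for p, dot in state:
            lhs, rhs = GRAMMAR[p]
            if dot < len(rhs) and rhs[dot] not in NONTERMINALS:
                set_action((i, rhs[dot]), ("shift", trans[i, rhs[dot]]))   # rule 1
            elif dot == len(rhs) and p == 0:
                set_action((i, "$"), ("accept",))                           # rule 3
            elif dot == len(rhs):
                for a in FOLLOW[lhs]:
                    set_action((i, a), ("reduce", p))                       # rule 2
    for (i, x), j in trans.items():
        if x in NONTERMINALS:
            goto_table[i, x] = j
    return states, action, goto_table, conflicts


states, action, goto_table, conflicts = slr_table()
print(len(states), conflicts)  # 12 []
```

Missing keys in `action` play the role of rule 4 (error). For this grammar the conflict list comes back empty, so G is SLR(1).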

Conflicts
- Shift-reduce conflicts: both a shift action and a reduce action are possible in the same closure (item set), e.g., state 2 in Figure 4.37 (p. 229) [Aho 86].
- Reduce-reduce conflicts: two or more distinct reduce actions are possible in the same closure.

Example: Grammar G for Math Expressions
(0) E' → E
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
Follow(E) = {+, ), $}, Follow(T) = {+, *, ), $}, Follow(F) = {+, *, ), $}
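The FOLLOW sets quoted above can be recomputed rather than taken on trust. Below is a Python sketch of the usual fixed-point computation of FIRST, nullable and FOLLOW; the slides only state the results, so the algorithm is the standard textbook one, and `first_sets` and `follow_sets` are illustrative names.

```python
# Expression grammar; production 0 is the augmented start production E' -> E.
GRAMMAR = [
    ("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """Fixed-point computation of FIRST for each nonterminal, plus the nullable set."""
    first = {n: set() for n in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            before = (len(first[lhs]), lhs in nullable)
            for sym in rhs:
                if sym not in NONTERMINALS:      # a terminal starts this RHS
                    first[lhs].add(sym)
                    break
                first[lhs] |= first[sym]
                if sym not in nullable:          # later symbols cannot contribute
                    break
            else:                                # every RHS symbol is nullable
                nullable.add(lhs)
            if (len(first[lhs]), lhs in nullable) != before:
                changed = True
    return first, nullable


def follow_sets(start="E'"):
    """Fixed-point computation of FOLLOW; $ follows the (augmented) start symbol."""
    first, nullable = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            trailer = set(follow[lhs])           # what may follow the suffix scanned so far
            for sym in reversed(rhs):
                if sym in NONTERMINALS:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    trailer = trailer | first[sym] if sym in nullable else set(first[sym])
                else:
                    trailer = {sym}
    return follow


follow = follow_sets()
for n in ("E", "T", "F"):
    print(n, sorted(follow[n]))
```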

Computing SLR(1) States for G
An SLR(1) state is a set of LR(0) items (see the next slide; Fig. 4.35, page 225, [Aho 86]).

Canonical LR(0) Collection for G
I0:  E' → .E,  E → .E+T,  E → .T,  T → .T*F,  T → .F,  F → .(E),  F → .id
I1:  E' → E.,  E → E.+T
I2:  E → T.,  T → T.*F
I3:  T → F.
I4:  F → (.E),  E → .E+T,  E → .T,  T → .T*F,  T → .F,  F → .(E),  F → .id
I5:  F → id.
I6:  E → E+.T,  T → .T*F,  T → .F,  F → .(E),  F → .id
I7:  T → T*.F,  F → .(E),  F → .id
I8:  F → (E.),  E → E.+T
I9:  E → E+T.,  T → T.*F
I10: T → T*F.
I11: F → (E).


Transition Diagram of DFA D for Viable Prefixes
State transitions in terms of sets of LR(0) items (Fig. 4.36).
SLR(1) parsing table (Fig. 4.31):
- Ii --a--> Ij (a a terminal): action(i, a) = shift j
- Ii --A--> Ij (A a nonterminal): goto(i, A) = j
- [A → α.] in Ii: action(i, a) = reduce A → α for every a in Follow(A)
- If A = S' (the augmented start symbol): action(i, $) = accept

Visualizing Transitions in the Transition Diagram
- Shift: moving forward one step along an arc; equivalent to pushing an input symbol.
- Reduce LHS → RHS: moving backward to a previous state s along the arcs labeled with the RHS symbols, then following GOTO(s, LHS); equivalent to popping the RHS symbols from the stack, pushing the LHS, and redefining the current state.


LR Parsing Table Construction Techniques
- Canonical LR parsing table
- LALR parsing table
(See textbook.)

Canonical LR Parser
- The SLR(1) parser does not always work.
- Every SLR(1) grammar is unambiguous, but an unambiguous CFG is not necessarily SLR(1).
- E.g., a shift-reduce conflict in the SLR(1) parsing table may not be a real shift-reduce conflict (the reduction may be impossible).
- More specific, additional information is needed to define states and avoid false reductions: LR(1) items instead of LR(0) items, at the cost of many more states than SLR(1).
- This calls for canonical LR(1) or LALR(1) parsers (parsing table construction methods).

Example: Non-SLR(1) Grammar for Assignment
(0) S' → S
(1) S → L = R
(2) S → R
(3) L → * R    (content of R)
(4) L → id
(5) R → L
I2: (1) S → L.=R    (5) R → L.
Follow(R) = {=, $}, so SLR sets both action(2, =) = shift 6 and action(2, =) = reduce 5.
I3: (2) S → R.
But the reduction is not really possible: S ⇒ L = R ⇒ *R = R, and no right-sentential form begins with R =. If we reduce on =, we go to I3, where = is an error (= ∉ Follow(S)).

Example: Non-SLR(1) Grammar for Assignment
Problem: G is unambiguous, yet SLR reports a shift/reduce conflict. The conflict is false, but the SLR parsing table is unable to remember enough left context to decide the proper action on = after seeing a string reducible to L.
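The false conflict can be found mechanically. The Python sketch below builds the LR(0) sets of items for the assignment grammar and reports every state in which a terminal both labels a shift and lies in FOLLOW of a completed item's left side. It checks only shift/reduce conflicts, the FOLLOW sets are hard-coded (worked out by hand: Follow(S) = {$}, Follow(L) = Follow(R) = {=, $}), and all names are illustrative.

```python
# Assignment grammar from the slide; production 0 is the augmented S' -> S.
GRAMMAR = [
    ("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}   # worked out by hand


def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, x):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr0_states():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states = [closure({(0, 0)})]
    i = 0
    while i < len(states):
        for x in symbols:
            t = goto(states[i], x)
            if t and t not in states:
                states.append(t)
        i += 1
    return states


def shift_reduce_conflicts():
    """Return (terminal, production, state) wherever SLR would both shift and reduce."""
    found = []
    for state in lr0_states():
        shifts = {GRAMMAR[p][1][d] for p, d in state
                  if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS}
        for p, d in state:
            lhs, rhs = GRAMMAR[p]
            if d == len(rhs) and p != 0:                  # completed item [A -> alpha .]
                for a in sorted(FOLLOW[lhs] & shifts):
                    found.append((a, p, state))
    return found


for a, p, state in shift_reduce_conflicts():
    print(f"conflict on {a!r}: shift vs. reduce by production ({p})")
```

The single conflict reported is the one on = in the state containing S → L.=R and R → L., matching I2 above.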

Why Unambiguous Yet Non-SLR(1)
- Some reduce actions licensed by checking the input against Follow(LHS) are not really possible.
- Not every symbol in Follow(LHS) leads to a successful reduction to S; the parse may fail after a few more reductions.
- SLR(1) states, defined by LR(0) items, do not carry enough information to resolve such conflicts.
- More specific constraints are needed to rule out a subset of Follow(LHS) from triggering a reduction.

LR(1) Parsing Table Construction
- SLR: reduce A → α on input a if Ii contains [A → α.] and a ∈ FOLLOW(A).
- The reduction is not really valid for every a ∈ FOLLOW(A), only for a subset (possibly a proper subset).
- In some cases βαa is a right-sentential form but βAa is not, so reducing by A → α does not produce a right-sentential form. E.g., L = R is a right-sentential form (S ⇒rm L = R), but R = R is not, although = ∈ Follow(R) because S ⇒ L = R ⇒ *R = R.

LR(1) Parsing Table Construction
Solution: define each state by including more specific information to rule out invalid reductions. This sometimes splits states that have the same core.
- LR(0) items: [A → α.β], only the dotted production (the core).
- LR(1) items: [A → α.β, a], the dotted production (the core) plus the lookahead symbols that allow a reduction upon [A → αβ.]. The 1 in LR(1) is the length of the lookahead.

LR(1) Parsing Table Construction
- [A → α.β, a] with β ≠ ε: the lookahead a has no effect on items of this form.
- [A → α., a] (i.e., β = ε): the lookahead matters. A reduction is called for only when the next input is a, not on every terminal in Follow(A); only a subset of Follow(A) will be the right lookaheads.
- Initially, only one restriction is known: [S' → .S, $].
- The other restrictions are inferred by the closure computation.

LR(1) Item and Valid Item
An LR(1) item is a dotted production plus a lookahead symbol: [A → α.β, a].
An LR(1) item [A → α.β, a] is said to be valid for a viable prefix γ if there is a rightmost derivation S ⇒*rm δAw ⇒rm δαβw, where
1. γ = δα, and
2. a ∈ First(w) (or w = ε and a = $).
The dot represents where we are now during parsing:
- Left of the dot: the symbols already scanned.
- Right of the dot: the symbols still to be visited.

LR(1) Parsing Table Construction
Change the closure() and goto() functions of the SLR parsing table construction, with the initial collection C = {closure({[S' → .S, $]})}.
[A → α.Bβ, a] valid implies [B → .γ, b] valid for every b in FIRST(βa).
Construction method for the sets of LR(1) items: see the next few slides.

LR(1): Closure(I)
Given a set I of items:
- Initially, Closure(I) = I.
- Repeat: for each item [A → α.Bβ, a] in Closure(I), each production B → γ in G, and each terminal b in FIRST(βa), add [B → .γ, b] to Closure(I).
- Until no more items can be added.
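The LR(1) closure differs from the LR(0) one only in how lookaheads are generated. This Python sketch encodes an item as (production index, dot position, lookahead) and uses the grammar from the LR(1) example that follows (S' → S, S → CC, C → cC | d). To keep FIRST(βa) simple it assumes the grammar has no ε-productions, which holds here; `first_of`, `closure1` and `show` are illustrative names.

```python
# Grammar from the LR(1) example: S' -> S, S -> C C, C -> c C | d.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol):
    """FIRST of a single symbol; assumes the grammar has no epsilon-productions."""
    result, todo, seen = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym in seen:
            continue
        seen.add(sym)
        if sym in NONTERMINALS:
            todo.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == sym)
        else:
            result.add(sym)
    return result


def closure1(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            # FIRST(beta a): with no epsilon-productions it is FIRST of beta's
            # first symbol, or just {a} when beta is empty.
            lookaheads = first_of(beta[0]) if beta else {a}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(result)


def show(item):
    p, dot, a = item
    lhs, rhs = GRAMMAR[p]
    return f"[{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}, {a}]"


for item in sorted(closure1({(0, 0, "$")})):
    print(show(item))
```

The result is I0 of the example: [S' → .S, $], [S → .CC, $], and [C → .cC, c/d], [C → .d, c/d], where c/d come from FIRST of the second C.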

LR(1): GOTO(I, X)
Let J = {[A → αX.β, a] such that [A → α.Xβ, a] is in I}; then goto(I, X) = closure(J).
That is:
J = {}
for all [A → α.Xβ, a] in I: J += {[A → αX.β, a]}
return closure(J)
Example: if I contains [A → α.Xβ, a] and [A' → α'.Xβ', a'], then goto(I, X) = Closure({[A → αX.β, a], [A' → α'X.β', a']}).

Sets of LR(1) Items Construction
Augment the grammar with S' → S; call the result G'.
Let I0 = Closure({[S' → .S, $]}) and C = {I0}.
Repeat {
    for each I ∈ C and each X ∈ N ∪ Σ: if goto(I, X) is nonempty and not already in C, add goto(I, X) to C
} until no more sets of items can be added to C.
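Putting the LR(1) closure and goto together gives the collection. This Python sketch uses the example grammar that follows (S' → S, S → CC, C → cC | d), encodes items as (production index, dot position, lookahead), and again assumes no ε-productions; `goto1` and `lr1_collection` are illustrative names. It finds the ten sets I0 to I9.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol):
    """FIRST of a single symbol; assumes no epsilon-productions."""
    result, todo, seen = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym not in seen:
            seen.add(sym)
            if sym in NONTERMINALS:
                todo.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == sym)
            else:
                result.add(sym)
    return result


def closure1(items):
    """LR(1) closure of (production, dot, lookahead) items."""
    result, work = set(items), list(items)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            lookaheads = first_of(beta[0]) if beta else {a}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(result)


def goto1(items, x):
    """Move the dot over x (keeping each lookahead), then take the LR(1) closure."""
    return closure1({(p, d + 1, a) for p, d, a in items
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr1_collection():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states = [closure1({(0, 0, "$")})]
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto1(states[i], x)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


states = lr1_collection()
print(len(states))  # 10
```

Two of the ten sets contain only the item C → d. (with lookaheads c/d in one and $ in the other); these are the same-core pairs discussed next.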

Example: Resolving Shift/Reduce Conflicts with LR(1) Items
G': {S' → S, S → CC, C → cC | d}, L(G) = {c^m d c^n d | m, n ≥ 0} => I0 ~ I9 (Fig. 4.39, p. 235 [Aho 86]).
I3 vs. I6: the same set of LR(0) items with different lookaheads, so the conditions for the eventual reduction differ:
- I3: lookaheads c/d (while constructing the 1st C)
- I6: lookahead $ (while constructing the 2nd C)

Construction of Canonical LR(1) Parsing Table (Algorithm 4.10)
- Shift: same as SLR (ignoring the lookahead in the item).
- Reduce on a: [A → α., a] (A ≠ S').
- Accept on $: [S' → S., $].
- Goto: same as SLR.
LR(1) grammar: a grammar without conflicts (multiply defined actions) in its LR(1) parsing table.

SLR(1) vs. LR(1)
- LR(1) has more specific states: a state may split into states with the same core but different lookaheads.
- SLR(1) grammars ⊂ LR(1) grammars.
- Number of states: LR(1) >> SLR(1).

LALR(1)
- Merge LR(1) states with the same core, while retaining the lookahead symbols.
- Considerably smaller than canonical LR tables.
- Most programming language constructs can be expressed by an LALR grammar.
- SLR and LALR have the same number of states, without and with lookahead symbols respectively (SLR uses all of FOLLOW; LALR uses a subset).
- Several hundred states for Pascal; several thousand if using LR(1).
- G is an LALR(1) grammar if no conflicts arise after the state merge.

LALR(1) vs. LR(1)
Effect of the LR(1) state merge: the merged parser either behaves like the original or declares an error later, but before shifting the next input symbol.
- For correct input: LR and LALR perform the same sequence of shifts and reduces.
- For erroneous input: LALR may perform extra reductions after the point where LR has detected the error (but before shifting the next symbol).

Example: Merge States with the Same Core
Fig. 4.39: I4 vs. I7, the same reduction with different lookaheads.
State merge: the dotted rules remain; the lookaheads are merged.
Examples: I3 + I6 => I36, I4 + I7 => I47, I8 + I9 => I89.
The result is the same as the SLR(1) table (Fig. 4.41, p. 239, [Aho 86]).
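The naïve merge is a grouping by core. This Python sketch rebuilds the canonical LR(1) collection for S' → S, S → CC, C → cC | d (items as (production index, dot position, lookahead), no ε-productions assumed) and then unions the lookaheads of sets whose cores agree; `merge_by_core` and the other names are illustrative. The ten LR(1) sets collapse to seven, with I36, I47 and I89 formed as on the slide.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol):
    """FIRST of a single symbol; assumes no epsilon-productions."""
    result, todo, seen = set(), [symbol], set()
    while todo:
        sym = todo.pop()
        if sym not in seen:
            seen.add(sym)
            if sym in NONTERMINALS:
                todo.extend(rhs[0] for lhs, rhs in GRAMMAR if lhs == sym)
            else:
                result.add(sym)
    return result


def closure1(items):
    result, work = set(items), list(items)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            lookaheads = first_of(beta[0]) if beta else {a}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(result)


def goto1(items, x):
    return closure1({(p, d + 1, a) for p, d, a in items
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr1_collection():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    states = [closure1({(0, 0, "$")})]
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto1(states[i], x)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


def merge_by_core(states):
    """Group LR(1) item sets by core (items without lookahead); union the lookaheads."""
    merged = {}
    for state in states:
        core = frozenset((p, dot) for p, dot, _ in state)
        merged.setdefault(core, set()).update(state)
    return merged


merged = merge_by_core(lr1_collection())
print(len(merged))  # 7
```

This is Method 1 below (build the full LR(1) collection, then merge); it is simple but pays the full cost of the canonical construction first.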

LALR(1) Parsing Table Construction (I)
Method 1 (naïve method):
[1] Construct the LR(1) parsing table. Very costly: the number of states is normally very large.
[2] Merge states with the same core.

LALR(1) Parsing Table Construction (II)
Method 2 (efficient construction method):
[1] Construct the kernels of the sets of LR(0) items, starting from [S' → .S].
- It is possible to compute the shift/reduce/goto actions directly from kernel items.
- Kernel items: the initial item [S' → .S, $] and all items whose dot is not at the beginning, i.e., the items not derived by closure().
- The kernel items can represent the whole set of items.
[2] Append lookaheads: compute the initial spontaneous lookaheads and the item pairs that pass (propagate) lookaheads.

LALR(1) Parsing Table Construction (II.1)
Compute the shift, reduce, and goto actions directly from kernel items (pp. 240-241, [Aho 86]).
This requires precomputing, for all pairs of nonterminals (C, A), the relation First(C) = {A | C ⇒*rm Aη for some η}.

LALR(1) Parsing Table Construction (II.2)
Determine spontaneous and propagated lookaheads (Fig. 4.43): compute closure({[core, #]}) using a dummy lookahead #.

LALR(1) Parsing Table Construction: Example
- Example 4.46 / Fig. 4.42 [p. 241, Aho 86]: kernels of the sets of LR(0) items (Fig. 4.37 shows them with the non-kernel items).
- Example 4.47: get the spontaneous and propagated lookaheads.
  - Fig. 4.44: item pairs that propagate lookaheads.
  - Fig. 4.45: initial spontaneous lookaheads, and multiple passes of lookahead propagation.
- LALR(1) parsing table: to do by yourself.

LALR(1) Parsing Table Construction
LALR (/LR) (Fig. 4.45) vs. SLR (Fig. 4.37):
- SLR: I2 has a shift/reduce conflict on =.
  I2: (1) S → L.=R    (5) R → L.
- LALR (/LR): I2 shifts on = and reduces on $; no conflict.
  I2: (1) S → L.=R, $    (5) R → L., $

Using Ambiguous Grammar (see handouts)

Parser Generators: YACC (Slide Part II)