LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Similar documents
S Y N T A X A N A L Y S I S LR

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Compiler Construction 2016/2017 Syntax Analysis

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Principles of Programming Languages

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

MODULE 14 SLR PARSER LR(0) ITEMS

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

LR Parsing Techniques

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Context-free grammars

Compiler Construction: Parsing

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

CSE302: Compiler Design

Bottom-Up Parsing. Lecture 11-12

shift-reduce parsing

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

SLR parsers. LR(0) items

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

LR Parsers. Aditi Raste, CCOEW

Bottom-Up Parsing. Lecture 11-12

Lexical and Syntax Analysis. Bottom-Up Parsing

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

CS 4120 Introduction to Compilers

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Chapter 3: Lexing and Parsing

Lecture 7: Deterministic Bottom-Up Parsing

4. Lexical and Syntax Analysis

Lecture 8: Deterministic Bottom-Up Parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Bottom-Up Parsing II. Lecture 8

4. Lexical and Syntax Analysis

In One Slide. Outline. LR Parsing. Table Construction

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Compilers. Bottom-up Parsing. (original slides by Sam

LR Parsing LALR Parser Generators

LALR Parsing. What Yacc and most compilers employ.

LR Parsing Techniques

Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Wednesday, August 31, Parsers

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Monday, September 13, Parsers

LR Parsing LALR Parser Generators

Lecture Bottom-Up Parsing

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:


Chapter 4: LR Parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Lecture Notes on Bottom-Up LR Parsing

Bottom-Up Parsing LR Parsing

Concepts Introduced in Chapter 4

UNIT-III BOTTOM-UP PARSING

LR Parsing - The Items

Lecture Notes on Bottom-Up LR Parsing

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Bottom-up Parser. Jungsik Choi

Chapter 4. Lexical and Syntax Analysis

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

UNIT III & IV. Bottom up parsing

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

CMSC 330, Fall 2009, Practice Problem 3 Solutions

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

3. Parsing. Oscar Nierstrasz

CMSC 330 Practice Problem 4 Solutions

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Languages and Compilers

LR Parsing. Table Construction

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Winter /15/ Hal Perkins & UW CSE C-1

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Parsing - 1. What is parsing? Shift-reduce parsing. Operator precedence parsing. Shift-reduce conflict Reduce-reduce conflict

Syntax Analysis Part I

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. Context Free Grammars

Conflicts in LR Parsing and More LR Parsing Types

Zhizheng Zhang. Southeast University

MIT Specifying Languages with Regular Expressions and Context-Free Grammars

CS2210: Compiler Construction Syntax Analysis Syntax Analysis

Introduction to Syntax Analysis

Introduction to Syntax Analysis. The Second Phase of Front-End


Transcription:

TDDD16 Compilers and Interpreters TDDB44 Compiler Construction LR Parsing, Part 2 Constructing Parse Tables Parse table construction Grammar conflict handling Categories of LR Grammars and Parsers An NFA Recognizing Viable Prefixes A.k.a. the characteristic finite automaton for a grammar G tates: LR(0) items (= context-free items) of extend. grammar Input stream: The grammar symbols on the stack tart state: [. ] Transitions: Final state: [.] 118a move dot across symbol if symbol found next on stack: A α.bγ to A αb.γ A α.bγ to A αb.γ ε-transitions to LR(0)-items for nonterminal productions from items where the dot precedes that nonterminal: A α.bγ to B.β Peter Fritzson, Christoph Kessler, IDA, Linköpings universitet, 2008. (Example: see whiteboard) 2 TDDD16/TDDB44 Compiler Construction, 2008 Computing the Closure 118b Representing ets of Items 118c For a set I of LR(0) items compute Closure(I): 1. Closure(I) := I 2. If [A α.bβ] in Closure(I) and production B γ then add [B.γ] to Closure(I) (if not already there) 3. Repeat tep 2 until no more items can be added to Closure(I). Remarks: For s=[a α.bγ], Closure(s) contains all NFA states reachable from s via ε-transitions, i.e., starting from which any substring derivable from Bβ could be recognized. A.k.a. ε-closure(s). Then apply the well-known subset construction to transform Closure-NFA -> DFA. DFA states will be sets unioning closures of NFA states Any item [A α.β] can be represented by 2 integers: production number position of the dot within the RH of that production The resulting sets often contain closure items (where the dot is at the beginning of the RH). Can easily be reconstructed (on demand) from other ( kernel ) items Kernel items: start state [.], plus all items where the dot is not at the left end. tore only kernel items explicitly, to save space 3 TDDD16/TDDB44 Compiler Construction, 2008 4 TDDD16/TDDB44 Compiler Construction, 2008 GOTO Function and DFA tates 120a Resulting DFA 120b Given: et I of items, grammar symbol X (Example: see whiteboard) GOTO( I, X ) := U [A α.xβ] in I Closure ( { [A αx.β] } ) To become the state transitions in the DFA Now do the subset construction to obtain the DFA states: C := Closure( { [.] } ) // C: et of sets of NFA states repeat for each set of items I of C: for each grammar symbol X if (GOTO(I,X) is not empty and not in C) add GOTO(I,X) to C until no new states are added to C on a round 5 TDDD16/TDDB44 Compiler Construction, 2008 All states correspond to some viable prefix Final states: contain at least one item with dot to the right recognized some handle reduce may (must) follow Other states: handle recognition incomplete -> shift will follow The DFA is also called the GOTO graph (not the same as the LR GOTO Table!!). This automaton is deterministic as a FA (i.e., selecting transitions considering only input symbol consumption) but can still be nondeterministic as a pushdown automaton (e.g., in state I 1 above: to reduce or not to reduce?) 6 TDDD16/TDDB44 Compiler Construction, 2008 1

From DFA to parser tables: ACTION 1. For each DFA transition I i I j reading a terminal a in Σ (thus, I i contains items of kind [X α.aβ]) ACTION table: state --, a b enter j (shift, new state I j ) in ACTION[ i, a ] 0 X X 4 5 2. For each DFA final state I i 1 A 2 * * 2 X X 4 5 (containing a complete item [X α.]) 3 R1 R1 * * 4 R3 R3 * * enter R x 5 R4 R4 * * (reduce, x = prod. rule number for X α) 6 in ACTION[ i, t ] R2 R2 * * LR(0) parser: for all t in Σ (all entries in row i) LR(1) parser: for all t in LA LR (i,[x α.]) = FOLLOW 1 (X) LALR(1) parser: for all t in LA LALR (i,[x α.]) (see later) 7 TDDD16/TDDB44 Compiler Construction, 2008 From DFA to parser tables: GOTO Table 1. For each DFA transition I i I j reading nonterminal A (i.e., I i contains an item [X α.aβ]) enter GOTO[ i, A ] = j GOTO table: state L E 0 1 6 1 * * 2 * 3 3 * * 4 * * 5 * * 6 * * 8 TDDD16/TDDB44 Compiler Construction, 2008 TDDD16 Compilers and Interpreters TDDB44 Compiler Construction Conflicts and Categories of LR Grammars and Parsers Conflict Examples in LR Grammars hift Reduce conflict: E id + E (shift +) id (reduce id) Reduce Reduce conflict: E id (reduce id) Pcall id (reduce id) (hift Accept conflict) L (accept) L L, E (shift,) Peter Fritzson, Christoph Kessler, IDA, Linköpings universitet, 2008. 10 TDDD16/TDDB44 Compiler Construction, 2008 Conflicts in LR Grammars Observe conflicts in DFA (GOTO graph) kernels or at the latest when filling the ACTION table. hift-reduce conflict A DFA accepting state has an outgoing transition, i.e. contains items [X α.] and [Y β.zγ] for some Z in NUΣ Reduce-Reduce conflict A DFA accepting state can reduce for multiple nonterminals, i.e. contains at least 2 items [X α.] and [Y β.], X!= Y (hift/reduce-accept conflict) A DFA accepting state containing [. --] contains another item [X α.] or [X α.bβ] Only for LR(0) grammars there are no conflicts. 11 TDDD16/TDDB44 Compiler Construction, 2008 Handling Conflicts in LR Grammars (Overview): Use lookahead if lucky, the LR(0) states + a few fixed lookahead sets are sufficient to eliminate all conflicts in the LR(0)-DFA LR(1), LALR(1) otherwise, use LR(1) items [X α.β, a] (a is look-ahead) to build new, larger NFA/DFA expensive (many items/states very large tables) if still conflicts, may try again with k>1 even larger tables Rewrite the grammar (factoring / expansion) and retry If nothing helps, re-design your language syntax ome grammars are not LR(k) for any constant k and cannot be made LR(k) by rewriting either 12 TDDD16/TDDB44 Compiler Construction, 2008 2

Look-Ahead (LA) ets For a LR(0) item [X α.β] in DFA-state I i, define lookahead set LA( I i, [X α.β] ) (a subset of Σ) For LR(1), LALR(1) etc., the LA sets only differ for reduce items: For LR(1): LA LR ( I i, [X α.] ) = { a in Σ: => * βxaγ } = FOLLOW 1 ( X ) for all I i with [X α.] in I i depends on nonterminal X only, not on state I i For LALR(1): LA LALR ( I i, [X α.] ) = { a in Σ: => * βxaw and the LR(0)-DFA started in I 0 reaches I i after reading βα } usually a subset of FOLLOW 1 ( X ), i.e. of LR LA set d d t t I 13 TDDD16/TDDB44 Compiler Construction, 2008 Made it simple: Is my grammar LR(1)? Construct the (LR(0)-item) characteristic NFA and its equivalent DFA (= GOTO graph) as above. Consider all conflicts in the DFA states: hift-reduce: [X α.] [Y β.bγ] b Consider all pairs of conflicting items [X α.], [Y β.bγ]: If b in FOLLOW 1 (X) for any of these not LR(1). Reduce-Reduce: Consider all pairs of conflicting items [X α.], [Y β.]: If FOLLOW 1 (X) intersects with FOLLOW 1 (Y) not LR(1). (hift-accept: similar to hift-reduce) [X α.] [Y β.] 14 TDDD16/TDDB44 Compiler Construction, 2008 Example: L-Values in C Language L-values on left hand side of assignment. Part of a C grammar: 1. 2. L = R 3. R 4. L *R 5. id 6. R L Avoids that R (for R-values) appears as LH of assignments But *R = is ok. This grammar is LALR(1) but not LR(1): 15 TDDD16/TDDB44 Compiler Construction, 2008 Example (cont.) LR(0) parser has a shift-reduce conflict in kernel of state I 2 : I 0 = { [.], [.L=R], [.R], [L.*R], [L.id], R.L] } I 1 = { [ ->.] } I 2 = { [->L.=R], [R->L.] } I 3 = { [->R.] } I 4 = { [L->*.R], [R->.L], [L->.*R], [L->.id] } I 5 = { [L->id.] } I 6 = { [->L=.R], [R->.L], [L->.*R], L->.id] } I 7 = { [L->*R.] } I 8 = { [R->L.] } I 9 = { [->L=R.] } FOLLOW 1 (R) = {, = } LR(1) still shift-reduce conflict in I 2 as = does not disambiguate hift = or reduce to R? 16 TDDD16/TDDB44 Compiler Construction, 2008 Example (cont.) I 0 = { [ ->.], [->.L=R], [->.R], [L->.*R], [L->.id], R->.L] } I 1 = { [ ->.] } I 2 = { [->L.=R], [R->L.] } I 3 = { [->R.] } I 4 = { [L->*.R], [R->.L], [L->.*R], [L->.id] } I 5 = { [L->id.] } I 6 = { [->L=.R], [R->.L], [L->.*R], L->.id] } I 7 = { [L->*R.] } I 8 = { [R->L.] } I 9 = { [->L=R.] } LA LALR ( I 2, [R->L.] ) = { } LALR(1) parser is conflict-free as computation path I 0 I 2 does not really allow = following R. = can only occur after R if *R was encountered before. 17 TDDD16/TDDB44 Compiler Construction, 2008 LALR(1) Parser Construction Method 1: (simple but not practical) 1. Construct the LR(1) items (see later). (If there is already a conflict, stop.) 2. Look for sets of LR(1) items that have the same kernel, and merge them. 3. Construct the ACTION table as for LR(1). If a conflict is detected, the grammar is not LALR(1). 4. Construct the GOTO function: For each merged J = I 1 U I 2 U U I r, the kernels of GOTO(I 1,X),, GOTO(I r,x) are identical because the kernels of I 1,,I r are identical. et GOTO( J, X ) := U { I: I has the same kernel as GOTO(I 1,X) } Method 2: (details see textbook) 1. tart from LR(0) items and construct kernels of DFA states I 0, I 1, 2. Compute lookahead sets by propagation along the GOTO(I j,x) edges (fixed point iteration). 18 TDDD16/TDDB44 Compiler Construction, 2008 3

olve Conflicts by Rewriting the Grammar LR(k) Grammar - Formal Definition p.116 Eliminate Reduce-Reduce Conflict: Factoring ( A ) ( B ) A char integer ident B float double ident [A ident. ] [B ident. ] factor ident ( A ) ( B ) ( C ) A char integer B float double C ident Eliminate hift-reduce Conflict: (one token lookahead: ( ) Inline-Expansion ( A ) OptY ( B ) OptY Y ε [. ( A ) ] [. OptY ( B) ] ( A ) ( B ) Y ( B ) [OptY.Y ] expand Y [OptY.ε ] OptY Y A A [OptY ε. ] B B [Y ] 19 TDDD16/TDDB44 Compiler Construction, 2008 Let G be the augmented grammar for G (i.e., extended by new start symbol and production rule > -- ) G is called a LR(k) grammar if and rm =>* αxw rm => αβw rm =>* γyx rm => αβy and w[1:k] = y[1:k] imply that α = γ and X = Y and x = y = w. i.e., considering at most k symbols after the handle, in each rightmost derivation the handle can be localized and the production to be applied can be determined. Remark: w, x, y in Σ* α, β, γ in (N U Σ)* X, Y in N Example: see whiteboard 20 TDDD16/TDDB44 Compiler Construction, 2008 ome grammars are not LR(k) for any fixed k Example: ab c B bb b b describes language { a b 2N+1 c : N >= 0 } This grammar is not LR(k) for any fixed k. No ambiguous grammar is LR(k) for any fixed k if E then if E then else other statements is ambiguous the following statement has 2 parse trees: if then if E2 then 1 else 2 Proof: As k is fixed (constant), consider for any n > k: =>* a b n B b n c => a b n b b n c The handle cannot be =>* a b n+1 B b n+1 c => a b n+1 b b n+1 localized c with only limited lookahead size k By the LR(k) definition, α = a b n β = b w = b n c 21 TDDD16/TDDB44 Compiler Construction, 2008 1 1 if E then if E then else E2 1 2 if E then else 2 if E then E2 1 22 TDDD16/TDDB44 Compiler Construction, 2008 (cont.) Consider situation (configuration of shift-reduce parser) -- if E then else -- Not clear whether to shift else (following production 2, i.e. if E then is not handle), or reduce handle if E then to (following production 1) Any fixed-size lookahead (else and beyond) does not help! uggestion: Rewrite grammar to make it unambiguous 23 TDDD16/TDDB44 Compiler Construction, 2008 Rewriting the grammar Matched Open Matched if E then Matched else Matched other statements Open if E then if E then Matched else Open is no longer ambiguous if E then Open if E then Mat- else Mat- E2 Matched 1 2 Impossible now to derive any sentential form containing an Open nonterminal from a Matched 24 TDDD16/TDDB44 Compiler Construction, 2008 4

ome grammars are not LR(k) for any fixed k Grammar with productions a a ε is unambiguous but not LR(k) for any fixed k. (Why?) An equivalent LR grammar for the same language is a a ε 25 TDDD16/TDDB44 Compiler Construction, 2008 LR(1) Items and LR(k) Items LR(k) parser: Construction similar to LR(0) / LR(1) parser, but plan for distinguishing between states for k>0 tokens lookahead already from the beginning tates in the LR(0) GOTO graph may be split up LR(1) items: [ A->α.β, a ] for all productions A->αβ and all a in Σ Can be combined for lookahead symbols with equal behavior: [ A->α.β, a b ] or [ A->α.β, L ] for a subset L of Σ Generalized to k>1: [ A->α.β, a 1 a 2 a k ] Interpretation of [ A->α.β, a ] in a state: If β not ε, ignore second component (as in LR(0)) If β=ε, i.e. [ A->α., a ], reduce only if next input symbol = a. 26 TDDD16/TDDB44 Compiler Construction, 2008 LR(1) Parser NFA start state is [ ->., ] Modify computation of Closure(I), GOTO(I,X) and the subset computation for LR(1) items Details see [AU86, p.232] or [ALU06, p.261] Can have many more states than LR(0) parser Which may help to resolve some conflicts Interesting to know For each LR(k) grammar with some constant k>1 there exists an equivalent* grammar G that is LR(1). For any LL(k) grammar there exists an equivalent LR(k) grammar (but not vice versa!) e.g., language { a n b n : n>0 } U { a n c n : n > 0 } has a LR(0) grammar but no LL(k) grammar for any constant k. ome grammars are LR(0) but not LL(k) for any k e.g., A b A Aa a (left recursion, could be rewritten) * Two grammars are equivalent if they describe the same language. 27 TDDD16/TDDB44 Compiler Construction, 2008 28 TDDD16/TDDB44 Compiler Construction, 2008 5