shift-reduce parsing

Similar documents
MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Lexical and Syntax Analysis. Bottom-Up Parsing

Bottom-Up Parsing. Lecture 11-12

CS 4120 Introduction to Compilers

S Y N T A X A N A L Y S I S LR

Bottom-Up Parsing. Lecture 11-12

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

In One Slide. Outline. LR Parsing. Table Construction

UNIT III & IV. Bottom up parsing

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

Lecture Bottom-Up Parsing

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

LR Parsing LALR Parser Generators

Lecture 8: Deterministic Bottom-Up Parsing

LR Parsing - The Items

Parsing #1. Leonidas Fegaras. CSE 5317/4305 L3: Parsing #1 1

Monday, September 13, Parsers

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Wednesday, August 31, Parsers

Conflicts in LR Parsing and More LR Parsing Types

Lecture 7: Deterministic Bottom-Up Parsing

LR Parsing. Table Construction

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

LR Parsing LALR Parser Generators

Wednesday, September 9, 15. Parsers

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

MODULE 14 SLR PARSER LR(0) ITEMS


Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

Downloaded from Page 1. LR Parsing

Chapter 3. Parsing #1

Context-Free Grammars and Parsers. Peter S. Housel January 2001

Context-free grammars

Compilers. Bottom-up Parsing. (original slides by Sam

Compiler Construction: Parsing

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Bottom-Up Parsing II. Lecture 8

Principles of Programming Languages

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

More Bottom-Up Parsing

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

Bottom-Up Parsing LR Parsing

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

LALR Parsing. What Yacc and most compilers employ.

How do LL(1) Parsers Build Syntax Trees?

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

CS415 Compilers. LR Parsing & Error Recovery

Compiler Construction 2016/2017 Syntax Analysis

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

4. Lexical and Syntax Analysis

UNIT-III BOTTOM-UP PARSING

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

LR Parsing Techniques

LR Parsers. Aditi Raste, CCOEW

Top down vs. bottom up parsing

CS453 : Shift Reduce Parsing Unambiguous Grammars LR(0) and SLR Parse Tables by Wim Bohm and Michelle Strout. CS453 Shift-reduce Parsing 1

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

CSE302: Compiler Design

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

SLR parsers. LR(0) items

Lecture Notes on Bottom-Up LR Parsing

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

Compilation 2013 Parser Generators, Conflict Management, and ML-Yacc

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Note that for recursive descent to work, if A ::= B1 B2 is a grammar rule we need First k (B1) disjoint from First k (B2).

Syntax Analysis, VII One more LR(1) example, plus some more stuff. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. target code.

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

Configuration Sets for CSX- Lite. Parser Action Table

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Chapter 3 Parsing. time, arrow, banana, flies, like, a, an, the, fruit

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

Chapter 4. Lexical and Syntax Analysis

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Action Table for CSX-Lite. LALR Parser Driver. Example of LALR(1) Parsing. GoTo Table for CSX-Lite

LR Parsing E T + E T 1 T

Lecture Notes on Bottom-Up LR Parsing

Transcription:

Parsing #2

Bottom-up Parsing Rightmost derivations; use of rules from right to left Uses a stack to push symbols the concatenation of the stack symbols with the rest of the input forms a valid bottom-up derivation E - num x-2*y$ stack input Derivation: E-num*id$ Two operations: reduction: if a postfix of the stack matches the RHS of a rule (called a handle), replace the handle with the LHS nonterminal of the rule eg, reduce the stack x * E + E by the rule E ::= E + E new stack: x * E shifting: if no handle is found, push the current input token on the stack and read the next symbol Also known as shift-reduce parsing

0) S :: = E $ 1) E ::= E + T 2) E ::= E - T 3) E ::= T 4) T ::= T * F 5) T ::= T / F 6) T ::= F 7) F ::= num 8) F ::= id Example Input: x-2*y$ Stack rest-of-the-input Action 1) id - num * id $ shift 2) id - num * id $ reduce by rule 8 3) F - num * id $ reduce by rule 6 4) T - num * id $ reduce by rule 3 5) E - num * id $ shift 6) E - num * id $ shift 7) E - num * id $ reduce by rule 7 8) E - F * id $ reduce by rule 6 9) E - T * id $ shift 10) E - T * id $ shift 11) E - T * id $ reduce by rule 8 12) E - T * F $ reduce by rule 4 13) E - T $ reduce by rule 2 14) E $ shift 15) S accept (reduce by 0)

Machinery Need to decide when to shift or reduce, and if reduce, by which rule use a DFA to recognize handles Example 0) S ::= R $ 1) R ::= R b 2) R ::= a state 2: accept (reduce by 0) state 3: reduce by 2 state 4: reduce by 1 The DFA is represented by an ACTION and a GOTO table The stack contains state numbers now ACTION GOTO a b $ S R 0 s3 1 1 s4 s2 2 a a a 3 r2 r2 r2 4 r1 r1 r1

The Shift-Reduce Parser push(0); read_next_token(); for(;;) { s = top(); /* current state is taken from top of stack */ if (ACTION[s,current_token] == 'si') /* shift and go to state i */ { push(i); read_next_token(); } else if (ACTION[s,current_token] == 'ri') /* reduce by rule i: X ::= A1...An */ { perform pop() n times; s = top(); /* restore state before reduction from top of stack */ push(goto[s,x]); /* state after reduction */ } else if (ACTION[s,current_token] == 'a') success!! else error();

Example Example: parsing abb$ ACTION GOTO a b $ S R 0 s3 1 1 s4 s2 2 a a a 3 r2 r2 r2 4 r1 r1 r1 Stack rest-of-input Action 0 abb$ s3 0 3 bb$ r2 (pop, push GOTO[0,R] since R ::= a) 0 1 bb$ s4 0 1 4 b$ r1 (pop twice, push GOTO[0,R] since R ::= R b) 0 1 b$ s4 0 1 4 $ r1 (pop twice, push GOTO[0,R] since R ::= R b) 0 1 $ s2 0 1 2 accept

Table Construction Problem: given a CFG, construct the finite automaton (DFA) that recognizes handles The DFA states are itemsets (sets of items) An item is a rule with a dot at the RHS eg, possible items for the rule E ::= E + E: E ::=. E + E E ::= E. + E E ::= E +. E E ::= E + E. The dot indicates how far we have progressed using this rule to parse the input eg, the item E ::= E +. E indicates that 1)we are using the rule E ::= E + E 2)we have parsed E, we have seen the token +, and we are ready to parse another E

Table Construction (cont.) The items in a itemset indicate different possibilities that will have to be resolved later by reading more input tokens eg, the itemset: T ::= ( E. ) E ::= E. + T coresponds to a DFA state where we don't know whether we are looking at an ( E ) handle or an E + T handle it will be ( E ) if the next token is ) it will be E + T if the next token is + When the dot is at the end of an item, we have found a handle eg, T ::= ( E ). it corresponds to a reduction by T ::= ( E ) reduce/reduce conflict: an itemset should never have more than one item with a dot at the end otherwise we can't choose a handle

Closure of an Itemset The closure of an item X ::= a. t b is the singleton set that contains the item X ::= a. t b only X ::= a. Y b is the set consisting of the item itself, plus all rules for Y with the dot at the beginning of the RHS, plus the closures of these items eg, the closure of the item E ::= E +. T is the set: E ::= E +. T T ::=. T * F T ::=. T / F T ::=. F F ::=. num F ::=. id The closure of an itemset is the union of closures of all items in the itemset

Constructing the DFA The initial state of the DFA (state 0) is the closure of the item S ::=. a, where S ::= a is the first rule of the grammar For each itemset, if there is an item X ::= a. s b in an itemset, where s is a symbol, we have a transition labeled by s to an itemset that contains X ::= a s. b But 1)if we have more than one item with a dot before the same symbol s, say X ::= a. s b and Y ::= c. s d, then the new itemset contains both X ::= a s. b and Y ::= c s. d 2)we need to get the closure of the new itemset 3)we need to check if this itemset has appeared before so that we don't create it again

Table construction for LR(0)

Table construction for LR(0) X For each edge I J where X is a terminal, we put the action shift J at position (I, X) of the table; if X is a nonterminal, we put goto J at position (I, X). For each state I containing an item S S.$ we put an accept action at (I, $). Finally, for a state containing an item A γ. (production n with the dot at the end), we put a reduce n action at (I, Y) for every token Y.

Example #1 0) S ::= R $ 1) R ::= R b 2) R ::= a ACTION GOTO a b $ S R 0 s3 1 1 s4 s2 2 a a a 3 r2 r2 r2 4 r1 r1 r1

Example #2 0) S' ::= S $ 1) S ::= B B 2) B ::= a B 3) B ::= c

Example #3 S ::= E $ E ::= ( L ) E ::= ( ) E ::= id L ::= L, E L ::= E

Example #4

LR(0) If an itemset has more than one reduction (an item with the dot at the end), it is a reduce/reduce conflict If an itemset has at least one shifting (an outgoing transition to another state) and at least one reduction, it is a shift/reduce conflict A grammar is LR(0) if it doesn't have any reduce/reduce or shift/reduce conflict 1) S ::= E $ 2) E ::= E + T 3) T 4) T ::= T * F 5) F 6) F ::= id 7) ( E ) S ::=. E $ T E ::= T. E ::=. E + T T ::= T. * F E ::=. T T ::=. T * F T ::=. F F ::=. id F ::=. ( E )

Another example of Shift/Reduce conflict

SLR(1) There is an easy fix for some of the shift/reduce or reduce/reduce errors requires to look one token ahead (called the lookahead token) Steps to resolve the conflict of an itemset: 1) for each shift item Y ::= b. c you find FIRST(c) 2) for each reduction item X ::= a. you find FOLLOW(X) 3) if each FOLLOW(X) do not overlap with any of the other sets, you have resolved the conflict! eg, for the itemset with E ::= T. and T ::= T. * F FOLLOW(E) = { $, +, ) } FIRST(* F) = { * } no overlapping! This is a SLR(1) grammar, which is more powerful than LR(0)

SLR parsing table for the previous example

LR(1) The SLR(1) trick doesn't always work S ::= E $ E ::= L = R R L ::= * R id S ::=. E $ E ::=. L = R L E ::= L. = R E ::=. R R ::= L. L ::=. * R L ::=. id R ::=. L R ::= L For a reduction item X ::= a. we need a smaller (more precise) set of lookahead tokens to tell us when to reduce called expected lookahead tokens must be a subset of or equal to FOLLOW(X) (hopefully subset) they are context-sensitive => finer control

LR(1) Items Now each item is associated with expected lookahead tokens eg, L ::= *. R =$ the tokens = and $ are the expected lookahead tokens the are only useful in a reduction item: L ::= * R. =$ it indicates that we reduce when the lookahead token from input is = or $ LR(1) grammar: at each shift/reduce reduce/reduce conflict, the expected lookahead tokens of the reductions must not overlap with the first tokens after dot (ie, the FIRST(c) in Y ::= b. c) Rules for constructing the LR(1) DFA: for a transition from A ::= a. s b by a symbol s, propagate the expected lookaheads when you add the item B ::=. c to form the closure of A ::= a. B b with expected lookahead tokens t, s,..., the expected lookahead tokens of B ::=. c are FIRST(bt) U FIRST(bs) U...

Example S ::= E $ E ::= L = R R L ::= * R id R ::= L S ::=. E $? E ::=. L = R $ L E ::= L. = R $ E ::=. R $ R ::= L. $ L ::=. * R =$ L ::=. id =$ R ::=. L $

LR(1) parsing table construction The algorithm for constructing an LR(1) parsing table is similar to that for LR(0), but the notion of an item is more sophisticated. An LR(1) item consists of a grammar production, a right-hand-side position (represented by the dot), and a lookahead symbol The idea is that an item (A α.β, x) indicates that the sequence α is on top of the stack, and at the head of the input is a string derivable from βx.

LR(1) parsing table construction (cont.) An LR(1) state is a set of LR(1) items, and there are Closure and Goto operations for LR(1) that incorporate the lookahead: The start state is the closure of the item (S.S $,?), where the lookahead symbol? will not matter, because the end-of-file marker will never be shifted. Reduce actions:

Example The following grammar is not SLR but LR(1): S S $ S V = E S E E V V x V * E

LALR(1) If the lookaheads s1 and s2 are different, then the items A ::= a s1 and A ::= a s2 are different this results to a large number of states since the combinations of expected lookahead symbols can be very large. We can combine the two states into one by creating an item A := a s3 where s3 is the union of s1 and s2 LALR(1) is weaker than LR(1) but more powerful than SLR(1) LALR(1) and LR(0) have the same number of states Easy construction of LALR(1) itemsets: start with LR(0) items and propagate lookaheads as in LR(1) don't create a new itemset if the LR(0) items are the same just union together the lookaheads of the corresponding items you may have to propagate lookaheads by looping through the same itemsets until you cannot add more Most parser generators are LALR(1)

Example The following grammar is not SLR but LR(1): S S $ S V = E S E E V V x V * E LR(1) Table LALR(1) Table

Hierarchy of Grammar Classes Any reasonable programming language has a LALR(1) grammar, and there are many parser-generator tools available for LALR(1) grammars. For this reason, LALR(1) has become a standard for programming languages and for automatic parser generators.

Using parser generators JavaCC is an LL(K) parse generator Productions are of the form: void Assignment() : {} { Identifier() "=" Expression() ";" } As shown the grammar should be written to a specific format so that we can use parser generator JavaCC reports like left-recursion in the grammar SableCC is an LALR(1) parser generator. Productions are of the form: assignment = identifier assign expression semicolon ; SableCC reports errors like shift-reduce or reduce-reduce conflicts Read more in the book and documentations for the tools

Error Recovery All the empty entries in the ACTION and GOTO tables correspond to syntax errors We can either report it and stop the parsing continue parsing finding more errors (error recovery) Recovery using the error symbol: exp ID exp exp + ext exp ( exps ) exps exp exps exps ; exp we can specify that if a syntax error is encountered in the middle of an expression, the parser should skip to the next semicolon or right parenthesis (these are called synchronizing tokens) and resume parsing. error

Error Recovery (cont.) Global Error repair: Global error repair finds the smallest set of insertions and deletions that would turn the source string into a syntactically correct string, even if the insertions and deletions are not at a point where an LL or LR parser would first report an error. Burke-Fisher error repair: a limited but useful form of global error repair, which tries every possible single-token insertion, deletion, or replacement at every point that occurs no earlier than K tokens before the point where the parser reported the error.

Chapter 3 Reading