S Y N T A X A N A L Y S I S LR

Similar documents
Bottom-up parsing. Bottom-Up Parsing. Recall. Goal: For a grammar G, withstartsymbols, any string α such that S α is called a sentential form

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

LR Parsers. Aditi Raste, CCOEW

Principles of Programming Languages

A left-sentential form is a sentential form that occurs in the leftmost derivation of some sentence.

MIT Parse Table Construction. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

MODULE 14 SLR PARSER LR(0) ITEMS

LR Parsing Techniques

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

SLR parsers. LR(0) items

Bottom Up Parsing. Shift and Reduce. Sentential Form. Handle. Parse Tree. Bottom Up Parsing 9/26/2012. Also known as Shift-Reduce parsing

Context-free grammars

Compiler Construction: Parsing

LR Parsing, Part 2. Constructing Parse Tables. An NFA Recognizing Viable Prefixes. Computing the Closure. GOTO Function and DFA States

Compiler Design 1. Bottom-UP Parsing. Goutam Biswas. Lect 6

Parsing. Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1), LALR(1)

3. Syntax Analysis. Andrea Polini. Formal Languages and Compilers Master in Computer Science University of Camerino

Table-driven using an explicit stack (no recursion!). Stack can be viewed as containing both terminals and non-terminals.

Lexical and Syntax Analysis. Bottom-Up Parsing

Compiler Construction 2016/2017 Syntax Analysis

Bottom-Up Parsing. Lecture 11-12

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

Bottom-Up Parsing LR Parsing

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Lecture Bottom-Up Parsing

Compilers. Bottom-up Parsing. (original slides by Sam

Parsing Wrapup. Roadmap (Where are we?) Last lecture Shift-reduce parser LR(1) parsing. This lecture LR(1) parsing

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

UNIT-III BOTTOM-UP PARSING

Bottom-Up Parsing. Lecture 11-12

Lecture 8: Deterministic Bottom-Up Parsing

LR Parsing Techniques

PART 3 - SYNTAX ANALYSIS. F. Wotawa TU Graz) Compiler Construction Summer term / 309

Bottom-Up Parsing II. Lecture 8

Bottom up parsing. The sentential forms happen to be a right most derivation in the reverse order. S a A B e a A d e. a A d e a A B e S.

Bottom-Up Parsing II (Different types of Shift-Reduce Conflicts) Lecture 10. Prof. Aiken (Modified by Professor Vijay Ganesh.

LALR Parsing. What Yacc and most compilers employ.

shift-reduce parsing

Lecture 7: Deterministic Bottom-Up Parsing

Review: Shift-Reduce Parsing. Bottom-up parsing uses two actions: Bottom-Up Parsing II. Shift ABC xyz ABCx yz. Lecture 8. Reduce Cbxy ijk CbA ijk

LR Parsing - The Items

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Downloaded from Page 1. LR Parsing


UNIT III & IV. Bottom up parsing

CS 4120 Introduction to Compilers

LR Parsing LALR Parser Generators

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Concepts Introduced in Chapter 4

Wednesday, August 31, Parsers

Conflicts in LR Parsing and More LR Parsing Types

Monday, September 13, Parsers

LR Parsing LALR Parser Generators

LALR stands for look ahead left right. It is a technique for deciding when reductions have to be made in shift/reduce parsing. Often, it can make the

Parsers. Xiaokang Qiu Purdue University. August 31, 2018 ECE 468

In One Slide. Outline. LR Parsing. Table Construction

CSE P 501 Compilers. LR Parsing Hal Perkins Spring UW CSE P 501 Spring 2018 D-1

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

LR(0) Parsing Summary. LR(0) Parsing Table. LR(0) Limitations. A Non-LR(0) Grammar. LR(0) Parsing Table CS412/CS413

CSE302: Compiler Design

Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1


Syntax Analyzer --- Parser

Bottom-up Parser. Jungsik Choi

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

4. Lexical and Syntax Analysis

Example CFG. Lectures 16 & 17 Bottom-Up Parsing. LL(1) Predictor Table Review. Stacks in LR Parsing 1. Sʹ " S. 2. S " AyB. 3. A " ab. 4.

Chapter 4: LR Parsing

4. Lexical and Syntax Analysis

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

More Bottom-Up Parsing

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Lecture 14: Parser Conflicts, Using Ambiguity, Error Recovery. Last modified: Mon Feb 23 10:05: CS164: Lecture #14 1

3. Parsing. Oscar Nierstrasz

Intro to Bottom-up Parsing. Lecture 9

Chapter 4. Lexical and Syntax Analysis

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Compilation 2012 Context-Free Languages Parsers and Scanners. Jan Midtgaard Michael I. Schwartzbach Aarhus University

G53CMP: Lecture 4. Syntactic Analysis: Parser Generators. Henrik Nilsson. University of Nottingham, UK. G53CMP: Lecture 4 p.1/32

LR Parsing. Table Construction

Lecture Notes on Bottom-Up LR Parsing

The role of the parser

CS 314 Principles of Programming Languages

Lecture Notes on Bottom-Up LR Parsing

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Parsing. Rupesh Nasre. CS3300 Compiler Design IIT Madras July 2018

CS 406/534 Compiler Construction Parsing Part I

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

CSC 4181 Compiler Construction. Parsing. Outline. Introduction

Acknowledgements. The slides for this lecture are a modified versions of the offering by Prof. Sanjeev K Aggarwal

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Transcription:

LR parsing There are three commonly used algorithms to build tables for an LR parser: 1. SLR(1) = LR(0) plus use of FOLLOW set to select between actions smallest class of grammars smallest tables (number of states) simple, fast construction 2. LR(1) handles full set of LR(1) grammars largest tables (number of states) slow, large construction 3. LALR(1) = LR(0) plus analysis of lookahead to select between actions intermediate sized set of grammars same number of states as SLR(1) canonical construction is slow and large (but better construction techniques exist) An LR(1) parser for either ALGOL or PASCAL has several thousand states, while an SLR(1) or LALR(1) parser for the same language may have several hundred states 1 of 20

A viable prefix is Viable prefix 1. a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form y, or 2. a prefix of a right-sentential form that can appear on the stack of a shift-reduce parser. If the viable prefix is a proper prefix (that is, a handle), it is possible to add terminals onto its end to form a right-sentential form. As long as the prefix represented by the stack is viable, the parser has not seen a detectable error. If the grammar is unambiguous, there is a unique rightmost handle. LR(k) grammars are unambiguous. Operator grammars may be ambiguous, but are still parsed with shiftreduce parsers. 2 of 20

Viable prefix of a right-sentential form: contains both terminals and nonterminals recognized with NFA ordfa SLR(1) parsing Building a SLR parser begin with NFA for recognizing viable prefixes construct DFA for recognizing viable prefixes augment with FOLLOW to disambiguate reductions States in the NFA are LR(0) items States in the DFA are sets of LR(0) items 3 of 20

LR(0) items An LR(0) item is a string [ α ], where α is a production in G with a at some position in the rhs. The indicates how much of a rhs we have seen at a given state in the parse. [ A ::= XYZ ] indicates that the parser is looking for a string that can be derived from XYZ. [ A ::= XY Z ] indicates that the parser has seen a string derived from XY and is now looking for one derivable from Z. LR(0) Items (no lookahead ) The grammar rule A ::= XYZ generates 4 LR(0) items. 1. [ A ::= XYZ ] 2. [ A ::= X Y Z ] 3. [ A ::= XY Z ] 4. [ A ::= XYZ ] 4 of 20

Canonical LR(0) items The SLR(1) table construction algorithm uses a specific set of sets of LR(0) items. These sets are called the canonical collection of sets of LR(0) items for a grammar G. The canonical collection represents the set of valid states for the LR parser. The items in each set of the canonical collection fall into two classes: kernel items: items where is not at the left end of the rhs as well as [ S ::= S ] non-kernel items: all items where is at the left end of rhs, excluding [ S ::= S ] 5 of 20

LR(0) items Each LR(0) item corresponds to a point in the parse. To generate a parser state from a kernel item, we take its closure: if [ A ::=α Bβ ] I j, then, in state j, the parser might next see a string derivable from Bβ to form its closure, add all items of the form [ B ::= γ ] G Note: An augmented grammar is one where the start symbol appears only on the lhs of productions. For the rest of LR parsing, we will assume the grammar is augmented with a production S ::= S 6 of 20

Canonical LR(0) items The canonical collection of LR(0) items: set of items derivable from [ S ::= S ] set of all items that can derive the final configuration Essentially, each set in the canonical collection of sets of LR(0) items represents a state in an NFA that recognizes viable prefixes. Grouping together is really the subset construction (section 3.6 of Aho, Sethi & Ullman). To construct the canonical collection we need two functions: closure( I ) goto( I; X ) 7 of 20

Closure( I ) Given an item [ A ::= α Bβ ], its closure contains the item and any other items that can generate legal substrings to follow α. Thus, if the parser has viable prefix α on its stack, the input should reduce to Bβ (or for some other item [ B ::= γ ] in the closure). To compute closure( I ) function closure(i) repeat new_item false for each item [ A ::= α Bβ ] I, each production B ::= γ G do if [ B ::= γ ] / I then add [ B ::= γ ] to I new_item true endif until (new_item = false) return I 8 of 20

Goto( I; X ) Let I be a set of LR(0) items and X be a grammar symbol. Then, GOTO ( I; X ) is the closure of the set of all items [ A ::=αx β ] such that [ A ::=α Xβ ] I. If I is the set of valid items for some viable prefix γ, then goto( I; X ) is the set of valid items for the viable prefix γx. goto( I; X ) represents the state reached after recognizing X in state I. To compute goto( I;X ) function goto(i, X) J set of items [ A ::= αx β ] such that [ A ::= α Xβ ] I J closure(j) return J 9 of 20

Collection of sets of LR(0) items We start the construction of the collection of sets of LR(0) items with the item [ S ::= S ], where S is the start symbol of the augmented grammar G and S is the start symbol of G. To compute the collection of sets of LR(0) items procedure items( G ) S 0 closure( { [ S ::= S ] } ) Items { S 0 } ToDo { S 0 } while ToDo not empty do remove S i from ToDo for each grammar symbol X do S new goto( S i ; X ) if S new is a new state then Items Items { S new } ToDo ToDo { S new } endif endfor endwhile return Items 10 of 20

LR(0) machines LR(0) DFA states canonical sets of LR(0) items edges goto transitions recognizes all viable prefixes of handles no lookahead To be able to recognize viable prefixes of the language (instead of the handles), we must be able to reduce handles to nonterminals Reducing a handle (rhs of production) to a nonterminal can be viewed as: returning to state at beginning of handle making transition on nonterminal To return to state at beginning of the handle, we must use the stack! 11 of 20

SLR(1) tables SLR(1) parser augmented LR(0) machine add FOLLOW information using one token of lookahead encoded as ACTION, GOTO tables ACTION table for each [state, lookahead] pair have we reached end of handle? if not, shift if at end of handle, reduce may also accept or error use lookahead to guide decision GOTO table for each [state, nonterminal] pair pick state to go to after reduction look at nonterminal at top of stack 12 of 20

The Algorithm SLR(1) table construction 1. construct the collection of sets of LR(0) items for G. 2. State i of the parser is constructed from I i. (a) if [ A ::= α a β ] I i and goto( I i, a ) = I j, then set ACTION[ i, a] to shift j. ( a must be a terminal) (b) if [ A ::= α ] I i, then set ACTION[ i, a] to reduce A ::= α for all a in FOLLOW ( A ). (c) if [ S ::= S ] I i, then set ACTION[ i, eof] to accept. 3. If goto( I i ; A ) = I j, then set GOTO[ i, A ] to j. 4. All other entries in ACTION and GOTO are set to error 5. The initial state of the parser is the state constructed from the set containing the item [ S ::= S ]. 13 of 20

SLR(1) parser example The Grammar 1 E ::= T + E 2 T 3 T ::= id The Augmented Grammar 0 S ::= E 1 E ::= T + E 2 T 3 T ::= id Symbol FIRST FOLLOW S { id } { eof } E { id } { eof } T { id } { +, eof } 14 of 20

LR(0) states for example S 0 : [ S ::= E ], [ E ::= T + E ], [ E ::= T ], [ T ::= id ] S 1 : [ S ::= E ] S 2 : [ E ::= T + E ], [ E ::= T ] S 3 : [ T ::= id ] S 4 : [ E ::= T + E ], [ E ::= T + E], [ E ::= T ], [ T ::= id ] S 5 : [ E ::= T + E ] 15 of 20

Start S 0 closure ( { [ S ::= E ] } ) Iteration 1 goto( S 0 ; E ) = S 1 goto( S 0 ; T ) = S 2 goto( S 0, id ) = S 3 Iteration 2 goto( S 2, + ) = S 4 Iteration 3 goto( S 4, id ) = S 3 goto( S 4 ; E ) = S 5 goto( S 4 ; T ) = S 2 Example GOTO function 16 of 20

The SLR(1) Automaton S1 E T T + E S0 S2 S4 S5 id id S3 17 of 20

The SLR(1) Automaton S1 [ S ::= E ] T S0 [ S ::= E ] [ E ::= T + E ] [ E ::= T ] [ T ::= id ] E T S2 [ E ::= T + E ] [ E ::= T ] + S4 [ E ::= T + E ] [ E ::= T + E] [ E ::= T ] [ T ::= id ] id id E S3 [ T ::= id ] [ E ::= T + E ] S5 18 of 20

Example SLR(1) ACTION and GOTO tables State ACTION GOTO id + $ E T 0 shift 3 1 2 1 accept 2 shift 4 reduce 2 3 reduce 3 reduce 3 4 shift 3 5 2 5 reduce 1 Stack Input Action 0 id + id $ shift 3 0 3 + id $ reduce 3 (T ::= id ) 0 2 + id $ shift 4 0 2 4 id $ shift 3 0 2 4 3 $ reduce 3 (T ::= id ) 0 2 4 2 $ reduce 2 (E ::= T) 0 2 4 5 $ reduce 1 (E ::= T + E) 0 1 $ accept 19 of 20

What can go wrong? Rules 2a, 2b, & 2c in the SLR(1) table construction algorithm can multiply define an entry in the ACTION table. If this happens, the grammar is not SLR(1). Two cases arise... shift/reduce conflict In general, it indicates an ambiguous construct in the grammar. (The classic example: dangling else in Pascal.) can modify the grammar to eliminate it can resolve in favor of shifting reduce/reduce conflict Again, it indicates an ambiguous construct in the grammar. (A classic example: PL/I call and subscript.) often, no simple resolution parse a nearby language, a superset of the desired language. 20 of 20