MODULE 14 SLR PARSER LR(0) ITEMS


In this module we discuss one type of LR parser, the SLR parser. The steps involved in SLR parsing are described, with a focus on the construction of LR(0) items.

14.1 LR Parsers

LR parsers are a type of bottom-up parser. The bottom-up parsers discussed in the previous modules have several disadvantages:

- Simple shift-reduce parsers have many shift/reduce conflicts.
- Operator precedence parsers handle only a small class of grammars and also have shift/reduce conflicts.

The solution is therefore to use a more powerful class of bottom-up parsers, the LR parsers. LR(1) parsers recognize the languages in which one symbol of look-ahead is sufficient to decide whether to shift or reduce. The L stands for a left-to-right scan of the input, the R for constructing a rightmost derivation in reverse, and the 1 for one symbol of look-ahead.

The steps involved in LR parsing are listed below:

- Read the input, one token at a time.
- Use a stack to keep track of the current state. The state at the top of the stack summarizes the information underneath it, so the stack also records what has been parsed so far.
- Use a parsing table to determine the action to be performed, based on the state at the top of the stack and the look-ahead symbol. The table entries are shift, reduce, accept and error actions; the accept and error entries announce successful or unsuccessful parsing.

14.2 Types of LR parsers

Depending on the complexity of design and the parsing capability, LR parsers can be classified into Simple LR, Canonical LR and Look-ahead LR parsers.

14.2.1 Simple LR Parsers (SLR)

The Simple LR parser is the easiest to implement. It uses LR(0) items to construct the parsing table and works as a shift-reduce parser. It is not very powerful, however, and can still have shift/reduce and reduce/reduce conflicts even for an unambiguous grammar.

14.2.2 Canonical LR Parsers (CALR)

The canonical LR parser is more powerful than the SLR parser. It uses LR(1) items instead of LR(0) items to construct the parsing table. It is more difficult to implement than its SLR counterpart, but it removes many of the shift/reduce and reduce/reduce conflicts that arise in SLR parsing.

14.2.3 Look-ahead LR Parsers (LALR)

This parser is a condensed version of the canonical LR parser. Its difficulty of implementation lies between that of the SLR and CALR parsers. It also uses LR(1) items, as the CALR parser does, but because it merges item sets to condense the table it may introduce conflicts that the CALR parser does not have.

In this module we look at the steps involved in the SLR parser.

14.3 SLR parser construction

LR parsers rely on identifying a handle. When an SLR parser processes the input, it must identify every possible handle in the right-sentential form while scanning the input from the left. For example, consider the expression grammar and the input string a + b. We use a dot (.) to mark how much of a potential handle has been seen so far, because the input is read from left to right while the handle has to be found in the right-sentential form. Assume the parser has processed a and reduced it to E. The current state can then be represented by E . + E, where the dot means that E has already been parsed and + E is a potential suffix which, if found, leads to a successful parse. Our aim is to reach the state E + E ., which corresponds to an actual handle and leads to the reduction by E → E + E.

LR parsing works by building an automaton in which each state represents what has been parsed so far and what we expect to parse after looking at the current input symbol. This is indicated by productions containing a dot; such productions are referred to as items. An item that has the dot at the end calls for a reduction by that production. This is the basis for the construction of the LR(0) items.

The overall architecture of the LR parser is given in Figure 14.1. The same architecture applies to SLR, CALR and LALR parsers. All of them use a stack and a parsing table. The input string is appended with a $ and the stack is loaded with an initial state. The parsing table has two sections, action and goto. The action section holds the shift, reduce, accept and error actions. The goto section is consulted after a reduction to determine the state to push onto the stack.
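The table-driven loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar S → ( S ) | x is hypothetical and not part of this module, its ACTION and GOTO entries were worked out by hand for this sketch, and the names `PRODUCTIONS`, `ACTION`, `GOTO` and `lr_parse` are invented for the example.

```python
# Hypothetical toy grammar (an assumption, not taken from this module):
#   0. S' -> S      1. S -> ( S )      2. S -> x
# ACTION and GOTO below were derived by hand for this grammar only.

# production number -> (head, number of symbols in the body)
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(tokens):
    """Return True if the token list is a sentence of the toy grammar."""
    stack = [0]                      # stack of states; 0 is the initial state
    buffer = list(tokens) + ["$"]    # input with the end marker appended
    pos = 0
    while True:
        state, lookahead = stack[-1], buffer[pos]
        # A missing table entry means the error action.
        kind, arg = ACTION.get((state, lookahead), ("error", None))
        if kind == "shift":
            stack.append(arg)        # push the new state, consume the token
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]          # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])    # goto on the production head
        elif kind == "accept":
            return True
        else:
            return False


print(lr_parse(list("((x))")))
print(lr_parse(list("(x")))
```

The stack here holds only state numbers, since each state already summarizes the grammar symbols beneath it.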

Figure 14.1 Overall flow of the LR parser

The steps involved in the SLR parser are the following:

- Form the augmented grammar. The augmented grammar G' is constructed from the grammar G = (V, T, P, S), where G' is defined as (V ∪ {S'}, T, P ∪ {S' → S}, S'). A new start symbol S' is added, and it produces the start symbol S of G. The purpose of the new start symbol is to indicate the successful completion of parsing.
- Construct the LR(0) items. The LR(0) items are constructed from the augmented grammar by placing a dot at appropriate places.
- Construct the parsing table. Construct follow() for all the non-terminals, which requires first() for the terminals and non-terminals of G. Using the LR(0) items and follow() of the grammar G, construct the parsing table.
- Using the parsing table and a stack, parse the input to verify whether the string belongs to the language of the grammar.

In this module we discuss the construction of the LR(0) items.

14.4 LR(0) items

An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side. The sets of such items form the canonical LR(0) collection. Thus, a production A → X Y Z has four items:

[A → . X Y Z]
[A → X . Y Z]
[A → X Y . Z]
[A → X Y Z .]

The production A → ε has one item, [A → .].

The items can be divided into kernel items and non-kernel items. The kernel items are the initial item [S' → . S] and all items whose dot is not at the left end of the right-hand side. All other items, that is, those with the dot at the left end of the right-hand side, are non-kernel items.

The algorithm for the construction of the sets of LR(0) items is based on two functions, closure() and goto(). closure(I) is the closure of the set of items I, and is constructed from I by two rules:

1. Initially, every item in I is added to closure(I).
2. If A → α . B β is in closure(I) and B → γ is a production, then add the item B → . γ to closure(I) if it is not already there. Keep applying this rule until no more new items can be added.

The computation of closure is given in Algorithm 14.1.

Algorithm 14.1 Closure(I)
{
    J := I
    repeat
        for each item [A → α . B β] ∈ J
            for each production B → γ of G such that [B → . γ] ∉ J
                add the item [B → . γ] to J
    until no new items can be added to J
    return J
}

Algorithm 14.1 starts by copying the items of I to a set J. For the initial item set, I holds a single item, formed by placing a dot at the beginning of the RHS of the production S' → S. Rule 2 is then applied in a repeat-until loop until no more items can be added.
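As a minimal sketch of the definitions above, the following Python code enumerates the items of a production and computes closure() with a worklist instead of the repeat-until loop of Algorithm 14.1 (the result is the same set). The representation is an assumption made for illustration: the names `Item`, `GRAMMAR`, `items_of` and `closure` are invented here, and the grammar dictionary encodes the augmented expression grammar used in Example 14.1 of this module.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """An LR(0) item: a production head -> body with a dot position."""
    head: str
    body: tuple
    dot: int

    def __str__(self):
        symbols = list(self.body)
        symbols.insert(self.dot, ".")
        return f"{self.head} -> {' '.join(symbols)}"


# Augmented expression grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def items_of(head, body):
    """All items of one production: one per possible dot position."""
    return [Item(head, body, dot) for dot in range(len(body) + 1)]


def closure(items, grammar):
    """Closure of a set of items (rules 1 and 2 of the definition)."""
    result = set(items)              # rule 1: every item of I is in closure(I)
    worklist = list(items)
    while worklist:
        item = worklist.pop()
        if item.dot == len(item.body):
            continue                 # dot at the end: nothing follows it
        symbol = item.body[item.dot]
        # Rule 2: if a non-terminal follows the dot, add its productions
        # with the dot at the left end. Terminals have no grammar entry.
        for body in grammar.get(symbol, []):
            new_item = Item(symbol, body, 0)
            if new_item not in result:
                result.add(new_item)
                worklist.append(new_item)
    return frozenset(result)


for item in items_of("A", ("X", "Y", "Z")):
    print(item)
for item in sorted(closure({Item("E'", ("E",), 0)}, GRAMMAR)):
    print(item)
```

An empty tuple models the body of A → ε, so `items_of("A", ())` yields the single item `A -> .`.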

The next step of the item construction is the goto() function. It is written goto(I, X), where I is the current set of items and X is a grammar symbol. goto(I, X) is defined as the closure of the set of all items [A → α X . β] such that [A → α . X β] is in I. Intuitively, if I is the set of items that are valid for some viable prefix γ, then goto(I, X) is the set of items that are valid for the viable prefix γX. Algorithm 14.2 gives the procedure to compute goto(I, X).

Algorithm 14.2 GOTO(I, X)
{
    repeat
        for each item [A → α . X β] ∈ I
            add the set of items closure({[A → α X . β]}) to goto(I, X) if not already there
    until no more items can be added to goto(I, X)
}

As can be seen from Algorithm 14.2, the body of the repeat-until loop considers every item of I that has X immediately after the dot, shifts the dot one position to the right, and makes the shifted item a kernel item of the new item set. The closure of the shifted item is then computed using the procedure already discussed.

With closure(I) and goto(I, X) in hand, we can now give the algorithm to construct C, the canonical collection of sets of LR(0) items for the augmented grammar G'. The grammar G is augmented with a new start symbol S' and the production S' → S. The construction procedure is given in Algorithm 14.3.

Algorithm 14.3 Items(G')
{
    C := { closure({[S' → . S]}) }
    repeat
        for each set of items I ∈ C and each grammar symbol X ∈ (N ∪ T) such that goto(I, X) ∉ C and goto(I, X) ≠ ∅
            add the set of items goto(I, X) to C
    until no more sets can be added to C
}
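Algorithms 14.2 and 14.3 can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the names `Item`, `GRAMMAR`, `closure`, `goto` and `canonical_collection` are assumptions of this example, the grammar is the augmented expression grammar of Example 14.1, and the repeat-until loop of Algorithm 14.3 is replaced by a single pass over a list that grows while it is traversed.

```python
from typing import NamedTuple


class Item(NamedTuple):
    head: str
    body: tuple
    dot: int


# Augmented expression grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Closure of a set of items, as in Algorithm 14.1."""
    result, worklist = set(items), list(items)
    while worklist:
        item = worklist.pop()
        if item.dot < len(item.body):
            symbol = item.body[item.dot]
            for body in GRAMMAR.get(symbol, []):
                new_item = Item(symbol, body, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Shift the dot over `symbol` wherever possible, then take the closure."""
    moved = {
        Item(item.head, item.body, item.dot + 1)
        for item in items
        if item.dot < len(item.body) and item.body[item.dot] == symbol
    }
    return closure(moved)            # empty set if no item has `symbol` after the dot


def canonical_collection(start_item):
    """Collection C of Algorithm 14.3, in order of discovery."""
    symbols = set(GRAMMAR) | {s for bodies in GRAMMAR.values()
                              for body in bodies for s in body}
    start = closure({start_item})
    collection, seen = [start], {start}
    for item_set in collection:      # the list grows while we iterate over it
        for symbol in sorted(symbols):
            target = goto(item_set, symbol)
            if target and target not in seen:
                seen.add(target)
                collection.append(target)
    return collection


C = canonical_collection(Item("E'", ("E",), 0))
print(len(C))
```

The order in which the sets are discovered depends on the order in which the grammar symbols are tried, so the positions in `C` need not match any particular I0, I1, ... numbering.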

As can be seen from Algorithm 14.3, the item formed from the first production of the augmented grammar is taken as the initial item, and its closure is the first item set. From this first item set, we consider every item, compute goto(I, X), and keep creating item sets until no new item set can be formed.

Example 14.1 Let us construct the sets of items for the expression grammar. The expression grammar is augmented with a new start symbol E' and a new production E' → E. To start, the grammar is written with its productions numbered.

0. E' → E
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id

From Algorithm 14.3, we take the item E' → . E and find its closure to form the first item set. The complete collection of item sets is given in Table 14.1.

Table 14.1 Item set construction for the expression grammar

I0
    Set of items: E' → . E
                  E → . E + T
                  E → . T
                  T → . T * F
                  T → . F
                  F → . ( E )
                  F → . id
    Goto:         none (this is the first item set)
    Comments:     Add the initial item and compute its closure. The non-terminal E appears after the dot, so the productions of E are added. In these, T appears after the dot, so the productions of T are added. There, F appears after the dot, so the productions of F are added. No further non-terminal appears after a dot, which completes this item set.

I1
    Set of items: E' → E .
                  E → E . + T
    Goto:         goto(I0, E) = I1
    Comments:     From I0 we consider E appearing after the dot, shift the dot by one position and compute the closure of the result. Two items in I0 have E after the dot, so the dot is shifted in both. The dot is now at the end or before the terminal +, so no more items can be added to I1.

I2
    Set of items: E → T .
                  T → T . * F
    Goto:         goto(I0, T) = goto(I3, T) = I2
    Comments:     Two items in I0 have T after the dot, and we shift the dot by one position in each. After the shift the dot is at the end or before the terminal *, so no new items can be added.

I3
    Set of items: F → ( . E )
                  E → . E + T
                  E → . T
                  T → . T * F
                  T → . F
                  F → . ( E )
                  F → . id
    Goto:         goto(I0, ( ) = goto(I3, ( ) = goto(I6, ( ) = goto(I10, ( ) = I3
    Comments:     We shift the dot and create the first item. A non-terminal, E, follows the dot, so we add the non-kernel items of E. Adding the items of E in turn adds the items of T and F.

I4
    Set of items: T → F .
    Goto:         goto(I0, F) = goto(I3, F) = goto(I6, F) = I4
    Comments:     Only one item. The dot is at the end, so the closure adds nothing.

I5
    Set of items: F → id .
    Goto:         goto(I0, id) = goto(I3, id) = goto(I6, id) = goto(I10, id) = I5
    Comments:     Only one item. The dot is at the end, so the closure adds nothing.

I6
    Set of items: E → E + . T
                  T → . T * F
                  T → . F
                  F → . ( E )
                  F → . id
    Goto:         goto(I1, +) = goto(I7, +) = I6
    Comments:     From I1 we move the dot by one position and observe a T after the dot. So we add the productions of T, which in turn adds the productions of F.

I7
    Set of items: F → ( E . )
                  E → E . + T
    Goto:         goto(I3, E) = I7
    Comments:     The items of I3 with the dot before E in the RHS are considered, and the dot is shifted by one position to the right, resulting in two items.

I8
    Set of items: E → E + T .
                  T → T . * F
    Goto:         goto(I6, T) = I8

I9
    Set of items: F → ( E ) .
    Goto:         goto(I7, ) ) = I9

I10
    Set of items: T → T * . F
                  F → . ( E )
                  F → . id
    Goto:         goto(I8, *) = goto(I2, *) = I10

I11
    Set of items: T → T * F .
    Goto:         goto(I10, F) = I11

To summarize Table 14.1: we look at the non-terminal or terminal appearing after the dot in the RHS, and we compute goto by writing the item with the dot shifted one position to the right. After shifting, we compute the closure of the shifted item. If the symbol that now follows the dot is a non-terminal, this adds the items of that non-terminal. As can be seen from the Goto column of Table 14.1, certain goto computations produce an item set that already exists, and we then refer to it by its existing number.

This collection of item sets is stored as a DFA, which is given in Figure 14.2. The nodes of the DFA correspond to the item sets. The edges correspond to the grammar symbols, and the transition function is derived from the goto() function. The final states are the item sets that contain an item with the dot at the end of the right-hand side.

Summary: In this module we discussed the types of LR parsers and learnt the pre-processing step of a Simple LR parser, the construction of the LR(0) items. In the next module we will discuss the construction of the SLR parsing table and the parsing action of the SLR parser.
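As a companion to Table 14.1 and the DFA described above, the following Python sketch builds the transition function from goto() for the augmented expression grammar and marks the states that contain an item with the dot at the end. It is a sketch under assumptions: the names `Item`, `GRAMMAR`, `SYMBOLS`, `build_dfa` and `has_complete_item` are invented for this example, and states are numbered in order of discovery, which is not the numbering used in Table 14.1.

```python
from typing import NamedTuple


class Item(NamedTuple):
    head: str
    body: tuple
    dot: int


# Augmented expression grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "+", "*", "(", ")", "id"]


def closure(items):
    """Closure of a set of items, as in Algorithm 14.1."""
    result, worklist = set(items), list(items)
    while worklist:
        item = worklist.pop()
        if item.dot < len(item.body):
            symbol = item.body[item.dot]
            for body in GRAMMAR.get(symbol, []):
                new_item = Item(symbol, body, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Shift the dot over `symbol` wherever possible, then take the closure."""
    return closure({Item(i.head, i.body, i.dot + 1) for i in items
                    if i.dot < len(i.body) and i.body[i.dot] == symbol})


def build_dfa():
    """Return (states, transitions); transitions maps (state, symbol) -> state."""
    start = closure({Item("E'", ("E",), 0)})
    states, index, transitions = [start], {start: 0}, {}
    for number, item_set in enumerate(states):   # states grows during the loop
        for symbol in SYMBOLS:
            target = goto(item_set, symbol)
            if not target:
                continue                         # no edge on this symbol
            if target not in index:              # a new item set: new DFA node
                index[target] = len(states)
                states.append(target)
            transitions[(number, symbol)] = index[target]
    return states, transitions


def has_complete_item(item_set):
    """True if some item in the set has the dot at the end of its body."""
    return any(item.dot == len(item.body) for item in item_set)


states, transitions = build_dfa()
final_states = [n for n, s in enumerate(states) if has_complete_item(s)]
print(len(states), len(transitions), len(final_states))
```

A goto that lands on an existing item set adds only an edge, which corresponds to the repeated entries in the Goto column of Table 14.1.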