MODULE 14 SLR PARSER LR(0) ITEMS

MODULE 14 SLR PARSER LR(0) ITEMS In this module we shall discuss one of the LR type parser namely SLR parser. The various steps involved in the SLR parser will be discussed with a focus on the construction of LR (0) items. 14.1 LR Parsers LR parsers are a type of Bottom-up parsers. The bottom-up parsers that were discussed in the previous modules have lot of disadvantages. They are Simple Shift-reduce parsers have lot of Shift/Reduce conflicts Operator precedence parser is for a small class of grammars and has shift/reduce conflicts Therefore the solution is to go for a powerful class of bottom-up parsers, which is LR parser. LR (1) parsers recognize the languages in which one symbol of look-ahead is sufficient to decide whether to shift or reduce. The L stands for left-to-right scan of the input, R for reverse rightmost derivation and 1 for one symbol of look-ahead. The steps involved in the LR parsers are listed below: Read input, one token at a time Use a stack to keep track of current state The state at the top of the stack summarizes the information underneath. The stack also contains information about what has been parsed so far. Use a parsing table to determine the action to be performed based on current state of the stack and look-ahead symbol. The parsing table construction takes into account the shift, reduce, accept or error action and announces successful or unsuccessful parsing. 14.2 Types of LR parsers Depending on the complexity of design and the parsing capability, the LR parsers can be classified into Simple LR, Canonical LR and Look-ahead LR parsers. 1.2.1 Simple LR Parsers (SLR) The Simple LR parsers are the easiest to implement. It uses LR (0) items to construct the parsing table and works as a shift-reduce parser. It is easy to implement, but not powerful and still has some shift/reduce and reduce/reduce conflicts even for an unambiguous grammar. 1.2.2 Canonical LR Parsers (CALR).

The Canonical LR parses are powerful than the SLR parsers. It uses LR (1) items instead of LR(0) items, for constructing the parsing table. It is difficult to implement in comparison with the SLR counterpart. It reduces the shift/reduce conflict and the reduce/reduce conflict to a greater extent. 1.2.3 Look-ahead LR parser (LALR) This parser is a condensed version of canonical LR parser. The difficulty of implementation is between the SLR and CALR parsers. This parser also uses LR(1) items similar to CALR parser. But, this parser may introduce conflicts. In this module we will be looking at the steps involved in the SLR parser. 1.3 SLR parsers construction As LR parsers have to rely on a handle. When a SLR parser processes the input, it must identify all possible handles in the right sentential form by scanning the input from right. For example, consider the expression grammar and the input string a + b. We use a to keep track of the handle as the input will be looked from left to right and it has to take care of a handle in the right sentential form. So, to accommodate this, assume the parser has processed a and reduced it to E. Then, the current state can be represented by E +E where means E has already been parsed and + E is a potential suffix, which, if determined, yields to a successful parse. Thus our ultimate aim is to finally reach state E+E, which corresponds to an actual handle yielding to the reduction E E+E. LR parsing works by building an automata where each state represents what has been parsed so far and what we intend to parse after looking at the current input symbol. This is indicated by productions having a These productions are referred to as items. Items that have the at the end leads to the reduction by that production. This is the basis for the construction of the LR() items. The overall architecture of the LR parser is given in figure 14.1. Same architecture is applicable for SLR, CALR and LALR parsers. All the parsers will use a stack and a parsing table. The input string is appended with a $ and the stack is loaded with an initial stack symbol. The parsing table will basically have two sections, action and goto. The action field will correspond to the shift, reduce, accept and error actions. The goto field helps in manipulating the stack to decide on one of the actions.

Figure 14.1 Overall flow of the LR parser The steps involved in the SLR parser includes the following: Form the augmented grammar The augmented grammar G is constructed from grammar G (V,T,P, S) where G is defined as ({V U S }, T, P U {S S}, S ) where a new start symbol is added and this new start symbol produces the start symbol S of the grammar G. The purpose of this new start symbol is to indicate the completion of successful parsing Construction of LR(0) items The LR(0) items is constructed from the augmented grammar by placing a at appropriate places. Parsing Table construction - Construct follow() for all the non-terminals which require construction of first() for all the terminals and non-terminals for the grammar G. Using the LR(0) items, and follow( ) of the grammar G, construct the parsing table Using the parsing table and a stack, parse the input to verify whether the string belongs to the grammar or not. In this module we will discuss the construction of the LR(0) items. 14.4 LR(0) items An LR(0) item also referred to as the canonical LR(0) items of a grammar G is a production of G with a at some position of the right-hand side. Thus, a production A X Y Z has four items: [A X Y Z]

[A X Y Z] [A X Y Z] [A X Y Z ] The production A has one item [A ] The items can be divided into Kernel items and Non-kernel items. The kernel items are the ones which has a as the first symbol in the RHS of the productions. All other items are referred to as non-kernel items. The algorithm for construction of LR(0) items is based on two functions, Closure() and goto(). The Closure (I) is defined as the Closure of item I, and is constructed as the set of items from the current item I based on two rules: 1. Initially every item in I is added to closure (I) 2. If A α.bβ is in closure (I), and B, is a production, then add the item B to I if it is not already there. Keep adding productions till no more new items could be added to I The computation of closure is given in algorithm 14.1 Algorithm 14.1 Closure(I) { } J := I repeat For each item [A B ] J then for each production B in the grammar of G such, add the item [B ] to J until no new items can be added to J return J; The first step of algorithm 14.1 starts with one item, by introducing a in the beginning of the RHS of the production and copies this to a set J. Then we apply the step 2 of the rules discussed and implement it as a repeat until loop till no more items can be added in a sequential and iterative fashion.

The next step of the items construction is the goto() function. This function is defined as goto(i, X) where I is the current item being considered and X is the grammar symbol. Goto(I, X) is defined using the following rules. Compute the closure of the set of all items A αx.β such that A α.xβ is present in I Algorithm 14.2 explains the procedure to compute goto(i, X). Intuitively, goto(i,x) is the set of items that are valid for the viable prefix X when I is the set of items that are valid for. Algorithm 14.2 GOTO(I, X) { } repeat For each item [A X ] I, add the set of items closure({[a X ]}) to goto(i,x) if not already there until no more items can be added to goto(i,x) As can be understood from algorithm 14.2 the body of the repeat-until loop considers every item shifts the dot by one position to the right and form this item as the first one of a new item set. Then, for the shifted item, closure is computed using the procedure already discussed. After understanding the procedure to compute Closure(I) and goto(i, X) we could now devise the algorithm to construct, C the canonical collection of sets of LR(0) items for the augmented grammar G. The grammar G is augmented with a new start symbol S and production S S The construction procedure is given in algorithm 14.3 Algorithm 14.3 Items(G ) { C = closure({[s S]}) repeat For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C until no more sets can be added to C }

As can be seen from algorithm 14.3, the first production of the augmented grammar is considered as the first item and compute the closure of it. From, this first Item set, we consider every item and compute goto(i, X) and keep creating item sets till no more new item set could be formed. Example 14.1 Let us construct the set of items for the Expression grammar. The expression grammar is augmented with a new start symbol E and a new production E E. To start the grammar is written with productions numbered. 0. E E 1. E E + T 2. E T 3. T T * F 4. T F 5. F (E) 6. F id From algorithm 14.3, we need to add the item E. E and find the closure of this to add it to the first item set. The complete set of items is given in Table 14.1 14.1 Itemset construction for the expression grammar Item Set of items Goto Comments E.E This is the Add the initial item and compute closure E. E +T first item, so no Goto There is a non-terminal E after the dot in the RHS, so add the productions of the nonterminal E E.T Added based on the E appearing after the dot. Now T appears after the dot. I 0 T.T * F Add productions of T T.F Here, F appears after the dot. So, add the production of F F. (E) Productions of F are added based on the previous situation F. id There is no non-terminal appearing after the I 1 E E. E E. + T dot. Therefore that complete this item set (I 0, E) = I 1 From I 0 we consider E appearing after the dot, shift the dot by one position and compute the closure for the resultant. Here there are 2 items, that has E after the dot. So, we shift the dot and add it. The dot now comes as the last symbol, thus no more items could be added to I 1 I 2 E T. (I 0, T), Two production in I 0 has a T after the. and (I 3, T) = I 2 we shift the dot by one position to get the new item T T.*F For both these items, after shifting the dot by

I 3 F (.E) E.E + T E.T (I 0, ( ), (I 3, ( ), (I 6, ( ) (I 10,( ) = I 3 T.T * F T.F F.(E) F.id I 4 T F. (I 0, F), (I 3, F), (I 6, F) = I 4 I 5 F id. (I 0, id), (I 3, id) (I 6, id), one position, it has no symbol or terminal thus no more new items could be added We shift the dot and create the first item We see a non-terminal after the dot. So, we add the non-kernel items of E Adding the items of E, results in to add the items of T and F Only one item and it terminates as it is a kernel item Only one item and it terminates as it is a kernel item From I we move the dot by one position and (I 10, id) = I 5 I 6 E E +. T (I 1, +), (I 7, +) = I 6 1 observer a T after the dot. So we add the productions of T which results in to add the productions of F T.T * F T.F F.(E) F.id I 7 F (E.) (I 3, E) = I 7 The production where there is a dot before the E in the RHS is considered and the dot is shifted by one position to the right thus resulting in two productions E E. +T I 8 E E + T. (I 6, T) = I 8 T T. * F I 9 F (E). (I 7, ) ) = I 9 I 10 T T *. F (I 8, * ), (I 2, * ) = I 10 F.(E) F.id I 11 T T * F. (I 10, F ) = I 11 So to summarize table 14.1, we observe the non-terminal / terminal appearing in the RHS after the dot and we compute goto by writing the item with the dot shifted by one position to the right. After shifting we compute closure of the shifted item. This result in adding the items if the new symbol that is appearing after the dot is a non-terminal, where we add items of that non-

terminal. As can be seen from the goto() column of table 14.1, certain items set results in repetition and thus we refer it as the same item number that is already existing This items set is stored as a DFA and is given in figure 14.2 The nodes of the DFA correspond to the item number. The edges corresponds to the grammar symbols and the transition function is derived from the goto() function. The final states are the items that have a dot at the end of the symbol. Summary: In this module we discussed the types of LR parsers and learnt the pre-processing step of LR(0) items construction in a Simple LR parsers. In the next module we will discuss the construction of the SLR parsing table and the parsing action of the SLR parser.