LR Parsers Aditi Raste, CCOEW 1
LR Parsers
The most powerful shift-reduce parsers, and yet efficient.
LR(k) parsing:
L : left-to-right scanning of the input
R : constructing a rightmost derivation in reverse
k : number of lookahead symbols used in making parsing decisions (when k is omitted, k is assumed to be 1)
Why LR parsers?
LR parsing is the most general table-driven, non-backtracking, efficient shift-reduce parsing method.
The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers: LL(1) grammars ⊂ LR(1) grammars.
An LR parser detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
LR parsers can be constructed to recognize virtually all programming language constructs for which CFGs can be written.
LR parser
The principal drawback of the LR method is that it is too much work to construct an LR parser by hand for a typical programming language grammar.
Fortunately, this process can be automated, and many parser generators are available.
Parser generators can locate ambiguous constructs in the grammar, or constructs that are difficult to parse in a left-to-right scan of the input, and can also provide detailed diagnostic messages.
Flavours of LR parsers
SLR: Simple LR parser.
LR(1): Canonical LR parser, the most general LR parser.
LALR: Lookahead LR parser, intermediate in power between SLR and canonical LR.
SLR, LR(1) and LALR parsers use the same parsing algorithm and differ only in their parsing tables.
Relative power: SLR(1) ⊂ LALR(1) ⊂ LR(1)
SLR (Simple LR)
SLR Parser
How does a shift-reduce parser know when to shift and when to reduce?
Right sentential form | Handle | Reducing production
id * id               | id     | F -> id
F * id                | F      | T -> F
T * id                | id     | F -> id
T * F                 | T * F  | T -> T * F
T                     | T      | E -> T
In T * id, T cannot be a handle: the parser must shift * rather than reduce T to E.
SLR Parser
An LR parser makes shift-reduce decisions by maintaining states that keep track of where we are in a parse.
States represent sets of items.
An item indicates how much of a production we have seen at a given point in the parsing process.
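Model of LR Parser
[Figure: an input buffer (a + b $) and a stack (states s0 ... sn with grammar symbols X, Y, Z, and $ at the bottom) feed the LR parsing program (the driver program), which consults the LR parsing table (ACTION and GOTO parts) and produces the output.]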
Tasks of the driver program
Invoke the lexical analyzer for the next token.
Initialize the stack with the start state s0.
Act like a finite automaton (FA).
Determine the state sj on top of the stack and the current input symbol ai.
Determine the action corresponding to [sj, ai].
Actions of LR parser
si : shift the input symbol and push state i onto the stack
rj : reduce by the production numbered j
acc : accept
(blank entry) : error
LR Parser Example
Consider the expression grammar:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> (E)
6. F -> id
Parsing table for expression grammar
Columns id + * ( ) $ form the ACTION part; columns E T F form the GOTO part. Blank entries are errors.
State | id   +    *    (    )    $   | E   T   F
0     | s5             s4            | 1   2   3
1     |      s6                  acc |
2     |      r2   s7        r2   r2  |
3     |      r4   r4        r4   r4  |
4     | s5             s4            | 8   2   3
5     |      r6   r6        r6   r6  |
6     | s5             s4            |     9   3
7     | s5             s4            |         10
8     |      s6             s11      |
9     |      r1   s7        r1   r1  |
10    |      r3   r3        r3   r3  |
11    |      r5   r5        r5   r5  |
Exercise: parse the input string id + id * id.
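
The table-driven parse of id + id * id can be sketched in Python as follows. This is a minimal illustration, not a production parser: the names PRODUCTIONS, ACTION, GOTO and lr_parse are chosen here for the example, the stack holds only state numbers, and the output is assumed to be the list of production numbers in the order they are used for reductions.

```python
# Expression grammar productions: number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> (E)
    6: ("F", 1),  # F -> id
}

# ACTION part of the parsing table; missing entries mean "error".
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

# GOTO part of the parsing table (transitions on nonterminals).
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Parse a token list and return the production numbers used, in reduction order."""
    tokens = list(tokens) + ["$"]   # append the end marker
    stack = [0]                     # stack of states, starting in state 0
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        action = ACTION[state].get(tokens[pos])
        if action is None:          # blank table entry: syntax error
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {state}")
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":        # shift: push the new state, advance the input
            stack.append(number)
            pos += 1
        else:                       # reduce: pop |rhs| states, then follow GOTO
            lhs, rhs_len = PRODUCTIONS[number]
            del stack[len(stack) - rhs_len:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)


print(lr_parse(["id", "+", "id", "*", "id"]))  # [6, 4, 2, 6, 4, 6, 3, 1]
```

Read backwards, the printed sequence is a rightmost derivation of the input, which is the "R" in LR.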
Working of the LR parser
CFG -> construction of LR(0) items -> construction of the LR parsing table -> parsing of the input string -> output
The input string is supplied to the parsing step.
LR(0) Items
An LR(0) item is a string [α], where α is a production of the grammar with a dot (.) at some position in its right-hand side.
The dot indicates how much of the production we have seen at a given state in the parse.
[A => .XYZ] indicates that the parser is looking for a string that can be derived from XYZ.
[A => XY.Z] indicates that the parser has seen a string derived from XY and is looking for one derivable from Z.
LR(0) Items (no lookahead)
A => XYZ generates 4 LR(0) items:
1. [A => .XYZ]
2. [A => X.YZ]
3. [A => XY.Z]
4. [A => XYZ.]
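
As a small illustration, the items of a production can be enumerated by placing the dot at every position of the right-hand side. The function name lr0_items and the string rendering with "." as the dot are assumptions made for this sketch.

```python
def lr0_items(lhs, rhs):
    """Return every LR(0) item of the production lhs => rhs, rendered as strings."""
    items = []
    # A right-hand side of n symbols has n + 1 dot positions.
    for dot in range(len(rhs) + 1):
        before = "".join(rhs[:dot])   # symbols already seen
        after = "".join(rhs[dot:])    # symbols still expected
        items.append(f"[{lhs} => {before}.{after}]")
    return items


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
```

A production with an empty right-hand side yields the single item [A => .].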
Canonical LR(0) Collection of Items
The SLR table construction algorithm uses a specific collection of sets of LR(0) items.
This collection is called the canonical collection of sets of LR(0) items for grammar G.
The canonical collection represents the set of valid states for an LR parser.
Canonical LR(0) Collection of Items
To construct the canonical LR(0) collection for a grammar, we define:
An augmented grammar
Two functions: the CLOSURE function and the GOTO function
Canonical LR(0) Collection of Items
Augmented grammar: If G is a grammar with start symbol S, then G', the augmented grammar for G, is G with a new start symbol S' and the production S' => S.
Purpose: The augmented grammar tells the parser when to stop parsing and announce acceptance of the input. Acceptance occurs when and only when the parser is about to reduce by S' => S.
States of the PDA (Closure of Item Sets)
Each LR(0) item corresponds to a point in the parse.
To generate a parser state from an LR(0) item, we take its closure.
Closure of a set of items
Suppose I is a set of items. We define CLOSURE(I) as follows:
(i) Every item in I is in CLOSURE(I).
(ii) If A => α.Bβ is in CLOSURE(I) and B => γ is a production, then add the item B => .γ to CLOSURE(I) (if it is not already there).
Apply rule (ii) until no more new items can be added to CLOSURE(I).
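
The two closure rules can be sketched in Python with a worklist. The representation is an assumption of this example: an item is a tuple (lhs, rhs, dot), GRAMMAR is the expression grammar augmented with a new start symbol E' and the production E' => E, and the names GRAMMAR and closure are illustrative.

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Return CLOSURE(items); each item is a tuple (lhs, rhs, dot position)."""
    result = set(items)          # rule (i): every item of I is in CLOSURE(I)
    worklist = list(items)
    while worklist:              # repeat until no new item can be added
        lhs, rhs, dot = worklist.pop()
        # Rule (ii) applies only when a nonterminal B follows the dot.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            nonterminal = rhs[dot]
            for production in GRAMMAR[nonterminal]:
                new_item = (nonterminal, production, 0)   # B => .γ
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


# Closure of the initial item E' => .E
for lhs, rhs, dot in sorted(closure({("E'", ("E",), 0)})):
    print(lhs, "=>", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

For the initial item E' => .E the closure contains seven items: the initial item itself plus one item with the dot at the left end for each of the six grammar productions.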
Set of Items
Two classes:
a) Kernel items: (i) the initial item S' => .S (ii) all items whose dots are not at the left end
b) Non-kernel items: all items with their dots at the left end, except for S' => .S
Transitions of the PDA (GOTO function)
A state has a transition to another state for each grammar symbol that immediately follows the dot in some item of that state.
If an item in the state is [A => α.Xβ], then a transition out of this state occurs when X is processed.
The transition is to the state that is the closure of the item [A => αX.β].
GOTO(I, X)
I : a set of items
X : a grammar symbol
GOTO(I, X) is defined to be the closure of the set of all items [A => αX.β] such that [A => α.Xβ] is in I.
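
A minimal Python sketch of GOTO, together with a loop that applies it repeatedly to build the canonical collection, is given below. It rests on the same assumptions as any such illustration: items are tuples (lhs, rhs, dot), the grammar is the expression grammar augmented with E' => E, and the names GRAMMAR, closure, goto and canonical_collection are chosen for this example only.

```python
# Augmented expression grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Return CLOSURE(items); each item is a tuple (lhs, rhs, dot position)."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for production in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)


def goto(items, symbol):
    """Return GOTO(items, symbol): move the dot over symbol, then take the closure."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


def canonical_collection():
    """Collect every item set reachable from CLOSURE({E' => .E}) through GOTO."""
    start = closure({("E'", ("E",), 0)})
    states, seen = [start], {start}
    for state in states:                 # the list grows while we iterate over it
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in seen:
                seen.add(target)
                states.append(target)
    return states


print(len(canonical_collection()))  # 12
```

The twelve item sets correspond to the twelve states of an SLR parsing table for this grammar, although the order in which this sketch discovers them need not match any particular state numbering.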