Syntax Analysis 1
Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token, tokenval Get next token Parser and rest of front-end Intermediate representation Lexical error Syntax error Semantic error Symbol Table 2
The Role Of Parser A parser implements a C-F grammar The role of the parser is twofold: 1. To check syntax (= string recognizer) And to report syntax errors accurately 2. To invoke semantic actions For static semantics checking, e.g. type checking of expressions, functions, etc. For syntax-directed translation of the source code to an intermediate representation 3
Syntax-Directed Translation One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods Possible IR output: Abstract syntax trees (ASTs) Control-flow graphs (CFGs) with triples, three-address code, or register transfer list notation WHIRL (SGI Pro64 compiler) has 5 IR levels! 4
Error Handling A good compiler should assist in identifying and locating errors Lexical errors: important, compiler can easily recover and continue Syntax errors: most important for compiler, can almost always recover Static semantic errors: important, can sometimes recover Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required Logical errors: hard or impossible to detect 5
Viable-Prefix Property The viable-prefix property of LL/LR parsers allows early detection of syntax errors Goal: detection of an error as soon as possible without further consuming unnecessary input How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language Prefix for (;) Error is detected here Prefix Error is detected here DO 10 I = 1;0 6
Error Recovery Strategies Panic mode Discard input until a token in a set of designated synchronizing tokens is found Phrase-level recovery Perform local correction on the input to repair the error Error productions Augment grammar with productions for erroneous constructs Global correction Choose a minimal sequence of changes to obtain a global least-cost correction 7
Grammars (Recap) Context-free grammar is a 4-tuple G = (N, T, P, S) where T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form where (N T)* N (N T)* and (N T)* S N is a designated start symbol 8
Notational Conventions Used Terminals a,b,c, T specific terminals: 0, 1, id, + Nonterminals A,B,C, N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z (N T) Strings of terminals u,v,w,x,y,z T* Strings of grammar symbols,, (N T)* 9
Derivations (Recap) The one-step derivation is defined by A where A is a production in the grammar In addition, we define is leftmost lm if does not contain a nonterminal is rightmost rm if does not contain a nonterminal Transitive closure * (zero or more steps) Positive closure + (one or more steps) The language generated by G is defined by L(G) = {w T* S + w} 10
Derivation (Example) Grammar G = ({E}, {+,*,(,),-,id}, P, E) with productions P = E E + E E E * E E ( E ) E - E E id Example derivations: E - E - id E rm E + E rm E + id rm id + id E * E E * id + id E + id * id + id 11
Chomsky Hierarchy: Language Classification A grammar G is said to be Regular if it is right linear where each production is of the form A w B or A w or left linear where each production is of the form A B w or A w Context free if each production is of the form A where A N and (N T)* Context sensitive if each production is of the form A where A N,,, (N T)*, > 0 Unrestricted 12
Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L(T) = { L(G) G is of type T } That is: the set of all languages generated by grammars G of type T Examples: Every finite language is regular! (construct a FSA for strings in L(G)) L 1 = { a n b n n 1 } is context free L 2 = { a n b n c n n 1 } is context sensitive 13
Parsing Universal (any C-F grammar) Cocke-Younger-Kasimi Earley Top-down (C-F grammar with restrictions) Recursive descent (predictive parsing) LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) Operator precedence parsing LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR 14
Top-Down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing Grammar: E T + T T ( E ) T - E T id Leftmost derivation: E lm T + T lm id + T lm id + id E E E E T T T T T T + id + id + id 15
Left Recursion (Recap) Productions of the form A A are left recursive When one of the productions in a grammar is left recursive then a predictive parser loops forever on certain inputs 16
General Left Recursion Elimination Method Arrange the nonterminals in some order A 1, A 2,, A n for i = 1,, n do for j = 1,, i-1 do replace each A i A j with A i 1 2 k where enddo A j 1 2 k enddo eliminate the immediate left recursion in A i 17
Immediate Left-Recursion Elimination Method Rewrite every left-recursive production A A A into a right-recursive production: A A R A R A R A R A R 18
Example Left Recursion Elim. A B C a B C A A b C A B C C a Choose arrangement: A, B, C i = 1: i = 2, j = 1: i = 3, j = 1: i = 3, j = 2: nothing to do B C A A b B C A B C b a b (imm) B C A B R a b B R B R C b B R C A B C C a C B C B a B C C a C B C B a B C C a C C A B R C B a b B R C B a B C C a (imm) C a b B R C B C R a B C R a C R C R A B R C B C R C C R 19
Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A 1 2 n with A A R A R 1 2 n 20
Predictive Parsing Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: Recursive (recursive calls) Non-recursive (table-driven) 21
FIRST (Revisited) FIRST( ) = { the set of terminals that begin all strings derived from } FIRST(a) = {a} if a T FIRST( ) = { } FIRST(A) = A FIRST( ) for A P FIRST(X 1 X 2 X k ) = if for all j = 1,, i-1 : FIRST(X j ) then add non- in FIRST(X i ) to FIRST(X 1 X 2 X k ) if for all j = 1,, k : FIRST(X j ) then add to FIRST(X 1 X 2 X k ) 22
FOLLOW FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = for all (B A ) P do add FIRST( )\{ } to FOLLOW(A) for all (B A ) P and FIRST( ) do add FOLLOW(B) to FOLLOW(A) for all (B A) P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add $ to FOLLOW(A) 23
Red : A Blue : First Set (2) G 0 S ase S B B bbe B C C cce C d
Red : A Step 1: Blue : First (S ase) First (S B ) First (B bbe) First Set (2) = First(a) ={a} = First(B) = First(b)={b} First (B C ) = First(C) First (C cce) = First(c) ={c} First (C d ) = First(d)={d} G 0 S ase S B B bbe B C C cce C d
Red : A Step 1: Blue : First (S ase) First (S B ) First (B bbe) First Set (2) = {a} = First(B) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d}
Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = First(B) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d}
Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C)
Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = First(C) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C)
Red : A Step 2: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d}
Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b} First(C) = {b} {c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d}
Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b, c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) = {c} First (C d ) = {d} G 0 S ase S B B bbe B C C cce C d Step Step 1 Step 2 First Set S B C a b c d {a} First(B) {b} First(C) {c, d} {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d} Step {a} {b} {c, d} = {a,b,c,d} {b} {c, d} = {b,c,d} {c, d}
Red : A Step 3: Blue : First (S ase) First Set (2) = {a} First (S B ) = {b, c, d} First (B bbe) = {b} First (B C ) = {c, d} First (C cce) First (C d ) Step = {c} = {d} G 0 S ase S B B bbe B C C cce C d If no more change The first set of a terminal symbol is itself First Set S B C a b c d Step 1 {a} First(B) {b} First(C) {c, d} Step 2 {a} {b} First(C) {b} {c, d} = {b,c,d} {c, d} Step 3 {a} {b} {c, d} = {a,b,c,d} {b} {c, d} = {b,c,d} {c, d} {a} {b} {c} {d}
Another Example.
Red : A Blue : First Set (2) G 0 S ABc A a A B b B
Red : A Step 1: Blue : First (S ABc) First (A a ) First (A ) First Set (2) = First(ABc) = First(a) = First( ) First( ) First (B b ) = First(b) First (B ) = First( ) First( ) G 0 S ABc A a A B b B
Red : A Step 1: Blue : First (S ABc) First (A a ) First (A ) First (B b ) = {b} First (B ) = { } First Set (2) = First(ABc) = {a} = { } G 0 S ABc A a A B b B Step First Set S A B a b c Step 1 First(ABc) {a, } {b, }
Red : A Step 2: Blue : First Set (2) First (S ABc) = First(ABc) = {a, } = {a, } - { } First(Bc) = {a} First(Bc) First (A a )) = {a} First (A ) = { } First (B b ) = {b} G 0 S ABc A a A B b B First (B ) = { } Step First Set S A B a b c Step 1 First(ABc) {a, } {b, } Step 2 {a} First(Bc) {a, } {b, }
Red : A Step 3: Blue : First (S ABc) First (A a ) First (A ) First (B b ) First (B ) First Set (2) = {a} First(Bc) = {a} {b, } = {a} {b, } - { } First(c) = {a} {b,c} = {a} = { } = {b} Step First Set = { } G 0 S ABc A a A B b B S A B a b c Step 1 First(ABc) {a, } {b, } Step 2 {a} First(Bc) {a, } {b, } Step 3 {a} {b, c}= {a,b,c} {a, } {b, }
Red : A Step 3: Blue : First (S ABc) First (A a ) First (A ) First Set (2) = {a,b,c} = {a} = { } First (B b ) = {b} First (B ) = { } Step Step 1 Step 2 First Set G 0 S ABc A a A B b B If no more change The first set of a terminal symbol is itself S A B a b c First(ABc) {a, } {b, } {a} First(Bc) {a, } {b, } Step {a} {b, c}= {a,b,c} {a, } {b, } {a} {b} {c}
LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A 1 2 n for nonterminal A the following holds: 1. FIRST( i ) FIRST( j ) = for all i j 2. if i * then 2.a. j * for all i j 2.b. FIRST( j ) FOLLOW(A) = for all i j 41
Non-LL(1) Examples Grammar S S a a S a S a S a R R S S a R a R S Not LL(1) because: Left recursive FIRST(a S) FIRST(a) For R: S * and * For R: FIRST(S) FOLLOW(R) 42
Recursive Descent Parsing (Recap) Grammar must be LL(1) Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information 43
Using FIRST and FOLLOW to Write a Recursive Descent Parser expr term rest rest + term rest - term rest term id procedure rest(); begin if lookahead in FIRST(+ term rest) then match( + ); term(); rest() else if lookahead in FIRST(- term rest) then match( - ); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; FIRST(+ term rest) = { + } FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ } 44
Non-Recursive Predictive Parsing: Table-Driven Parsing Given an LL(1) grammar G = (N, T, P, S) construct a table M[A,a] for A N, a T and use a driver program with a stack input a + b $ stack X Y Z $ Predictive parsing program (driver) Parsing table M output 45
Constructing an LL(1) Predictive Parsing Table for each production A do for each a FIRST( ) do add A to M[A,a] enddo if FIRST( ) then for each b FOLLOW(A) do add A to M[A,b] enddo endif enddo Mark each undefined entry in M error 46
Example Table E T E R E R + T E R T F T R T R * F T R F ( E ) id A FIRST( ) FOLLOW(A) E T E R ( id $ ) E R + T E R + $ ) E R $ ) T F T R ( id + $ ) T R * F T R * + $ ) T R + $ ) F ( E ) ( * + $ ) F id id * + $ ) id + * ( ) $ E E T E R E T E R E R E R + T E R E R E R T T F T R T F T R T R T R T R * F T R T R T R F F id F ( E ) 47
LL(1) Grammars are Unambiguous Ambiguous grammar S i E t S S R a S R e S E b Error: duplicate table entry A FIRST( ) FOLLOW(A) S i E t S S R i e $ S a a e $ S R e S e e $ S R e $ E b b t a b e i t $ S S a S i E t S S R S R E E b S R S R e S S R 48
Predictive Parsing Program (Driver) push($) push(s) a := lookahead repeat X := pop() if X is a terminal or X = $ then match(x) // moves to next token and a := lookahead else if M[X,a] = X Y 1 Y 2 Y k then push(y k, Y k-1,, Y 2, Y 1 ) // such that Y 1 is on top invoke actions and/or produce IR output else error() endif until X = $ 49
Example Table-Driven Parsing Stack $E $E R T $E R T R F $E R T R id $E R T R $E R $E R T+ $E R T $E R T R F $E R T R id $E R T R $E R T R F* $E R T R F $E R T R id $E R T R $E R $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E T E R T F T R F id T R E R + T E R T F T R F id T R * F T R F id T R E R R 50
Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Cons: Can be automated Error messages are needed FOLLOW(E) = { ) $ } FOLLOW(E R ) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(T R ) = { + ) $ } FOLLOW(F) = { + * ) $ } id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R T R T R * F T R T R T R F F id synch synch F ( E ) synch synch synch: the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found 51
Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is changed into id * id Pro: Cons: Can be automated Recovery not always intuitive Can then continue here id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R insert * T R T R * F T R T R T R F F id synch synch F ( E ) synch synch insert *: driver inserts missing * and retries the production 52
Error Productions E T E R E R + T E R T F T R T R * F T R F ( E ) id Add error production : T R F T R to ignore missing *, e.g.: id id Pro: Cons: Powerful recovery method Cannot be automated id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R E R T T F T R synch T F T R synch synch T R T R F T R T R T R * F T R T R T R F F id synch synch F ( E ) synch synch 53
Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation) SLR, Canonical LR, LALR Other special cases: Shift-reduce parsing Operator-precedence parsing 54
Operator-Precedence Parsing Special case of shift-reduce parsing We will not further discuss (you can skip textbook section 4.6) 55
Shift-Reduce Parsing Grammar: S a A B e A A b c b B d These match These match production s right-hand sides Reducing a sentence: a b b c d e a A b c d e a A d e a A B e S Shift-reduce corresponds to a rightmost derivation: S rm a A B e rm a A d e rm a A b c d e rm a b b c d e S A A A A A A B A B a b b c d e a b b c d e a b b c d e a b b c d e 56
Handles A handle is a substring of grammar symbols in a right-sentential form that matches a right-hand side of a production Grammar: S a A B e A A b c b B d a b b c d e a A b c d e a A d e a A B e S Handle a b b c d e a A b c d e a A A e? NOT a handle, because further reductions will fail (result is not a sentential form) 57
Stack Implementation of Shift-Reduce Parsing Grammar: E E + E E E * E E ( E ) E id Find handles to reduce Stack $ $id $E $E+ $E+id $E+E $E+E* $E+E*id $E+E*E $E+E $E Input id+id*id$ +id*id$ +id*id$ id*id$ *id$ *id$ id$ $ $ $ $ Action shift reduce E id shift shift reduce E id shift (or reduce?) shift reduce E id reduce E E * E reduce E E + E accept How to resolve conflicts? 58
Conflicts Shift-reduce and reduce-reduce conflicts are caused by The limitations of the LR parsing method (even when the grammar is unambiguous) Ambiguity of the grammar 59
Shift-Reduce Parsing: Shift-Reduce Conflicts Ambiguous grammar: S if E then S if E then S else S other Stack $ $ if E then S Input $ else $ Action shift or reduce? Resolve in favor Resolve in favor of shift, so else matches closest if 60
Shift-Reduce Parsing: Reduce-Reduce Conflicts Grammar: C A B A a B a Stack $ $a Input aa$ a$ Action shift reduce A a or B a? Resolve in favor Resolve in favor of reduce A a, otherwise we re stuck! 61
LR(k) Parsers: Use a DFA for Shift/Reduce Decisions start 0 C A a Grammar: S C C A B A a B a 1 2 3 B a 4 5 goto(i 0,C) State I 0 : S C C A B A a State I 1 : S C goto(i 0,A) State I 2 : C A B B a State I 4 : C A B goto(i 2,B) goto(i 2,a) Can only reduce A a (not B a) goto(i 0,a) State I 3 : A a State I 5 : B a 62
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,a) The states of the DFA are used to determine if a handle is on top of the stack State I 3 : A a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 63
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,A) The states of the DFA are used to determine if a handle is on top of the stack State I 2 : C A B B a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 64
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,a) The states of the DFA are used to determine if a handle is on top of the stack State I 5 : B a Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 65
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,B) The states of the DFA are used to determine if a handle is on top of the stack State I 4 : C A B Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 66
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,C) The states of the DFA are used to determine if a handle is on top of the stack State I 1 : S C Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 67
DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,C) The states of the DFA are used to determine if a handle is on top of the stack State I 1 : S C Stack $ 0 $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ aa$ a$ a$ $ $ $ Action start in state 0 shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C) 68
Model of an LR Parser stack input a 1... a i... a n $ S m X m S m-1 LR Parsing Algorithm output X m-1.. Action Table Goto Table S 1 X 1 S 0 s t a t e s terminals and $ four different actions s t a t e s non-terminal each item is a state number 69
A Configuration of LR Parsing Algorithm A configuration of a LR parsing is: ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) Stack Rest of Input S m and a i decides the parser action by consulting the parsing action table. (Initial Stack contains just S o ) A configuration of a LR parsing represents the right sentential form: X 1... X m a i a i+1... a n $
Actions of A LR-Parser 1. shift s -- shifts the next input symbol and the state s onto the stack ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m S m a i s, a i+1... a n $ ) 2. reduce A (or rn where n is a production number) pop 2 (=r) items from the stack; then push A and s where s=goto[s m-r,a] ( S o X 1 S 1... X m S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m-r S m-r A s, a i... a n $ ) Output is the reducing production reduce A 3. Accept Parsing successfully completed 4. Error -- Parser detected an error (an empty entry in the action table)
Reduce Action pop 2 (=r) items from the stack; let us assume that = Y 1 Y 2...Y r then push A and s where s=goto[s m-r,a] ( S o X 1 S 1... X m-r S m-r Y 1 S m-r+1...y r S m, a i a i+1... a n $ ) ( S o X 1 S 1... X m-r S m-r A s, a i... a n $ ) In fact, Y 1 Y 2...Y r is a handle. X 1... X m-r A a i... a n $ X 1... X m Y 1...Y r a i a i+1... a n $
(SLR) Parsing Tables for Expression 1) E E+T 2) E T 3) T T*F 4) T F 5) F (E) 6) F id Grammar Action Table state id + * ( ) $ E T F 0 s5 s4 1 2 3 1 s6 acc 2 r2 s7 r2 r2 3 r4 r4 r4 r4 Goto Table 4 s5 s4 8 2 3 5 r6 r6 r6 r6 6 s5 s4 9 3 7 s5 s4 10 8 s6 s11 9 r1 s7 r1 r1 10 r3 r3 r3 r3 11 r5 r5 r5 r5
Actions of A (S)LR-Parser -- Example stack input action output 0 id*id+id$ shift 5 0id5 *id+id$ reduce by F id F id 0F3 *id+id$ reduce by T F T F 0T2 *id+id$ shift 7 0T2*7 id+id$ shift 5 0T2*7id5 +id$ reduce by F id F id 0T2*7F10 +id$ reduce by T T*FT T*F 0T2 +id$ reduce by E T E T 0E1 +id$ shift 6 0E1+6 id$ shift 5 0E1+6id5 $ reduce by F id F id 0E1+6F3 $ reduce by T F T F 0E1+6T9$ reduce by E E+T E E+T 0E1 $ accept
SLR Grammars SLR (Simple LR): a simple extension of LR(0) shift-reduce parsing SLR eliminates some conflicts by populating the parsing table with reductions A on symbols in FOLLOW(A) S E E id + E E id State I 0 : S E E id + E E id State I 2 : Shift on + goto(i 0,id) E id + E E id goto(i 3,+) FOLLOW(E)={$} thus reduce on $ 75
SLR Parsing Table Reductions do not fill entire rows Otherwise the same as LR(0) 1. S E 2. E id + E 3. E id Shift on + 0 1 2 3 4 id + $ E s2 s2 s3 ac c r3 r2 1 4 FOLLOW(E)={$} thus reduce on $ 76
SLR Parsing An LR(0) state is a set of LR(0) items An LR(0) item is a production with a (dot) in the right-hand side Build the LR(0) DFA by Closure operation to construct LR(0) items Goto operation to determine transitions Construct the SLR parsing table from the DFA LR parser program uses the SLR parsing table to determine shift/reduce operations 77
LR(0) Items of a Grammar An LR(0) item of a grammar G is a production of G with a at some position of the right-hand side Thus, a production A X Y Z has four items: [A X Y Z] [A X Y Z] [A X Y Z] [A X Y Z ] Note that production A has one item [A ] 78
Constructing the set of LR(0) Items of a Grammar 1. The grammar is augmented with a new start symbol S and production S S 2. Initially, set C = closure({[s S]}) (this is the start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C 79
The Closure Operation for LR(0) Items 1. Initially, every LR(0) item in I is added to closure(i) 2. If [A B ] closure(i) then for each production B in the grammar, add the item [B ] to I if not already in I 3. Repeat 2 until no new items can be added 80
The Closure Operation (Example) closure({[e E]}) = { [E E] } Grammar: E E + T T T T * F F F ( E ) F id Add [E ] { [E E] [E E + T] [E T] } Add [T ] { [E E] [E E + T] [E T] [T T * F] [T F] } Add [F ] { [E E] [E E + T] [E T] [T T * F] [T F] [F ( E )] [F id] } 81
The Goto Operation for LR(0) Items 1. For each item [A X ] I, add the set of items closure({[a X ]}) to goto(i,x) if not already there 2. Repeat step 1 until no more items can be added to goto(i,x) 3. Intuitively, goto(i,x) is the set of items that are valid for the viable prefix X when I is the set of items that are valid for 82
The Goto Operation (Example 1) Suppose I = { [E E] [E E + T] [E T] [T T * F] [T F] [F ( E )] [F id] } Then goto(i,e) = closure({[e E, E E + T]}) = { [E E ] [E E + T] } Grammar: E E + T T T T * F F F ( E ) F id 83
The Goto Operation (Example 2) Suppose I = { [E E ], [E E + T] } Then goto(i,+) = closure({[e E + T]}) = { [E E + T] [T T * F] [T F] [F ( E )] [F id] } Grammar: E E + T T T T * F F F ( E ) F id 84
Constructing SLR Parsing Tables 1. Augment the grammar with S S 2. Construct the set C={I 0,I 1,,I n } of LR(0) items 3. If [A a ] I i and goto(i i,a)=i j then set action[i,a]=shift j 4. If [A ] I i then set action[i,a]=reduce A for all a FOLLOW(A) A (apply only if A S ) 5. If [S S ] is in I i then set action[i,$]=accept 6. If goto(i i,a)=i j then set goto[i,a]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S] 85
The Canonical LR(0) Collection -- Example I 0 : E.E I 1 : E E. I 6 : E E+.T I 9 : E E+T. E.E+T E E.+T T.T*F T T.*F E.T T.F T.T*F I 2 : E T. F.(E) I 10 : T T*F. T.F T T.*F F.id F.(E) F.id I 3 : T F. I 7 : T T*.F I 11 : F (E). I 4 : F (.E) E.E+T E.T T.T*F T.F F.(E) F.id F.(E) F.id I 8 : F (E.) E E.+T
Transition Diagram (DFA) of Goto I 0 E I 1 T F id ( I 2 I 3 I 4 I 5 id + * E T F ( I 6 I 7 I 8 to I 2 to I 3 to I 4 Function T ) + F ( F id ( id I 9 to I 3 to I 4 to I 5 I 10 to I 4 to I 5 I 11 to I 6 * to I 7
Example SLR Grammar and LR(0) Items Augmented grammar: 1. C C 2. C A B 3. A a 4. B a I 0 = closure({[c C]}) I 1 = goto(i 0,C) = closure({[c C ]}) goto(i 0,C) State I 1 : C C final State I 4 : C A B start State I 0 : C C C A B A a goto(i 0,A) State I 2 : C A B B a goto(i 2,B) goto(i 2,a) goto(i 0,a) State I 3 : A a State I 5 : B a 88
Example SLR Parsing Table State I 0 : C C C A B A a State I 1 : C C State I 2 : C A B B a State I 3 : A a State I 4 : C A B State I 5 : B a a $ C A B start 0 C a A 1 2 3 B a 4 5 0 1 2 3 4 5 s3 s5 r3 ac c r2 r4 1 2 4 Grammar: 1. C C 2. C A B 3. A a 4. B a 89
SLR and Ambiguity Every SLR grammar is unambiguous, but not every unambiguous grammar is SLR Consider for example the unambiguous grammar S L = R R L * R id R L I 0 : S S S L=R S R L *R L id R L I 1 : S S I 2 : S L =R R L action[2,=]=s6 action[2,=]=r5 I 3 : S R no I 4 : L * R R L L *R L id Has no SLR parsing table I 5 : L id I 6 : S L= R R L L *R L id I 7 : L *R I 8 : R L I 9 : S L=R 90
LR(1) Grammars SLR too simple LR(1) parsing uses lookahead to avoid unnecessary conflicts in parsing table LR(1) item = LR(0) item + lookahead LR(0) item: [A ] LR(1) item: [A, a] 91
SLR Versus LR(1) Split the SLR states by adding LR(1) lookahead Unambiguous grammar 1. S L = R 2. R 3. L * R 4. id 5. R L I 2 : S L =R R L split S L =R R L lookahead=$ action[2,=]=s6 action[2,$]=r5 Should not reduce on =, because no right-sentential form begins with R= 92
LR(1) Items An LR(1) item [A, a] contains a lookahead terminal a, meaning already on top of the stack, expect to see a For items of the form [A, a] the lookahead a is used to reduce A only if the next input is a For items of the form [A, a] with the lookahead has no effect 93
The Closure Operation for LR(1) Items 1. Start with closure(i) = I 2. If [A B, a] closure(i) then for each production B in the grammar and each terminal b FIRST( a), add the item [B, b] to I if not already in I 3. Repeat 2 until no new items can be added 94
The Goto Operation for LR(1) Items 1. For each item [A X, a] I, add the set of items closure({[a X, a]}) to goto(i,x) if not already there 2. Repeat step 1 until no more items can be added to goto(i,x) 95
Constructing the set of LR(1) Items of a Grammar 1. Augment the grammar with a new start symbol S and production S S 2. Initially, set C = closure({[s S, $]}) (this is the start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C 96
Example Grammar and LR(1) Items Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LR(1) items (next slide) 97
I 0 : [S S, $] goto(i 0,S)=I 1 [S L=R, $] goto(i 0,L)=I 2 [S R, $] goto(i 0,R)=I 3 [L *R, =/$] goto(i 0,*)=I 4 [L id, =/$] goto(i 0,id)=I 5 [R L, $] goto(i 0,L)=I 2 I 6 : I 7 : [S L= R, $] goto(i 6,R)=I 9 [R L, $] goto(i 6,L)=I 10 [L *R, $] goto(i 6,*)=I 11 [L id, $] goto(i 6,id)=I 12 [L *R, =/$] I 1 : [S S, $] I 8 : [R L, =/$] I 2 : I 3 : I 4 : I 5 : [S L =R, $] goto(i 0,=)=I 6 [R L, $] [S R, $] [L * R, =/$] goto(i 4,R)=I 7 [R L, =/$] goto(i 4,L)=I 8 [L *R, =/$] goto(i 4,*)=I 4 [L id, =/$] goto(i 4,id)=I 5 [L id, =/$] I 9 : I 10 : I 11 : I 12 : I 13 : [S L=R, $] [R L, $] [L * R, $] goto(i 11,R)=I 13 [R L, $] goto(i 11,L)=I 10 [L *R, $] goto(i 11,*)=I 11 [L id, $] goto(i 11,id)=I 12 [L id, $] [L *R, $] 98
Constructing Canonical LR(1) Parsing Tables 1. Augment the grammar with S S 2. Construct the set C={I 0,I 1,,I n } of LR(1) items 3. If [A a, b] I i and goto(i i,a)=i j then set action[i,a]=shift j 4. If [A, a] I i then set action[i,a]=reduce A (apply only if A S ) 5. If [S S, $] is in I i then set action[i,$]=accept 6. If goto(i i,a)=i j then set goto[i,a]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S,$] 99
Example LR(1) Parsing Table 0 id * = $ s5 s4 S L R 1 2 3 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 1 2 3 4 5 6 7 8 9 1 0 11 1 2 s5 s1 2 s1 2 s4 s1 1 s1 1 s6 r5 r4 r6 ac c r6 r3 r5 r4 r6 r2 r6 8 7 10 4 10 13 100
LALR(1) Grammars LR(1) parsing tables have many states LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce table size Less powerful than LR(1) Will not introduce shift-reduce conflicts, because shifts do not use lookaheads May introduce reduce-reduce conflicts, but seldom do so for grammars of programming languages 101
Constructing LALR(1) Parsing Tables 1. Construct sets of LR(1) items 2. Combine LR(1) sets with sets of items that share the same first part I 4 : I 11 : [L * R, =] [R L, =] [L *R, =] [L id, =] [L * R, $] [R L, $] [L *R, $] [L id, $] [L * R, =/$] [R L, =/$] [L *R, =/$] [L id, =/$] Shorthand for two items in the same set 102
Example LALR(1) Grammar Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LALR(1) items (next slide) 103
I 0 : I 1 : [S S,$] goto(i 0,S)=I 1 [S L=R,$] goto(i 0,L)=I 2 [S R,$] goto(i 0,R)=I 3 [L *R,=/$] goto(i 0,*)=I 4 [L id,=/$] goto(i 0,id)=I 5 [R L,$] goto(i 0,L)=I 2 goto(i 0,=)= I 6 [S S,$] I 6 : I 7 : I 8 : [S L= R, $] goto(i 6,R)=I 8 [R L, $] goto(i 6,L)=I 9 [L *R, $] goto(i 6,*)=I 4 [L id, $] goto(i 6,id)=I 5 [L *R,=/$] [S L=R,$] I 2 : [S L =R,$] [R L,$] I 9 : [R L,=/$] I 3 : I 4 : [S R,$] [L * R,=/$] goto(i 4,R)=I 7 [R L,=/$] goto(i 4,L)=I 9 [L *R,=/$] goto(i 4,*)=I 4 [L id,=/$] goto(i 4,id)=I 5 [R L,=] [R L,$] Shorthand for two items I 5 : [L id,=/$] 104
Example LALR(1) Parsing Table 0 id * = $ s5 s4 S L R 1 2 3 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 1 2 3 4 5 6 7 8 9 s5 s5 s4 s4 s6 r5 r4 r6 ac c r6 r3 r5 r4 r2 r6 9 7 9 8 105
LL, SLR, LR, LALR Summary LL parse tables computed using FIRST/FOLLOW Nonterminals terminals productions Computed using FIRST/FOLLOW LR parsing tables computed using closure/goto LR states terminals shift/reduce actions LR states nonterminals goto state transitions A grammar is LL(1) if its LL(1) parse table has no conflicts SLR if its SLR parse table has no conflicts LALR(1) if its LALR(1) parse table has no conflicts LR(1) if its LR(1) parse table has no conflicts 106
LL, SLR, LR, LALR Grammars LR(1) LALR(1) LL(1) SLR LR(0) 107
Dealing with Ambiguous Grammars 1. S E 2. E E + E 3. E id 0 1 2 3 4 id + $ s2 s2 s3 r3 s3/r 2 ac c r3 r2 E 1 4 $ 0 stack $ 0 E 1 + 3 E 4 input id+id+id$ +id$ When shifting on +: yields right associativity id+(id+id) Shift/reduce conflict: action[4,+] = shift 4 action[4,+] = reduce E E + E When reducing on +: yields left associativity (id+id)+id 108
Using Associativity and Precedence to Resolve Conflicts Left-associative operators: reduce Right-associative operators: shift Operator of higher precedence on stack: reduce Operator of lower precedence on stack: shift stack $ 0 S E E E + E E E * E E id $ 0 E 1 * 3 E 5 input id*id+id$ +id$ reduce E E * E 109
Error Detection in LR Parsing Canonical LR parser uses full LR(1) parse tables and will never make a single reduction before recognizing the error when a syntax error occurs on the input SLR and LALR may still reduce when a syntax error occurs on the input, but will never shift the erroneous input symbol 110
Error Recovery in LR Parsing Panic mode Pop until state with a goto on a nonterminal A is found, (where A represents a major programming construct), push A Discard input symbols until one is found in the FOLLOW set of A Phrase-level recovery Implement error routines for every error entry in table Error productions Pop until state has error production, then shift on stack Discard input until symbol is encountered that allows parsing to continue 111
ANTLR, Yacc, and Bison ANTLR tool Generates LL(k) parsers Yacc (Yet Another Compiler Compiler) Generates LALR(1) parsers Bison Improved version of Yacc 112
Creating an LALR(1) Parser with Yacc/Bison yacc specification yacc.y Yacc or Bison compiler y.tab.c y.tab.c C compiler a.out input stream a.out output stream 113
Yacc Specification A yacc specification consists of three parts: yacc declarations, and C declarations within %{ %} %% translation rules %% user-defined auxiliary procedures The translation rules are productions with actions: production 1 { semantic action 1 } production 2 { semantic action 2 } production n { semantic action n } 114
Writing a Grammar in Yacc Productions in Yacc are of the form Nonterminal: tokens/nonterminals { action } tokens/nonterminals { action } ; Tokens that are single characters can be used directly within productions, e.g. + Named tokens must be declared first in the declaration part using %token TokenName 115
Synthesized Attributes Semantic actions may refer to values of the synthesized attributes of terminals and nonterminals in a production: X : Y 1 Y 2 Y 3 Y n { action } $$ refers to the value of the attribute of X $i refers to the value of the attribute of Y i For example factor : ( expr ) { $$=$2; } factor.val=x ( expr.val=x ) $$=$2 116
Example 1 Also results in definition of #define DIGIT xxx %{ #include <ctype.h> %} %token DIGIT %% line : expr \n { printf( %d\n, $1); } ; expr : expr + term { $$ = $1 + $3; } term { $$ = $1; } ; term : term * factor { $$ = $1 * $3; } factor { $$ = $1; } ; factor : ( expr ) { $$ = $2; } DIGIT { $$ = $1; } ; %% int yylex() { int c = getchar(); if (isdigit(c)) { yylval = c- 0 ; return DIGIT; } return c; } Attribute of term (parent) Example of a very crude lexical analyzer invoked by the parser Attribute of factor (child) Attribute of token (stored in yylval) 117
Dealing With Ambiguous Grammars By defining operator precedence levels and left/right associativity of the operators, we can specify ambiguous grammars in Yacc, such as E E+E E-E E*E E/E (E) -E num To define precedence levels and associativity in Yacc s declaration part: %left + - %left * / %right UMINUS 118
Example 2 %{ #include <ctype.h> #include <stdio.h> #define YYSTYPE double %} %token NUMBER %left + - %left * / %right UMINUS %% Double type for attributes and yylval lines : lines expr \n { printf( %g\n, $2); } lines \n /* empty */ ; expr : expr + expr { $$ = $1 + $3; } expr - expr { $$ = $1 - $3; } expr * expr { $$ = $1 * $3; } expr / expr { $$ = $1 / $3; } ( expr ) { $$ = $2; } - expr %prec UMINUS { $$ = -$2; } NUMBER ; %% 119
Example 2 (cont d) %% int yylex() { int c; while ((c = getchar()) == ) ; if ((c ==. ) isdigit(c)) { ungetc(c, stdin); scanf( %lf, &yylval); return NUMBER; } return c; } int main() { if (yyparse()!= 0) fprintf(stderr, Abnormal exit\n ); return 0; } int yyerror(char *s) { fprintf(stderr, Error: %s\n, s); } Crude lexical analyzer for fp doubles and arithmetic operators Run the parser Invoked by parser to report parse errors 120
Combining Lex/Flex with Yacc/Bison yacc specification yacc.y Yacc or Bison compiler y.tab.c y.tab.h Lex specification lex.l and token definitions y.tab.h Lex or Flex compiler lex.yy.c lex.yy.c y.tab.c C compiler a.out input stream a.out output stream 121
Lex Specification for Example 2 %option noyywrap %{ #include y.tab.h extern double yylval; %} number [0-9]+\.? [0-9]*\.[0-9]+ %% [ ] { /* skip blanks */ } {number} { sscanf(yytext, %lf, &yylval); return NUMBER; } \n. { return yytext[0]; } Generated by Yacc, contains #define NUMBER xxx Defined in y.tab.c yacc -d example2.y lex example2.l gcc y.tab.c lex.yy.c./a.out bison -d -y example2.y flex example2.l gcc y.tab.c lex.yy.c./a.out 122
Error Recovery in Yacc %{ %} %% lines : lines expr \n { printf( %g\n, $2; } lines \n /* empty */ error \n { yyerror( reenter last line: ); yyerrok; } ; Error production: set error mode and skip input until newline Reset parser to normal mode 123