Goals
- Definition of the syntax of a programming language using context-free grammars
- Methods for parsing programs: determine whether a program is syntactically correct
Advantages of grammars:
- Precise, easily comprehensible language definition
- Automatic construction of parsers
- Declaration of the structure of a programming language (important for translation and error detection)
- Easy language extensions and modifications
F. Wotawa (TU Graz), Compiler Construction, Summer term

Tasks
[Figure: source program → lexical analyser → token → parser → parse tree → rest of the front end → intermediate representation; the parser requests tokens with "get next token"; with a symbol table]
Parser types:
- Universal parsers (inefficient)
- Top-down parsers
- Bottom-up parsers
  Only for subclasses of grammars (LL, LR)
Further tasks:
- Collect token information
- Type checking
- Intermediate code generation

Syntax error handling
Error types:
- Lexical errors (misspelled keyword)
- Syntactic errors (missing closing bracket)
- Semantic errors (operand incompatible with operator)
- Logic errors (infinite loop)
Tasks:
- Exact error description
- Error recovery: subsequent errors should still be detectable
- Error correction should not slow down the processing of correct programs

Problems during error handling
- Spurious errors: follow-up errors created by the error recovery itself
  Example: error recovery removes the declaration of pi; semantic analysis then reports "pi undefined"
- An error is detected late in the process, so the error message does not point to the correct position in the code
- Too many error messages are issued

Error recovery
- Panic mode: skip symbols until the input can be synchronized to a token
- Phrase-level recovery: local error corrections, e.g. replacing "," by ";"
- Error productions: extend the grammar to handle common errors
- Global correction: minimal correction of the program in order to find a matching derivation (costly)

Grammars
A grammar is a 4-tuple $G = (V_N, V_T, S, \Phi)$ where:
- $V_N$: set of nonterminal symbols
- $V_T$: set of terminal symbols
- $S \in V_N$: start symbol
- $\Phi \subseteq (V_N \cup V_T)^* \, V_N \, (V_N \cup V_T)^* \times (V_N \cup V_T)^*$: set of production rules (rewriting rules); $(\alpha, \beta) \in \Phi$ is written $\alpha \to \beta$
Example: $(\{S, A, Z\}, \{a, b, 1, 2\}, S, \{S \to AZ, A \to a, A \to b, Z \to \varepsilon, Z \to 1, Z \to 2, Z \to ZZ\})$

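Derivations in grammars
Direct derivation: let $\sigma, \psi \in (V_T \cup V_N)^*$. $\sigma$ can be directly derived from $\psi$ (in one step; $\psi \Rightarrow \sigma$) if there are two strings $\varphi_1, \varphi_2$ such that $\sigma = \varphi_1 \beta \varphi_2$, $\psi = \varphi_1 \alpha \varphi_2$ and $\alpha \to \beta \in \Phi$.
Example:
  $\psi$ | $\sigma$ | Rule used | $\varphi_1$ | $\varphi_2$
  $S$ | $AZ$ | $S \to AZ$ | $\varepsilon$ | $\varepsilon$
  $aZ$ | $a1$ | $Z \to 1$ | $a$ | $\varepsilon$
  $AZZ$ | $A2Z$ | $Z \to 2$ | $A$ | $Z$
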
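Derivation
Production: a string $\psi$ produces $\sigma$ ($\psi \Rightarrow^+ \sigma$) if there are strings $\varphi_0, \ldots, \varphi_n$ ($n > 0$) such that $\psi = \varphi_0 \Rightarrow \varphi_1$, $\varphi_1 \Rightarrow \varphi_2$, ..., $\varphi_{n-1} \Rightarrow \varphi_n = \sigma$.
Example: $S \Rightarrow AZ \Rightarrow AZZ \Rightarrow A2Z \Rightarrow a2Z \Rightarrow a21$
Reflexive, transitive closure: $\psi \Rightarrow^* \sigma \iff \psi \Rightarrow^+ \sigma$ or $\psi = \sigma$
Accepted language: a grammar $G$ accepts the language $L(G) = \{\sigma \mid S \Rightarrow^* \sigma,\ \sigma \in (V_T)^*\}$
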
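
The one-step derivation relation can be tried out on the earlier example grammar. The following is a minimal sketch, assuming single-character symbols and the empty string for ε; the names `PRODUCTIONS`, `direct_derivations` and `is_derivation` are made up for this illustration.

```python
# Productions of the example grammar; every symbol is one character,
# the empty string stands for ε.
PRODUCTIONS = [("S", "AZ"), ("A", "a"), ("A", "b"),
               ("Z", ""), ("Z", "1"), ("Z", "2"), ("Z", "ZZ")]


def direct_derivations(psi):
    """Return every string sigma with psi => sigma (one rewriting step)."""
    results = set()
    for alpha, beta in PRODUCTIONS:
        start = psi.find(alpha)
        while start != -1:
            # psi = phi1 + alpha + phi2 becomes sigma = phi1 + beta + phi2
            phi1, phi2 = psi[:start], psi[start + len(alpha):]
            results.add(phi1 + beta + phi2)
            start = psi.find(alpha, start + 1)
    return results


def is_derivation(chain):
    """Check that each string in the chain directly derives the next one."""
    return all(b in direct_derivations(a) for a, b in zip(chain, chain[1:]))


# The chain from the slide: S => AZ => AZZ => A2Z => a2Z => a21
slide_chain = ["S", "AZ", "AZZ", "A2Z", "a2Z", "a21"]
```

Calling `is_derivation(slide_chain)` checks the example chain step by step; a chain that skips a step, such as `["S", "a1"]`, is rejected.
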
Parse trees
Example: $E \to E + E \mid E * E \mid \mathrm{id}$
Two derivations (and parse trees) for id+id*id:

        E                      E
      / | \                  / | \
     E  +  E                E  *  E
     |   / | \            / | \   |
    id  E  *  E          E  +  E  id
        |     |          |     |
       id    id         id    id

The grammar is ambiguous

Classification of grammars
Chomsky hierarchy (restrictions on the production rules $\alpha \to \beta$):
- Unrestricted grammar: no restrictions
- Context-sensitive grammar: $|\alpha| \le |\beta|$
- Context-free grammar: $|\alpha| \le |\beta|$ and $\alpha \in V_N$
- Regular grammar: $|\alpha| \le |\beta|$, $\alpha \in V_N$ and $\beta$ has the form $aB$ or $a$, where $a \in V_T$ and $B \in V_N$

Grammar examples
Regular grammar for $(a \mid b)^* abb$:
  $A_0 \to aA_0 \mid bA_0 \mid aA_1$
  $A_1 \to bA_2$
  $A_2 \to bA_3$
  $A_3 \to \varepsilon$
Context-sensitive languages (not context-free):
- $L_1 = \{wcw \mid w \in (a \mid b)^*\}$
  But $L_1' = \{wcw^R \mid w \in (a \mid b)^*\}$ is context-free
- $L_2 = \{a^n b^m c^n d^m \mid n \ge 1, m \ge 1\}$
- $L_3 = \{a^n b^n c^n \mid n \ge 1\}$

Conversions
Remove ambiguities:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
Two parse trees exist for: if E1 then if E2 then S1 else S2
- Left tree: the else belongs to the inner if, i.e. if E1 then (if E2 then S1 else S2)
- Right tree: the else belongs to the outer if, i.e. if E1 then (if E2 then S1) else S2
Prefer the left tree: associate each else with the closest preceding unmatched then

Removing left recursion
- A grammar is left-recursive if there is a nonterminal A with a derivation A ⇒+ Aα
- Top-down parsing cannot handle left recursion
Example: convert A → Aα | β to:
  A → β A1
  A1 → α A1 | ε

Algorithm to eliminate left recursion
Input: grammar G without cycles and ε-productions
Output: grammar without left recursion
  Arrange the nonterminals in some order A1, A2, ..., An
  for i := 1 to n do
    for j := 1 to i − 1 do
      Replace each production Ai → Aj γ by the productions
      Ai → δ1 γ | ... | δk γ, where Aj → δ1 | ... | δk are all the current Aj-productions
    end
    Eliminate the immediate left recursion among the Ai-productions
  end

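
The algorithm above can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's implementation: a grammar is a dict from nonterminal to a list of bodies (tuples of symbols, `()` for ε), and the fresh nonterminal is simply named by appending "1", which is assumed not to clash with existing names.

```python
def eliminate_immediate(nt, alternatives):
    """Rewrite A -> A a | b as A -> b A1, A1 -> a A1 | ε."""
    recursive = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]
    others = [alt for alt in alternatives if alt[:1] != (nt,)]
    if not recursive:
        return {nt: alternatives}
    new_nt = nt + "1"  # hypothetical naming scheme for the new nonterminal
    return {
        nt: [alt + (new_nt,) for alt in others],
        new_nt: [alt + (new_nt,) for alt in recursive] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """Apply the two nested loops of the algorithm to a copy of the grammar."""
    grammar = {nt: list(alts) for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for alt in grammar[a_i]:
                if alt[:1] == (a_j,):
                    # Replace Ai -> Aj γ by Ai -> δ γ for every Aj -> δ
                    expanded += [delta + alt[1:] for delta in grammar[a_j]]
                else:
                    expanded.append(alt)
            grammar[a_i] = expanded
        grammar.update(eliminate_immediate(a_i, grammar[a_i]))
    return grammar


# Left-recursive expression grammar used as input
EXPR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
RESULT = eliminate_left_recursion(EXPR, ["E", "T", "F"])
```

For the expression grammar, `RESULT` contains E → T E1, E1 → + T E1 | ε, T → F T1 and T1 → * F T1 | ε, while the F-productions stay unchanged.
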
Left factoring
- Important for predictive parsing
- Eliminates alternatives that share a common prefix
Example:
  stmt → if expr then stmt else stmt
       | if expr then stmt
Solution:
- For each nonterminal A find the longest prefix α common to two or more of its alternatives
- If α ≠ ε, replace all A-productions A → αβ1 | αβ2 | ... | αβn | γ (γ does not start with α) with:
    A → α A1 | γ
    A1 → β1 | β2 | ... | βn
- Apply the transformation until no prefix α ≠ ε can be found

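
A possible sketch of the left-factoring transformation, using the same assumed representation as before (dict from nonterminal to a list of symbol tuples, `()` for ε). The helper names `common_prefix`, `fresh_name` and `left_factor` are hypothetical.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def fresh_name(nt, grammar):
    """Pick an unused nonterminal name by appending '1' (an assumed convention)."""
    name = nt + "1"
    while name in grammar:
        name += "1"
    return name


def left_factor(grammar):
    """Repeat the transformation until no common prefix α ≠ ε is left."""
    grammar = {nt: list(alts) for nt, alts in grammar.items()}
    changed = True
    while changed:
        changed = False
        for nt in list(grammar):
            alts = grammar[nt]
            # Longest prefix α shared by at least two alternatives
            prefixes = [common_prefix(a, b)
                        for i, a in enumerate(alts) for b in alts[i + 1:]]
            alpha = max(prefixes, key=len, default=())
            if not alpha:
                continue
            new_nt = fresh_name(nt, grammar)
            with_alpha = [a for a in alts if a[:len(alpha)] == alpha]
            rest = [a for a in alts if a[:len(alpha)] != alpha]
            grammar[nt] = [alpha + (new_nt,)] + rest       # A -> α A1 | γ
            grammar[new_nt] = [a[len(alpha):] for a in with_alpha]  # A1 -> β1 | ...
            changed = True
    return grammar


STMT = {"stmt": [("if", "expr", "then", "stmt", "else", "stmt"),
                 ("if", "expr", "then", "stmt"),
                 ("other",)]}
FACTORED = left_factor(STMT)
```

For the if-then-else grammar this yields stmt → if expr then stmt stmt1 | other and stmt1 → else stmt | ε.
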
Top-down parsing
- Idea: construct the parse tree for a given input, starting at the root node
- Recursive-descent parsing (with backtracking)
- Predictive parsing (without backtracking; special case of recursive-descent parsing)
- Left-recursive grammars can lead to infinite loops!
Example: S → cAd, A → ab | a, matching the input cad
[Figure: three parse-tree snapshots. (1) S with children c, A, d; (2) A expanded to a b; (3) A expanded to a]

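
Backtracking over the alternatives of A can be sketched with Python generators. This is only an illustration for the grammar S → cAd, A → ab | a; the function names are made up, and a left-recursive grammar would make `parse` recurse forever.

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def parse(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for alternative in GRAMMAR[head]:       # try the alternatives in order
            for mid in parse(alternative, text, pos):
                yield from parse(rest, text, mid)
    elif text.startswith(head, pos):            # terminal matches the input
        yield from parse(rest, text, pos + len(head))


def accepts(text):
    """True if the start symbol S derives exactly `text`."""
    return any(end == len(text) for end in parse(["S"], text, 0))
```

For the input cad, the alternative A → ab is tried first and fails at b; the parser then falls back to A → a, which corresponds to snapshots (2) and (3).
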
Predictive parsers
- Recursive-descent parser without backtracking
- Possible if the production to be used is determined by the current input symbol
Transition diagrams:
1. Remove left recursion
2. Left factor the grammar
3. For each nonterminal A:
   a. Create an initial state and an end state
   b. For each production A → X1 X2 ... Xn create a path from the initial state to the end state whose edges are labeled X1, ..., Xn

Predictive parsers (II)
Processing:
1. Start at the initial state of the start symbol.
2. Suppose we are in state s, which has an edge labeled with a terminal a leading to state t. If the next input symbol is a, go to state t and read the next input symbol.
3. Suppose the edge from s to t is labeled with a nonterminal A. Go to the initial state of A (without reading a new input symbol). When the end state of A is reached, continue at state t.
4. If the edge is labeled ε, go directly to t without reading input.
Easily implemented with recursive procedures

Example: predictive parser
Grammar:
  E → id E1 | ( E )
  E1 → op E | ε
E():
  if nexttoken = id then getnexttoken; E1()
  else if nexttoken = ( then
    getnexttoken; E()
    if nexttoken = ) then getnexttoken else error
  else error
E1():
  if nexttoken = op then getnexttoken; E()
  else return
[Figure: transition diagrams for E and E1]

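
The two procedures can be turned into a small runnable recursive-descent parser. This is a sketch under assumptions: the input is already a list of token names ("id", "op", "(", ")"), and the class `Parser`, the exception `ParseError` and the end marker handling are choices made for this illustration.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def next_token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the expected token or report an error."""
        if self.next_token != expected:
            raise ParseError(f"expected {expected!r}, found {self.next_token!r}")
        self.pos += 1

    def E(self):
        if self.next_token == "id":          # E -> id E1
            self.match("id")
            self.E1()
        elif self.next_token == "(":         # E -> ( E )
            self.match("(")
            self.E()
            self.match(")")
        else:
            raise ParseError(f"unexpected token {self.next_token!r}")

    def E1(self):
        if self.next_token == "op":          # E1 -> op E
            self.match("op")
            self.E()
        # otherwise E1 -> ε: consume nothing


def accepts(tokens):
    """True if the whole token list is derivable from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

Each nonterminal becomes one method, and a single token of lookahead selects the production, so no backtracking is needed.
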
Nonrecursive predictive parser
[Figure: input buffer (a + b $), stack (X Y Z $), predictive parsing program, parsing table M, output]
- Input buffer: string to be parsed (terminated by $)
- Stack: initialized with the start symbol; contains the grammar symbols that have not been derived yet (bottom marked by $)
- Parsing table M(A, a), where A is a nonterminal and a is a terminal or $

Top-down parsing with stack
Mode of operation: X is the top element of the stack, a the current input symbol
1. X is a terminal or $:
   - If X = a = $, the input has been matched completely.
   - If X = a ≠ $, pop X off the stack and read the next input symbol.
   - Otherwise an error occurred.
2. X is a nonterminal: fetch the entry M(X, a).
   - If the entry is error, go to error recovery.
   - Otherwise the entry is a production of the form X → UVW. Replace X on the stack with WVU (afterwards U is the topmost element of the stack).

Example
Grammar:
  E → id E1 | ( E )
  E1 → op E | ε
Parsing table M(X, a):
  Nonterminal | id        | op        | (         | )      | $
  E           | E → id E1 |           | E → ( E ) |        |
  E1          |           | E1 → op E |           | E1 → ε | E1 → ε
Derivation of id op id

Example (II)
  STACK      INPUT         OUTPUT
  $ E        id op id $
  $ E1 id    id op id $    E → id E1
  $ E1       op id $
  $ E op     op id $       E1 → op E
  $ E        id $
  $ E1 id    id $          E → id E1
  $ E1       $
  $          $             E1 → ε

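
The stack-based procedure and the example table can be combined into a short table-driven parser. This is a minimal sketch: the dictionary `TABLE` encodes the parsing table of the example grammar, missing keys stand for error entries, and the function name `ll1_parse` is an assumption of this illustration.

```python
# Parsing table M of the example grammar: (nonterminal, lookahead) -> body
TABLE = {
    ("E", "id"): ["id", "E1"],
    ("E", "("): ["(", "E", ")"],
    ("E1", "op"): ["op", "E"],
    ("E1", ")"): [],              # E1 -> ε
    ("E1", "$"): [],              # E1 -> ε
}
NONTERMINALS = {"E", "E1"}


def ll1_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    pos, output = 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M({top}, {a})")
            stack.pop()
            stack.extend(reversed(body))   # leftmost body symbol ends up on top
            output.append((top, tuple(body)))
        elif top == a:
            if top == "$":
                return output              # X = a = $: input matched
            stack.pop()                    # X = a ≠ $: match the terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {a}")
```

Running `ll1_parse(["id", "op", "id"])` reproduces the OUTPUT column of the trace: E → id E1, E1 → op E, E → id E1, E1 → ε.
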
FIRST and FOLLOW
Used to compute the parsing table
- FIRST(α): set of terminals that begin the strings derivable from α (α is a string of grammar symbols); contains ε if α ⇒* ε
- FOLLOW(A): set of terminals that can appear immediately to the right of the nonterminal A in some sentential form
- If A can be the rightmost symbol of a sentential form, then $ is in FOLLOW(A)

Calculation of FIRST
FIRST(X) for a grammar symbol X:
1. X is a terminal: FIRST(X) = {X}
2. X → ε is a production: add ε to FIRST(X)
3. X is a nonterminal and X → Y1 Y2 ... Yk is a production: a is in FIRST(X) if
   a. there is an i such that a is in FIRST(Yi) and ε is in every set FIRST(Y1), ..., FIRST(Yi−1), or
   b. a = ε and ε is in every set FIRST(Y1), ..., FIRST(Yk)
FIRST(X1 X2 ... Xn):
- Each non-ε symbol of FIRST(X1) is in the result
- If ε ∈ FIRST(X1), then each non-ε symbol of FIRST(X2) is in the result, and so on
- If ε is in every set FIRST(Xi), then ε is also in the result

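
The rules can be applied repeatedly until no FIRST set grows any more. The following sketch assumes a grammar given as a dict from nonterminal to a list of bodies (tuples, `()` for ε); `compute_first` and `first_of_string` are illustrative names.

```python
EPS = "ε"

GRAMMAR = {
    "E": [("id", "E1"), ("(", "E", ")")],
    "E1": [("op", "E"), ()],
}


def compute_first(grammar):
    """Return (FIRST sets of the nonterminals, function for symbol strings)."""
    first = {nt: set() for nt in grammar}

    def first_of_string(symbols):
        result = set()
        for sym in symbols:
            sym_first = first[sym] if sym in grammar else {sym}
            result |= sym_first - {EPS}
            if EPS not in sym_first:
                return result
        result.add(EPS)      # every symbol can derive ε (or the string is empty)
        return result

    changed = True
    while changed:           # iterate until no FIRST set grows any more
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first, first_of_string


FIRST, first_of_string = compute_first(GRAMMAR)
```

For the example grammar E → id E1 | ( E ), E1 → op E | ε this gives FIRST(E) = {id, (} and FIRST(E1) = {op, ε}.
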
Calculation of FOLLOW
To calculate FOLLOW(A) for every nonterminal A, apply the following rules until no set changes any more:
1. Add $ to FOLLOW(S), where S is the start symbol
2. For each production A → αBβ, add all elements of FIRST(β) except ε to FOLLOW(B)
3. For each production A → αB, and each production A → αBβ with ε ∈ FIRST(β), add every element of FOLLOW(A) to FOLLOW(B)

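
The three rules translate into a fixed-point loop. In this sketch the FIRST sets of the nonterminals are passed in as given (here for the grammar E → id E1 | ( E ), E1 → op E | ε); the function names and the dict-based grammar format are assumptions made for the illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [("id", "E1"), ("(", "E", ")")],
    "E1": [("op", "E"), ()],
}
FIRST = {"E": {"id", "("}, "E1": {"op", EPS}}


def first_of(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                          # rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue                    # only nonterminals have FOLLOW
                    beta_first = first_of(body[i + 1:], first)
                    new = beta_first - {EPS}        # rule 2
                    if EPS in beta_first:
                        new |= follow[a]            # rule 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

With these inputs both FOLLOW(E) and FOLLOW(E1) come out as {$, )}.
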
Example
Grammar:
  E → id E1 | ( E )
  E1 → op E | ε
FIRST sets:
  FIRST(E) = {id, (}
  FIRST(E1) = {op, ε}
FOLLOW sets:
  FOLLOW(E) = FOLLOW(E1) = {$, )}

Construction of parsing tables
Input: grammar G
Output: parsing table M
1. For each production A → α do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M(A, a).
3. If ε is in FIRST(α), add A → α to M(A, b) for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M(A, $).
4. Make each undefined entry of M an error entry.
Example: see the table of the previous example grammar

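
The construction can be sketched as follows, taking the FIRST and FOLLOW sets as given inputs. Each table cell holds a list of bodies so that multiple entries stay visible; missing keys stand for error entries. The names `build_table` and `is_ll1` are assumptions of this illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E": [("id", "E1"), ("(", "E", ")")],
    "E1": [("op", "E"), ()],
}
FIRST = {"E": {"id", "("}, "E1": {"op", EPS}}
FOLLOW = {"E": {"$", ")"}, "E1": {"$", ")"}}


def first_of(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table(grammar, first, follow):
    """Map (nonterminal, lookahead) to the list of applicable bodies."""
    table = {}
    for a, bodies in grammar.items():
        for body in bodies:
            body_first = first_of(body, first)
            lookaheads = body_first - {EPS}        # step 2
            if EPS in body_first:
                lookaheads |= follow[a]            # step 3 (FOLLOW may contain $)
            for terminal in lookaheads:
                table.setdefault((a, terminal), []).append(body)
    return table


def is_ll1(table):
    """A grammar is LL(1) if no table cell holds more than one production."""
    return all(len(entries) == 1 for entries in table.values())


TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
```

For the example grammar every cell holds exactly one production. For an ambiguous grammar such as E → E + E | id, the cell M(E, id) would hold two productions and `is_ll1` would return False.
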
LL(1) grammars
- The parsing table construction can be applied to arbitrary grammars
- Multiple productions per entry may occur
- LL(1) grammar: grammar whose parsing table contains no multiple entries
  L: scanning the input from left to right
  L: producing a leftmost derivation
  1: using one input symbol of lookahead

Properties of LL(1)
- No ambiguous or left-recursive grammar is LL(1)
- G is LL(1) ⇔ for each two distinct productions A → α | β:
  1. α and β do not both derive strings beginning with the same terminal a
  2. At most one of α and β can derive ε
  3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A)
- Multiple entries in the parsing table can occasionally be removed by hand (without changing the language recognized by the automaton)

Error recovery in predictive parsing
Heuristics for panic-mode error recovery:
1. Initially, all symbols in FOLLOW(A) can be used for synchronization: skip tokens until an element of FOLLOW(A) is read, then remove A from the stack.
2. If FOLLOW sets do not suffice: use the hierarchical structure of program constructs, e.g. add keywords that begin a statement to the synchronization set.
3. FIRST(A) can be used as well: if an element of FIRST(A) is read, continue parsing at A.
4. If a terminal at the top of the stack cannot be matched, remove it.

Bottom-up parsing
- Shift-reduce parsing
- Reduction of an input towards the start symbol of the grammar
- Reduction step: replace a substring that matches the right side of a production with the left side of that production
Example:
  S → aABe
  A → Abc | b
  B → d
  Reduction sequence: abbcde, aAbcde, aAde, aABe, S

Handles
A handle is a substring that matches the right side of a production and whose reduction is one step of a rightmost derivation in reverse
Example (ambiguous grammar):
  E → E + E | E * E | ( E ) | id
Rightmost derivation of id + id * id, in reverse:
  Right-sentential form | Handle         | Reducing production
  id + id * id          | id (leftmost)  | E → id
  E + id * id           | id (leftmost)  | E → id
  E + E * id            | id             | E → id
  E + E * E             | E * E          | E → E * E
  E + E                 | E + E          | E → E + E
  E                     |                |

Stack implementation
Initially:
  Stack   Input
  $       w $
- Shift n ≥ 0 symbols from the input onto the stack until a handle is on top of the stack
- Reduce the handle (replace the handle with the left side of the production)

Example: shift-reduce parsing
        Stack           Input             Action
  (1)   $               id + id * id $    shift
  (2)   $ id            + id * id $       reduce by E → id
  (3)   $ E             + id * id $       shift
  (4)   $ E +           id * id $         shift
  (5)   $ E + id        * id $            reduce by E → id
  (6)   $ E + E         * id $            shift
  (7)   $ E + E *       id $              shift
  (8)   $ E + E * id    $                 reduce by E → id
  (9)   $ E + E * E     $                 reduce by E → E * E
  (10)  $ E + E         $                 reduce by E → E + E
  (11)  $ E             $                 accept

Viable prefixes, conflicts
- Viable prefix: a prefix of a right-sentential form that can appear on the stack of a shift-reduce parser
- Conflicts (ambiguous grammars):
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
  Configuration:
    Stack                      Input
    ... if expr then stmt      else ...
  No unambiguous handle (shift/reduce conflict)

LR parser
LR(k) parsing:
  L: left-to-right scanning of the input
  R: rightmost derivation in reverse
Advantages:
- Can be used for (nearly) every programming language construct
- Most general backtrack-free shift-reduce parsing method
- The class of LR grammars is larger than that of LL grammars
- LR parsers detect errors as early as possible
Disadvantage:
- An LR parser is hard to build by hand

LR parsing algorithm
[Figure: input a1 ... ai ... an $, stack holding sm Xm sm−1 Xm−1 ... s0, LR parsing program with action and goto tables, output]
- The stack stores s0 X1 s1 X2 s2 ... Xm sm (Xi grammar symbol, si state)
- Parsing table = action table and goto table
- sm is the current state, ai the current input symbol
- action[sm, ai] ∈ {shift, reduce, accept, error}
- goto[s, X] (s state, X grammar symbol): transition function of a DFA

LR parsing: mode of operation
- Configuration: (s0 X1 s1 ... Xm sm, ai ai+1 ... an)
- The next step (move) is determined by reading ai and depends on action[sm, ai]:
1. action[sm, ai] = shift s
   New configuration: (s0 X1 s1 ... Xm sm ai s, ai+1 ... an)
2. action[sm, ai] = reduce A → β
   New configuration: (s0 X1 s1 ... Xm−r sm−r A s, ai ai+1 ... an), where s = goto[sm−r, A] and r is the length of β
3. action[sm, ai] = accept
4. action[sm, ai] = error

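Example
Grammar:
  (1) E → E + T
  (2) E → T
  (3) T → T * F
  (4) T → F
  (5) F → ( E )
  (6) F → id
Parsing table (si: shift and go to state i; rj: reduce by production j; acc: accept; blank: error):
  State | id | +  | *  | (  | )   | $   | E | T | F
  0     | s5 |    |    | s4 |     |     | 1 | 2 | 3
  1     |    | s6 |    |    |     | acc |   |   |
  2     |    | r2 | s7 |    | r2  | r2  |   |   |
  3     |    | r4 | r4 |    | r4  | r4  |   |   |
  4     | s5 |    |    | s4 |     |     | 8 | 2 | 3
  5     |    | r6 | r6 |    | r6  | r6  |   |   |
  6     | s5 |    |    | s4 |     |     |   | 9 | 3
  7     | s5 |    |    | s4 |     |     |   |   | 10
  8     |    | s6 |    |    | s11 |     |   |   |
  9     |    | r1 | s7 |    | r1  | r1  |   |   |
  10    |    | r3 | r3 |    | r3  | r3  |   |   |
  11    |    | r5 | r5 |    | r5  | r5  |   |   |
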
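
The LR driver itself is short once the table is available. The sketch below enters the standard SLR table for this expression grammar by hand; the state numbers follow the usual textbook numbering, which is an assumption wherever the slide text is unreadable. The helper `row` and the function `lr_parse` are illustrative names.

```python
# Production number -> (head, length of the body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def row(spec):
    """Turn 'id:s5 (:s4' into {'id': 's5', '(': 's4'}."""
    return dict(item.rsplit(":", 1) for item in spec.split())


ACTION = {
    0: row("id:s5 (:s4"),
    1: row("+:s6 $:acc"),
    2: row("+:r2 *:s7 ):r2 $:r2"),
    3: row("+:r4 *:r4 ):r4 $:r4"),
    4: row("id:s5 (:s4"),
    5: row("+:r6 *:r6 ):r6 $:r6"),
    6: row("id:s5 (:s4"),
    7: row("id:s5 (:s4"),
    8: row("+:s6 ):s11"),
    9: row("+:r1 *:s7 ):r1 $:r1"),
    10: row("+:r3 *:r3 ):r3 $:r3"),
    11: row("+:r5 *:r5 ):r5 $:r5"),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Return the numbers of the productions used for the reductions, in order."""
    tokens = list(tokens) + ["$"]
    states = [0]      # only states are stacked; the grammar symbols are implied
    pos, reductions = 0, []
    while True:
        action = ACTION[states[-1]].get(tokens[pos])
        if action is None:                       # blank entry: error
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {states[-1]}")
        if action == "acc":
            return reductions
        number = int(action[1:])
        if action[0] == "s":                     # shift: push state, advance input
            states.append(number)
            pos += 1
        else:                                    # reduce A -> β: pop |β| states
            head, length = PRODUCTIONS[number]
            del states[len(states) - length:]
            states.append(GOTO[(states[-1], head)])
            reductions.append(number)
```

For the input id * id + id the reductions are 6, 4, 6, 3, 2, 6, 4, 1, which is a rightmost derivation in reverse.
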
Construction of SLR parsing tables
- LR(0) item: production with a dot at some position of the right side
  Example: production A → XYZ has 4 items: A → .XYZ, A → X.YZ, A → XY.Z and A → XYZ.
  Exception: production A → ε has only the item A → .
- Augmented grammar: grammar with a new start symbol S' and the production S' → S
- Functions: closure and goto
closure(I) (I is a set of items):
1. Every item in I is in closure(I)
2. If A → α.Bβ is in closure(I) and B → γ is a production, add B → .γ to closure(I)

Construction, goto
goto(I, X), with I a set of items and X a grammar symbol:
  goto(I, X) = closure of the set of all items A → αX.β such that A → α.Xβ is in I
Example: I = {E' → E., E → E.+T}
  goto(I, +) = {E → E+.T, T → .T*F, T → .F, F → .(E), F → .id}
Sets-of-items construction (construction of all sets of LR(0) items):
  items(G'):
    I0 = closure({S' → .S})
    C = {I0}
    repeat
      for each set of items I ∈ C and each grammar symbol X
          such that goto(I, X) is not empty and not in C do
        add goto(I, X) to C
    until no more sets of items can be added to C

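
closure, goto and the sets-of-items construction can be sketched compactly. In this illustration an item is a tuple (head, body, dot position), the augmented expression grammar is hard-coded, and all function names are assumptions.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Add B -> .γ for every item A -> α.Bβ until nothing new appears."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` and take the closure of the result."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection(start_item):
    """Collect all sets of LR(0) items reachable from the start item."""
    start = closure({start_item})
    collection, worklist = {start}, [start]
    while worklist:
        state = worklist.pop()
        # Only symbols that follow a dot can give a non-empty goto set
        symbols = {b[d] for _, b, d in state if d < len(b)}
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in collection:
                collection.add(target)
                worklist.append(target)
    return collection


# The example from the slide: I = {E' -> E., E -> E.+T}
I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
GOTO_I_PLUS = goto(I, "+")
```

`GOTO_I_PLUS` contains the five items listed in the example, and the collection built from S' → .S (here E' → .E) has twelve item sets, matching states 0 to 11 of the parsing table.
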
SLR parsing table
Input: augmented grammar G'
Output: SLR parsing table
1. Calculate C = {I0, I1, ..., In}, the collection of sets of LR(0) items of G'
2. State i is constructed from Ii as follows:
   a. If A → α.aβ is in Ii and goto(Ii, a) = Ij, then action[i, a] = shift j (a is a terminal symbol)
   b. If A → α. is in Ii, then action[i, a] = reduce A → α for all a ∈ FOLLOW(A), A ≠ S'
   c. If S' → S. is in Ii, then action[i, $] = accept
3. For all nonterminal symbols A: goto[i, A] = j if goto(Ii, A) = Ij
4. Every other table entry is set to error
5. The initial state is the one constructed from the item set containing S' → .S

SLR(1), conflicts, error handling
- If the SLR parsing table algorithm yields a table without multiple entries, the grammar is SLR(1)
- Otherwise the algorithm fails and a more powerful method (such as canonical LR) is needed, which generally increases the processing requirements
- Shift/reduce conflicts can be partially resolved, usually by specifying operator precedence and associativity
- Error handling can be incorporated directly into the parsing table
