Exercises II. Exercise: Lexical Analysis

Text adapted from: Alessandro Artale, Free University of Bolzano
Slides adapted from: Enrico Cimitan, Università di Padova

Exercise: Lexical Analysis
Describe the notions of token, token name, lexeme, and attribute, and provide examples of their use.

Exercise: Lexical Analysis
Input: y = 42
- The lexemes are y, =, 42
- The tokens are ⟨id, y⟩, ⟨assign⟩, ⟨num, 42⟩
- The token names are id, assign, num
- The attributes for the first and third tokens are y and 42; the second token does not need an attribute

Exercise: Lexical Analysis
To describe the set of lexemes we need patterns.
Example of patterns expressed by means of REs:
  Identifier: [a-zA-Z][a-zA-Z_0-9]*   (N.B. keywords ignored)
  Number:     [0-9]+
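The distinction between lexeme, token name and attribute can be made concrete with a small tokenizer. The following is a minimal sketch in Python, assuming the two patterns from the slide plus an illustrative rule for `=` and for whitespace; the names `TOKEN_SPEC` and `tokenize` are hypothetical, not part of the exercise.

```python
import re

# Patterns for id and num come from the slide; "assign" and "ws" are assumptions.
TOKEN_SPEC = [
    ("id", r"[a-zA-Z][a-zA-Z_0-9]*"),
    ("num", r"[0-9]+"),
    ("assign", r"="),
    ("ws", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (token name, attribute) pairs; the attribute is None when not needed."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        name, lexeme = m.lastgroup, m.group()
        if name == "id":
            tokens.append((name, lexeme))       # attribute: the identifier itself
        elif name == "num":
            tokens.append((name, int(lexeme)))  # attribute: the numeric value
        elif name == "assign":
            tokens.append((name, None))         # no attribute needed
        pos = m.end()                           # whitespace is skipped silently
    return tokens

print(tokenize("y = 42"))  # [('id', 'y'), ('assign', None), ('num', 42)]
```

The lexemes are the matched substrings (`m.group()`), while the pairs in the result are the tokens.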

Exercise: Lexical Analysis
During LA there are two kinds of conflicts:
- Several portions of a lexeme are recognized by the same REs
- The same lexeme is recognized by several REs
Describe how to resolve these conflicts.

Exercise: Lexical Analysis
- The conflict between several portions of different lengths is resolved by taking the longest match
- The conflict between several REs on the same lexeme is resolved by taking the RE with the highest precedence
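Both resolution rules can be sketched in a few lines of Python. The rule list below is an assumption made for illustration: the slides ignore keywords, so the `if_kw` rule is hypothetical and is only there to create a tie with the identifier pattern. Precedence is modelled as position in the list (earlier rule wins).

```python
import re

# Hypothetical rule list, in priority order (earlier rule = higher precedence).
RULES = [
    ("if_kw", re.compile(r"if")),
    ("id", re.compile(r"[a-zA-Z][a-zA-Z_0-9]*")),
    ("num", re.compile(r"[0-9]+")),
]

def next_token(text, pos=0):
    """Pick the longest match at pos; on equal length keep the earliest rule."""
    best = None  # (length, rule name, lexeme)
    for name, regex in RULES:
        m = regex.match(text, pos)
        # Strict '>' means a later rule only wins if its match is strictly longer.
        if m and (best is None or len(m.group()) > best[0]):
            best = (len(m.group()), name, m.group())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best[1], best[2]

print(next_token("iffy"))  # ('id', 'iffy'): longest match beats the keyword rule
print(next_token("if"))    # ('if_kw', 'if'): same length, higher precedence wins
```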

Exercise: Finite Automata
Describe a lexer that recognizes identifiers and numbers (integers), and show the finite automaton.

Exercise: Finite Automata
Flex code:

    ws [\n \t]+
    %%
    [0-9]+{ws}                   {printf("int\n");}
    [a-zA-Z][a-zA-Z_0-9]*{ws}    {printf("id\n");}
    %%

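Exercise: Finite Automata
[Figure: finite automaton. From the start state, one ε-branch reads [a-zA-Z], loops on [a-zA-Z_0-9], and on ws prints «id»; the other ε-branch reads [0-9], loops on [0-9], and on ws prints «integer».]

Exercise: Top-Down Parsing
Consider the following grammar with terminals T = { [, ], a, b, c, +, - }:
  S → [ S X ] | a
  X → + S Y | Y b | ε
  Y → - S X c | ε
Provide the parsing table for the LL(1) top-down parser.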

Recall: FIRST()
FIRST(α) = the set of terminals that begin all strings derived from α
  FIRST(a) = {a}   if a ∈ T
  FIRST(ε) = {ε}
  FIRST(A) = ∪ FIRST(α)   over all A → α ∈ P
  FIRST(X1 X2 … Xk):
    if for all j = 1, …, i-1 : ε ∈ FIRST(Xj) then add FIRST(Xi)\{ε} to FIRST(X1 X2 … Xk)
    if for all j = 1, …, k : ε ∈ FIRST(Xj) then add ε to FIRST(X1 X2 … Xk)

Exercise: Top-Down Parsing
  FIRST(a) = {a}, if a ∈ T = { [, ], a, b, c, +, - }
  FIRST(Y) = { -, ε }
  FIRST(S) = { [, a }
  FIRST(X) = {+} ∪ FIRST(Y)\{ε} ∪ {b} ∪ {ε}   (b is included since Y derives ε)
           = { +, -, b, ε }
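The FIRST rules recalled above amount to a fixed-point iteration. Here is a minimal sketch in Python for the grammar of this exercise; the dictionary encoding of the grammar and the names `first_of_string` and `compute_first` are illustrative assumptions.

```python
EPS = "ε"

# Grammar of the exercise; each right-hand side is a tuple of symbols, () is ε.
GRAMMAR = {
    "S": [("[", "S", "X", "]"), ("a",)],
    "X": [("+", "S", "Y"), ("Y", "b"), ()],
    "Y": [("-", "S", "X", "c"), ()],
}

def first_of_string(symbols, first):
    """FIRST of a symbol string, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no FIRST set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first()
print(FIRST["X"] == {"+", "-", "b", EPS})  # True
```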

Exercise: Top-Down Parsing
  A → α           | FIRST(α)
  S → [ S X ]     | [
  S → a           | a
  X → + S Y       | +
  X → Y b         | - b
  X → ε           | ε
  Y → - S X c     | -
  Y → ε           | ε

Recall: FOLLOW()
FOLLOW(A) = the set of terminals that can immediately follow nonterminal A
FOLLOW(A) =
  for all (B → α A β) ∈ P do add FIRST(β)\{ε} to FOLLOW(A)
  for all (B → α A β) ∈ P and ε ∈ FIRST(β) do add FOLLOW(B) to FOLLOW(A)
  for all (B → α A) ∈ P do add FOLLOW(B) to FOLLOW(A)
  if A is the start symbol S then add $ to FOLLOW(A)

Exercise: Top-Down Parsing
  FOLLOW(X) = { ], c }
  FOLLOW(Y) = FOLLOW(X) ∪ {b} = { ], c, b }
  FOLLOW(S) = {$}
            ∪ FIRST(X)\{ε} ∪ {]}           (S → [ S X ] and X ⇒ ε)
            ∪ FIRST(X)\{ε} ∪ {c}           (Y → - S X c and X ⇒ ε)
            ∪ FIRST(Y)\{ε} ∪ FOLLOW(X)     (X → + S Y and Y ⇒ ε)
            = {$} ∪ FIRST(X)\{ε} ∪ {]} ∪ {c} ∪ FOLLOW(X)     (since FIRST(Y) ⊆ FIRST(X))
            = { $, +, -, b, ], c }

Exercise: Top-Down Parsing
  A | FOLLOW(A)
  S | $ + - b ] c
  X | ] c
  Y | ] c b
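The FOLLOW computation can be checked mechanically in the same fixed-point style. The sketch below is an illustration in Python, assuming the FIRST sets of the nonterminals as computed in the exercise (they are hard-coded here); the encoding and the function names are hypothetical.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [("[", "S", "X", "]"), ("a",)],
    "X": [("+", "S", "Y"), ("Y", "b"), ()],
    "Y": [("-", "S", "X", "c"), ()],
}
START = "S"
# FIRST sets of the nonterminals, copied from the exercise.
FIRST = {"S": {"[", "a"}, "X": {"+", "-", "b", EPS}, "Y": {"-", EPS}}

def first_of_string(symbols):
    """FIRST of a symbol string; a terminal a has FIRST(a) = {a}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow():
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for head, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}
                    if EPS in beta_first:  # β is empty or can derive ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow()
print(sorted(FOLLOW["S"]))
```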

Recall: Constructing an LL(1) Predictive Parsing Table
  for each production A → α do
      for each a ∈ FIRST(α) do
          add A → α to M[A, a]
      enddo
      if ε ∈ FIRST(α) then
          for each b ∈ FOLLOW(A) do
              add A → α to M[A, b]
          enddo
      endif
  enddo
  Mark each undefined entry in M error

Exercise: Top-Down Parsing
FIRST & FOLLOW as computed before:

  A → α           | FIRST(α)
  S → [ S X ]     | [
  S → a           | a
  X → + S Y       | +
  X → Y b         | - b
  X → ε           | ε
  Y → - S X c     | -
  Y → ε           | ε

  A | FOLLOW(A)
  S | $ + - b ] c
  X | ] c
  Y | ] c b
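The table-construction procedure recalled above translates almost line by line into code. The following Python sketch hard-codes FIRST(α) and FOLLOW(A) from the two tables; the tuple encoding and the conflict check are assumptions added for illustration (a duplicate entry would mean the grammar is not LL(1)).

```python
EPS = "ε"

# (head, body, FIRST(body)) for each production, copied from the tables above.
PRODUCTIONS = [
    ("S", ("[", "S", "X", "]"), {"["}),
    ("S", ("a",), {"a"}),
    ("X", ("+", "S", "Y"), {"+"}),
    ("X", ("Y", "b"), {"-", "b"}),
    ("X", (), {EPS}),
    ("Y", ("-", "S", "X", "c"), {"-"}),
    ("Y", (), {EPS}),
]
FOLLOW = {"S": {"$", "+", "-", "b", "]", "c"}, "X": {"]", "c"}, "Y": {"]", "c", "b"}}

def build_table():
    """Return M as a dict (nonterminal, terminal) -> body; missing keys are errors."""
    table = {}
    for head, body, first in PRODUCTIONS:
        lookaheads = first - {EPS}          # first phase: terminals in FIRST(α)
        if EPS in first:
            lookaheads |= FOLLOW[head]      # second phase: FOLLOW(A) when α ⇒ ε
        for terminal in lookaheads:
            if (head, terminal) in table:
                raise ValueError(f"conflict at M[{head}, {terminal}]: not LL(1)")
            table[(head, terminal)] = body
    return table

TABLE = build_table()
print(len(TABLE))  # 11 defined entries
```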

Exercise: Top-Down Parsing

    | [           | ]       | a     | b       | c       | +         | -           | $
  S | S → [ S X ] |         | S → a |         |         |           |             |
  X |             | X → ε * |       | X → Y b | X → ε * | X → + S Y | X → Y b     |
  Y |             | Y → ε * |       | Y → ε * | Y → ε * |           | Y → - S X c |

Productions marked with * (FOLLOW) are inserted in the second phase of the algorithm. Empty entries are errors.

Exercise: Top-Down Parsing
Provide the stack and the moves of the LL(1) parser on input [ a b ]
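A table-driven driver loop is one way to check a hand-written trace for this exercise. The sketch below hard-codes the parsing table of the exercise as a Python dictionary; the function name `parse` and the list-based stack are illustrative choices, not part of the slides.

```python
# LL(1) table of the exercise: (nonterminal, lookahead) -> production body.
TABLE = {
    ("S", "["): ["[", "S", "X", "]"],
    ("S", "a"): ["a"],
    ("X", "+"): ["+", "S", "Y"],
    ("X", "b"): ["Y", "b"],
    ("X", "-"): ["Y", "b"],
    ("X", "]"): [],
    ("X", "c"): [],
    ("Y", "-"): ["-", "S", "X", "c"],
    ("Y", "]"): [],
    ("Y", "b"): [],
    ("Y", "c"): [],
}
NONTERMINALS = {"S", "X", "Y"}

def parse(tokens):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "S"]  # the top of the stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {look}]")
            applied.append(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))  # push the body right-to-left
        elif top == look:
            pos += 1  # match a terminal (or the end marker $)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return applied

print(parse("[ a b ]".split()))
# ['S -> [ S X ]', 'S -> a', 'X -> Y b', 'Y -> ε']
```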

Exercise: Top-Down Parsing

  Stack       | Input     | Production applied
  $ S         | [ a b ] $ | S → [ S X ]
  $ ] X S [   | [ a b ] $ | (match!)
  $ ] X S     | a b ] $   | S → a
  $ ] X a     | a b ] $   | (match!)
  $ ] X       | b ] $     | X → Y b
  $ ] b Y     | b ] $     | Y → ε
  $ ] b       | b ] $     | (match!)
  $ ]         | ] $       | (match!)
  $           | $         | (end! accept.)

Exercise: Top-Down Parsing
Explain the backtracking technique for top-down parsing and provide an example considering the grammar
  S → [ S X ] | a
  X → + S Y | Y b | ε
  Y → - S X c | ε

Exercise: Top-Down Parsing
- Backtracking is exploited in top-down parsers that do not use predictive parsing
- The parser keeps a record of all previous decisions for production application, and sets up a trial-and-error strategy
- Note that choosing a wrong production leads to the repeated reading of a portion of the input

Exercise: Top-Down Parsing

  Stack           | Input     | Production applied | Choice
  $ S             | [ a b ] $ | S → [ S X ]        | * choice (1/2)
  $ ] X S [       | [ a b ] $ | (match!)           |
  $ ] X S         | a b ] $   | S → [ S X ]        | * choice (1/2)
  $ ] X ] X S [   | a b ] $   | Not matching! Back to last choice that can be changed (*)

* marks choices where alternatives are available
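The trial-and-error strategy can be sketched as a recursive-descent recognizer that tries the alternatives of each nonterminal in order and falls back to the next one on failure. This is a minimal illustration in Python using generators, not the stack-based formulation of the slides; the names `derive` and `accepts` are hypothetical. It terminates on this grammar because the grammar is not left-recursive.

```python
GRAMMAR = {
    "S": [["[", "S", "X", "]"], ["a"]],
    "X": [["+", "S", "Y"], ["Y", "b"], []],
    "Y": [["-", "S", "X", "c"], []],
}

def derive(symbols, tokens, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for body in GRAMMAR[head]:  # try the alternatives in order
            # If this alternative yields nothing, the loop moves on: a backtrack.
            yield from derive(body + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from derive(rest, tokens, pos + 1)  # terminal matches the input

def accepts(tokens):
    """True if some sequence of choices consumes the whole input."""
    return any(end == len(tokens) for end in derive(["S"], tokens, 0))

print(accepts("[ a b ]".split()))  # True
```

On input `[ a b ]` the alternatives are tried in the same order as in the slides: `S → [ S X ]` fails on `a`, then `X → + S Y` and `Y → - S X c` fail on `b`, before the successful choices are found.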

Exercise: Top-Down Parsing

  Stack         | Input     | Production applied | Choice
  $ S           | [ a b ] $ | S → [ S X ]        | * choice (1/2)
  $ ] X S [     | [ a b ] $ | (match!)           |
  $ ] X S       | a b ] $   | S → a              | choice (2/2)
  $ ] X a       | a b ] $   | (match!)           |
  $ ] X         | b ] $     | X → + S Y          | * choice (1/3)
  $ ] Y S +     | b ] $     | Not matching! Back to last choice that can be changed (*)

Exercise: Top-Down Parsing

  Stack             | Input     | Production applied | Choice
  $ S               | [ a b ] $ | S → [ S X ]        | * choice (1/2)
  $ ] X S [         | [ a b ] $ | (match!)           |
  $ ] X S           | a b ] $   | S → a              | choice (2/2)
  $ ] X a           | a b ] $   | (match!)           |
  $ ] X             | b ] $     | X → Y b            | * choice (2/3)
  $ ] b Y           | b ] $     | Y → - S X c        | * choice (1/2)
  $ ] b c X S -     | b ] $     | Not matching! Back to last choice that can be changed (*)

Exercise: Top-Down Parsing

  Stack       | Input     | Production applied | Choice
  $ S         | [ a b ] $ | S → [ S X ]        | * choice (1/2)
  $ ] X S [   | [ a b ] $ | (match!)           |
  $ ] X S     | a b ] $   | S → a              | choice (2/2)
  $ ] X a     | a b ] $   | (match!)           |
  $ ] X       | b ] $     | X → Y b            | * choice (2/3)
  $ ] b Y     | b ] $     | Y → ε              | choice (2/2)
  $ ] b       | b ] $     | (match!)           |
  $ ]         | ] $       | (match!)           |
  $           | $         | (end! accept.)     |

Exercise: Bottom-Up Parsing
Consider the grammar
  L → S ; L | S
  S → T VL
  T → array Idx of T | int
  VL → id , VL | id
  Idx → num
Provide the parsing table for the SLR bottom-up parser.

Recall: Function closure()
1. Start with closure(I) = I
2. If [A → α • B β] ∈ closure(I) then for each production B → γ in the grammar, add the item [B → • γ] to I if not already in I
3. Repeat 2 until no new items can be added

Recall: Function goto()
1. For each [A → α • X β] ∈ I, add [A → α X • β] to goto(I, X), if not already there
2. Compute closure() of the resulting set

Recall: build LR(0) collection
Steps:
- augment grammar with initial production
- start from the state containing the closure of the set containing only the item derived from the initial production: closure({[S' → • S]})
- add iteratively the states that are reachable from the existing states using goto(I, X), for some state I already added and some symbol (terminal or nonterminal) X

Recall: build the collection
Procedure:
  C = { closure({[S' → • S]}) }
  repeat
      for each set of items I in C and each grammar symbol X
              such that goto(I, X) is not empty and not in C do
          add goto(I, X) to C
  until no new sets of items can be added to C
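The closure(), goto() and collection-building procedures recalled above can be sketched together in Python. The example applies them to the bottom-up grammar of this exercise, augmented with L' → L; the item encoding `(head, body, dot position)` and the function names are assumptions made for illustration.

```python
# Augmented grammar of the exercise; an item is (head, body, dot position).
GRAMMAR = {
    "L'": [("L",)],
    "L": [("S", ";", "L"), ("S",)],
    "S": [("T", "VL")],
    "T": [("array", "Idx", "of", "T"), ("int",)],
    "VL": [("id", ",", "VL"), ("id",)],
    "Idx": [("num",)],
}

def closure(items):
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:  # dot is before a nonterminal B
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)          # the item [B -> . gamma]
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    # Move the dot over `symbol` in every item that allows it, then close the set.
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

def canonical_collection():
    states = [closure({("L'", ("L",), 0)})]
    for state in states:  # the list grows while we iterate over it
        symbols = {b[d] for _, b, d in state if d < len(b)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
    return states

STATES = canonical_collection()
print(len(STATES))  # 16 sets of items for this grammar
```

The states come out in discovery order, so their numbering need not coincide with the numbering chosen in the slides.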

Exercise: Bottom-Up Parsing
Augmented grammar
  L' → L
  L → S ; L | S
  S → T VL
  T → array Idx of T | int
  VL → id , VL | id
  Idx → num
In the next slide, * means kernel item.

Exercise: Bottom-Up Parsing
  State 0:   * L' → • L;  L → • S ; L;  L → • S;  S → • T VL;  T → • array Idx of T;  T → • int
  State 1:   * T → array • Idx of T;  Idx → • num
  State 2:   * T → int •
  State 3:   * L' → L •
  State 4:   * L → S • ; L;  * L → S •
  State 5:   * S → T • VL;  VL → • id , VL;  VL → • id
  State 6:   * T → array Idx • of T
  State 7:   * Idx → num •
  State 8:   * L → S ; • L;  L → • S ; L;  L → • S;  S → • T VL;  T → • array Idx of T;  T → • int
  State 9:   * S → T VL •
  State 10:  * VL → id • , VL;  * VL → id •
  State 11:  * T → array Idx of • T;  T → • array Idx of T;  T → • int
  State 12:  * L → S ; L •
  State 13:  * VL → id , • VL;  VL → • id , VL;  VL → • id
  State 14:  * T → array Idx of T •
  State 15:  * VL → id , VL •

Exercise: Bottom-Up Parsing
[Figure: LR(0) automaton with start state 0. Transitions:
  0 -L→ 3, 0 -S→ 4, 0 -T→ 5, 0 -array→ 1, 0 -int→ 2
  1 -Idx→ 6, 1 -num→ 7
  4 -;→ 8
  5 -VL→ 9, 5 -id→ 10
  6 -of→ 11
  8 -L→ 12, 8 -S→ 4, 8 -T→ 5, 8 -array→ 1, 8 -int→ 2
  10 -,→ 13
  11 -T→ 14, 11 -array→ 1, 11 -int→ 2
  13 -VL→ 15, 13 -id→ 10]

Recall: SLR parsing table
1. Augment the grammar with L' → L (done)
2. Construct the set C = {I0, I1, …, In} of LR(0) states (done)
3. If [A → α • a β] ∈ Ii and goto(Ii, a) = Ij then set action[i, a] = shift j
4. If [A → α •] ∈ Ii then set action[i, a] = reduce A → α for all a ∈ FOLLOW(A) (apply only if A ≠ L'). SO need FOLLOW!
5. If [L' → L •] is in Ii then set action[i, $] = accept
6. If goto(Ii, A) = Ij then set goto[i, A] = j
7. Repeat 3-6 until no more entries added
8. The initial state i is the Ii holding item [L' → • L]

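Exercise: Bottom-Up Parsing
Grammar:
  L' → L
  L → S ; L | S
  S → T VL
  T → array Idx of T | int
  VL → id , VL | id
  Idx → num

  A   | FOLLOW(A)
  Idx | of
  T   | FIRST(VL) = id
  L   | $
  S   | ; ∪ FOLLOW(L) = ; $
  VL  | FOLLOW(S) = ; $

Numbered productions:
  1. L' → L
  2. L → S ; L
  3. L → S
  4. S → T VL
  5. T → array Idx of T
  6. T → int
  7. VL → id , VL
  8. VL → id
  9. Idx → num

        action                                            goto
  state | array | int | num | id  | ,   | ;   | of  | $   || L  | S  | VL | T  | Idx
  0     | s1    | s2  |     |     |     |     |     |     || 3  | 4  |    | 5  |
  1     |       |     | s7  |     |     |     |     |     ||    |    |    |    | 6
  2     |       |     |     | r6  |     |     |     |     ||    |    |    |    |
  3     |       |     |     |     |     |     |     | acc ||    |    |    |    |
  4     |       |     |     |     |     | s8  |     | r3  ||    |    |    |    |
  5     |       |     |     | s10 |     |     |     |     ||    |    | 9  |    |
  6     |       |     |     |     |     |     | s11 |     ||    |    |    |    |
  7     |       |     |     |     |     |     | r9  |     ||    |    |    |    |
  8     | s1    | s2  |     |     |     |     |     |     || 12 | 4  |    | 5  |
  9     |       |     |     |     |     | r4  |     | r4  ||    |    |    |    |
  10    |       |     |     |     | s13 | r8  |     | r8  ||    |    |    |    |
  11    | s1    | s2  |     |     |     |     |     |     ||    |    |    | 14 |
  12    |       |     |     |     |     |     |     | r2  ||    |    |    |    |
  13    |       |     |     | s10 |     |     |     |     ||    |    | 15 |    |
  14    |       |     |     | r5  |     |     |     |     ||    |    |    |    |
  15    |       |     |     |     |     | r7  |     | r7  ||    |    |    |    |

  sx: shift & go to state x
  ry: reduce using production y
  The reduce entries are the SLR reduce actions derived from lookahead (FOLLOW).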

Exercise: Bottom-Up Parsing
Show the stack and the moves of the SLR parser on input  array 5 of int x

Exercise: Bottom-Up Parsing

  Stack                            | Input                | Action
  $ 0                              | array 5 of int x $   | start from state 0
  $ 0                              | array 5 of int x $   | shift & goto 1
  $ 0 array 1                      | 5 of int x $         | shift & goto 7
  $ 0 array 1 5 7                  | of int x $           | reduce with 9: Idx → num, goto(1,Idx)=6
  $ 0 array 1 Idx 6                | of int x $           | shift & goto 11
  $ 0 array 1 Idx 6 of 11          | int x $              | shift & goto 2
  $ 0 array 1 Idx 6 of 11 int 2    | x $                  | reduce with 6: T → int, goto(11,T)=14
  $ 0 array 1 Idx 6 of 11 T 14     | x $                  | reduce with 5: T → array Idx of T, goto(0,T)=5
  $ 0 T 5                          | x $                  | shift & goto 10
  $ 0 T 5 x 10                     | $                    | reduce with 8: VL → id, goto(5,VL)=9
  $ 0 T 5 VL 9                     | $                    | reduce with 4: S → T VL, goto(0,S)=4
  $ 0 S 4                          | $                    | reduce with 3: L → S, goto(0,L)=3
  $ 0 L 3                          | $                    | action(3,$) = accept!
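The moves above can be replayed with a small shift-reduce driver. The sketch below is an illustration in Python that hard-codes the action and goto tables of this exercise; it works on token names (so `5` is passed as `num` and `x` as `id`) and keeps only states on the stack, which are simplifying assumptions rather than the slides' formulation.

```python
# number -> (head, body length) for productions 2..9 of the exercise.
PRODUCTIONS = {
    2: ("L", 3), 3: ("L", 1), 4: ("S", 2), 5: ("T", 4), 6: ("T", 1),
    7: ("VL", 3), 8: ("VL", 1), 9: ("Idx", 1),
}
ACTION = {
    (0, "array"): "s1", (0, "int"): "s2",
    (1, "num"): "s7",
    (2, "id"): "r6",
    (3, "$"): "acc",
    (4, ";"): "s8", (4, "$"): "r3",
    (5, "id"): "s10",
    (6, "of"): "s11",
    (7, "of"): "r9",
    (8, "array"): "s1", (8, "int"): "s2",
    (9, ";"): "r4", (9, "$"): "r4",
    (10, ","): "s13", (10, ";"): "r8", (10, "$"): "r8",
    (11, "array"): "s1", (11, "int"): "s2",
    (12, "$"): "r2",
    (13, "id"): "s10",
    (14, "id"): "r5",
    (15, ";"): "r7", (15, "$"): "r7",
}
GOTO = {
    (0, "L"): 3, (0, "S"): 4, (0, "T"): 5, (1, "Idx"): 6, (5, "VL"): 9,
    (8, "L"): 12, (8, "S"): 4, (8, "T"): 5, (11, "T"): 14, (13, "VL"): 15,
}

def parse(tokens):
    """Return the production numbers used in reductions, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # state stack only; the grammar symbols are implied by the states
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]} in state {stack[-1]}")
        if action == "acc":
            return reductions
        if action[0] == "s":                      # shift: push state, consume token
            stack.append(int(action[1:]))
            pos += 1
        else:                                     # reduce: pop |body| states, then goto
            head, length = PRODUCTIONS[int(action[1:])]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(int(action[1:]))

print(parse("array num of int id".split()))  # [9, 6, 5, 8, 4, 3]
```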

Exercise: Semantic Analysis
4) Consider the SDD in the next slides, where
- newlabel() generates a fresh symbolic label
- newtemp() generates a fresh variable name
- gen() generates strings, || is string concatenation
- code is the attribute containing 3AC
- id.place is the name of the variable associated to the token id
- relop.op is a comparison operator (<, <=, =, ...)

Exercise 4: Semantic Analysis

  Productions               | Semantic rules
  Prog → S                  | S.next = newlabel();
                            | Prog.code = S.code || gen(S.next ':')
  S → S1 ; S2               | S1.next = newlabel(); S2.next = S.next;
                            | S.code = S1.code || gen(S1.next ':') || S2.code
  S → if Test then { S1 }   | Test.true = newlabel(); Test.false = S.next; S1.next = S.next;
                            | S.code = Test.code || gen(Test.true ':') || S1.code
  S → id = E                | S.code = E.code || gen(id.place '=' E.place)

Exercise 4: Semantic Analysis

  Productions               | Semantic rules
  Test → id1 relop id2      | Test.code = gen('if' id1.place relop.op id2.place 'goto' Test.true) || gen('goto' Test.false)
  E → E1 + id               | E.place = newtemp();
                            | E.code = E1.code || gen(E.place '=' E1.place '+' id.place)
  E → id                    | E.place = id.place; E.code = ''

Exercise 4: Semantic Analysis
Consider the input:
  if y > w then { y = x + z }; x = z + v
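The semantic rules of the SDD can be sketched as recursive functions in which inherited attributes (next, true, false) are passed down as arguments and synthesized attributes (code, place) are returned. The following Python illustration assumes a hypothetical tuple encoding of the parse tree; label and temporary names depend on the order in which newlabel() and newtemp() are called, so the numbering produced here may differ from a hand evaluation.

```python
import itertools

_labels = itertools.count(1)
_temps = itertools.count(1)

def newlabel():
    return f"LABEL{next(_labels)}"

def newtemp():
    return f"t{next(_temps)}"

# Hypothetical tuple encoding of the parse tree:
#   ("seq", s1, s2) | ("if", (id1, op, id2), s1) | ("assign", id, expr)
#   expr is an identifier string or ("+", expr, id)

def expr(e):
    """Return (E.place, E.code) for E -> E1 + id | id."""
    if isinstance(e, str):
        return e, []                      # E -> id: place = id.place, empty code
    _, left, ident = e
    left_place, code = expr(left)
    place = newtemp()
    return place, code + [f"{place} = {left_place} + {ident}"]

def stmt(s, next_label):
    """Return S.code given the inherited attribute S.next."""
    kind = s[0]
    if kind == "seq":
        s1_next = newlabel()
        return stmt(s[1], s1_next) + [f"{s1_next}:"] + stmt(s[2], next_label)
    if kind == "if":
        (id1, op, id2), body = s[1], s[2]
        true_label = newlabel()           # Test.true; Test.false is S.next
        test_code = [f"if {id1} {op} {id2} goto {true_label}", f"goto {next_label}"]
        return test_code + [f"{true_label}:"] + stmt(body, next_label)
    if kind == "assign":
        place, code = expr(s[2])
        return code + [f"{s[1]} = {place}"]
    raise ValueError(f"unknown statement kind {kind!r}")

def prog(s):
    s_next = newlabel()
    return stmt(s, s_next) + [f"{s_next}:"]

# if y > w then { y = x + z }; x = z + v
TREE = ("seq",
        ("if", ("y", ">", "w"), ("assign", "y", ("+", "x", "z"))),
        ("assign", "x", ("+", "z", "v")))
CODE = prog(TREE)
print("\n".join(CODE))
```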

Exercise 4: Semantic Analysis
4.1) Show the annotated parse tree (without the code attribute) for the input, together with the values of the attributes.

Exercise 4: Semantic Analysis
[Figure: parse tree for the input  if y > w then { y = x + z } ; x = z + v]

  Prog
    S
      S
        if
        Test
          id      (y)
          relop   (>)
          id      (w)
        then
        {
        S
          id      (y)
          =
          E
            E
              id  (x)
            +
            id    (z)
        }
      ;
      S
        id        (x)
        =
        E
          E
            id    (z)
          +
          id      (v)

Exercise 4: Semantic Analysis
[Figure: annotated parse tree for the input  if y > w then { y = x + z } ; x = z + v]

  Prog
    S                     S.next = LABEL3
      S                   S.next = LABEL2
        if
        Test              Test.true = LABEL1, Test.false = LABEL2
          id              id.place = y
          relop           relop.op = >
          id              id.place = w
        then
        {
        S                 S.next = LABEL2
          id              id.place = y
          =
          E               E.place = t1
            E             E.place = x
              id          id.place = x
            +
            id            id.place = z
        }
      ;
      S                   S.next = LABEL3
        id                id.place = x
        =
        E                 E.place = t2
          E               E.place = z
            id            id.place = z
          +
          id              id.place = v

Exercise 4: Semantic Analysis
[Figure: the same parse tree, with assignments written as :=, for the input  if y > w then { y := x + z } ; x := z + v]

Exercise 4: Semantic Analysis
[Figure: annotated parse tree including the code attribute; part for Prog and the second statement x = z + v]

  Prog                    code
    S                     next = LABEL3, code
      S                   (first statement)
      ;
      S                   next = LABEL3, code
        id                place = x
        =
        E                 place = t2, code
          E               place = z, code
            id            place = z
          +
          id              place = v

Exercise 4: Semantic Analysis
[Figure: annotated parse tree including the code attribute; part for the statement  if y > w then { y = x + z }]

  S                       next = LABEL2, code
    if
    Test                  true = LABEL1, false = LABEL2, code
      id                  place = y
      relop               op = >
      id                  place = w
    then
    {
    S                     next = LABEL2, code
      id                  place = y
      =
      E                   place = t1, code
        E                 place = x, code
          id              place = x
        +
        id                place = z
    }

Exercise 4: Semantic Analysis
4.2) Show the three-address code produced by the semantic actions for the given input.

Exercise 4: Semantic Analysis
Three-address code for Prog (the first two lines are Test.code):

  if y > w goto LABEL1
  goto LABEL2
  LABEL1:
  t1 = x + z
  y = t1
  LABEL2:
  t2 = z + v
  x = t2
  LABEL3:

Exercise 4: Semantic Analysis
4.3) Give the definition of inherited attribute. For the rule  S → if Test then { S1 }  show which attributes are synthesized and which are inherited.

Exercise 4: Semantic Analysis
In semantic analysis, an attribute of a nonterminal A1 is called inherited if it is derived only from operations on attributes of A1's parent or siblings in the parse tree; i.e., if it is computed in the semantic rule associated with a production that has A1 in the right-hand side.
On the other hand, synthesized attributes of A1 are computed only from below, i.e., from the attributes of the children of A1; in other words, in the semantic rule of a production where A1 appears in the left-hand side.

Exercise 4: Semantic Analysis
Production:  S → if Test then { S1 }

  Semantic rules                                              | Kind
  Test.true = newlabel();                                     | Synthesized (convention)
  Test.false = S.next;                                        | Inherited
  S1.next = S.next;                                           | Inherited
  S.code = Test.code || gen(Test.true ':') || S1.code         | Synthesized