CSCI565 Compiler Design, Spring 2015
Homework 2 - Solution

Problem 1: Attributive Grammar and Syntax-Directed Translation [25 points]

Consider the grammar fragment below. It allows for the declaration of scalar variables or single-dimensional arrays in a style similar to PASCAL, where the type specifier and its storage size indication precede the variable identifiers. To simplify the implementation we only allow some basic types. Also, integer stands for a terminal symbol with an integer value attribute named ival.

    VarDeclaration → TypeSpecifier Dimensions IdentifierList ';'
    TypeSpecifier  → void | char | float | int | double
    Dimensions     → '[' integer ']' | ε
    IdentifierList → id ',' IdentifierList | id

a. [15 points] Without rewriting the grammar, develop an attributive grammar and syntax-directed definition (SDD) that inserts all the symbols associated with the variable declaration in a symbol table as part of the semantic action associated with the VarDeclaration production. Your solution should also check for, and report, duplicate variable symbols, in which case only the first occurrence should be inserted in the symbol table.

b. [10 points] Show the values of your attributes, as well as the order in which they need to be evaluated, for the code fragment "int [6] a, b;". Can the evaluation of your attributes be carried out in a single pass over the parse tree? Why or why not?

a. There are many possible solutions to this simple problem.
Here we present an L-attributed grammar solution with the following attributes and corresponding types:

    Dimensions:     baseSize (integer, inherited); size (integer, synthesized)
    TypeSpecifier:  baseSize (integer, synthesized)
    IdentifierList: list (list of symbols, synthesized)
    VarDeclaration: list (list of symbols, synthesized); size (integer, synthesized)

Using these attributes we can define the semantic rules as follows:

    VarDeclaration → TypeSpecifier Dimensions IdentifierList ';'
        { for each id in IdentifierList.list do
              currentSymbolTable.insert(id, Dimensions.size);
          end for }

    TypeSpecifier → void      { TypeSpecifier.baseSize = 0; }
                  | char      { TypeSpecifier.baseSize = 1; }
                  | float     { TypeSpecifier.baseSize = 4; }
                  | int       { TypeSpecifier.baseSize = 4; }
                  | double    { TypeSpecifier.baseSize = 8; }

    Dimensions → '[' integer ']'
                      { Dimensions.size = Dimensions.baseSize * integer.ival; }   /* array case */
               | ε    { Dimensions.size = Dimensions.baseSize; }                  /* scalar case */

    IdentifierList0 → id ',' IdentifierList1
                      { IdentifierList0.list = appendAndCheck(IdentifierList1.list, id); }
    IdentifierList  → id
                      { IdentifierList.list = { id }; }

Note that the inherited attribute Dimensions.baseSize takes its value from TypeSpecifier.baseSize, the left sibling of Dimensions in the VarDeclaration production, which is what makes the grammar L-attributed.
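b. Using these rules, one possible evaluation order is a single depth-first, left-to-right traversal of the parse tree: inherited attributes are computed on the way down and synthesized attributes on the way up. Because the grammar is L-attributed, this single pass suffices. The annotated parse tree for "int [6] a, b;" is:

    VarDeclaration                  list = {a, b}; size = 24
        TypeSpecifier               baseSize = 4
            int
        Dimensions                  baseSize = 4; size = 6 x 4
            '['
            integer                 ival = 6
            ']'
        IdentifierList              list = {a, b}
            Id                      a
            ','
            IdentifierList          list = {b}
                Id                  b
        ';'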
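The attribute rules of Problem 1 can be traced with a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, append_and_check is assumed to prepend the new name and to drop a repeated one, and the symbol table is a plain dictionary from name to storage size.

```python
# Base sizes taken from the TypeSpecifier rules.
BASE_SIZE = {"void": 0, "char": 1, "float": 4, "int": 4, "double": 8}


def append_and_check(tail, name, errors):
    """Prepend `name` to `tail`; a repeated name is reported and not added again."""
    if name in tail:
        errors.append(f"duplicate symbol: {name}")
        return list(tail)
    return [name] + tail


def identifier_list(names, errors):
    """IdentifierList -> id ',' IdentifierList | id (synthesized attribute: list)."""
    if len(names) == 1:
        return [names[0]]
    tail = identifier_list(names[1:], errors)
    return append_and_check(tail, names[0], errors)


def var_declaration(type_name, dim, names, symbol_table):
    """Evaluate one declaration; `dim` is integer.ival, or None for a scalar."""
    errors = []
    base_size = BASE_SIZE[type_name]  # TypeSpecifier.baseSize (synthesized)
    # Dimensions.baseSize is inherited from TypeSpecifier; Dimensions.size is synthesized.
    size = base_size * dim if dim is not None else base_size
    for name in identifier_list(names, errors):
        if name in symbol_table:  # assumption: also reject names declared earlier
            errors.append(f"duplicate symbol: {name}")
        else:
            symbol_table[name] = size
    return errors


table = {}
errors = var_declaration("int", 6, ["a", "b"], table)
print(table)   # {'a': 24, 'b': 24}
print(errors)  # []
```

The recursion mirrors the right-recursive IdentifierList production, so the list for the tail is complete before the head identifier is checked against it.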
Problem 2: Static-Single Assignment Representation [15 points]

For the sequence of instructions shown below, depict an SSA-form representation (there could be more than one). Comment on the need to save all the values at the end of the loop and how the SSA representation helps you in your evaluation of the code. Do not forget to include the φ-functions.

    b = 0; d = ...; a = ...; i = ...;
    L1: if (i > 0) {
          if (a < 0) {
            d = 0;
            b = 0;
          } else {
            b = b + 1;
            d = 1;
          }
          i = i - 1;
          if (i < 0) goto Lbreak;
          goto L1;
        }
    Lbreak: x = a;
            y = b;

A possible SSA representation is shown below, where each value associated with each variable is denoted by a subscripted index. Notice that the code contains no structured loop construct: the iteration is expressed only through the backward goto L1.

            a1 = ...
            b1 = 0
            i1 = ...
            d1 = ...
    L1:     i2 = φ(i1, i3)
            b2 = φ(b1, b5)
            if (i2 <= 0) then goto Lbreak
            if (a1 >= 0) then goto X
            d2 = 0
            b3 = 0
            goto Y
    X:      b4 = b2 + 1
            d3 = 1
    Y:      i3 = i2 - 1
            b5 = φ(b3, b4)
            if (i3 < 0) then goto Lbreak
            goto L1
    Lbreak: b6 = φ(b2, b5)
            x1 = a1
            y1 = b6

As can be observed by inspection, each use has a single definition point that reaches it and each value is defined only once. This is particularly tricky for variable 'b', which needs a φ-function at each of the three join points L1, Y and Lbreak; the φ-function at Lbreak therefore defines a new name, b6, rather than redefining b5.
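The single-definition property claimed for Problem 2 can be checked mechanically. The Python sketch below is an illustration only: the tuple encoding of the instructions is an assumption made here, and the result of the φ-function at Lbreak is named b6 so that every name has exactly one definition.

```python
from collections import Counter

# (defined name, names used) per SSA instruction; None marks a branch with no definition.
SSA = [
    ("a1", []), ("b1", []), ("i1", []), ("d1", []),
    ("i2", ["i1", "i3"]),  # phi at L1
    ("b2", ["b1", "b5"]),  # phi at L1
    (None, ["i2"]),        # if (i2 <= 0) goto Lbreak
    (None, ["a1"]),        # if (a1 >= 0) goto X
    ("d2", []), ("b3", []),
    ("b4", ["b2"]), ("d3", []),
    ("i3", ["i2"]),
    ("b5", ["b3", "b4"]),  # phi at Y
    (None, ["i3"]),        # if (i3 < 0) goto Lbreak
    ("b6", ["b2", "b5"]),  # phi at Lbreak (named b6 here by assumption)
    ("x1", ["a1"]), ("y1", ["b6"]),
]


def check_ssa(instructions):
    """Return (names defined more than once, names used but never defined)."""
    counts = Counter(d for d, _ in instructions if d is not None)
    redefined = sorted(n for n, c in counts.items() if c > 1)
    used = {u for _, uses in instructions for u in uses}
    undefined = sorted(used - set(counts))
    return redefined, undefined


print(check_ssa(SSA))  # ([], [])
```

The check covers only the naming discipline; it does not verify that each φ-function sits at the correct join point of the control-flow graph.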
Problem 3: Symbol Table Organization [10 points]

For the PASCAL code below answer the following questions:

    01: procedure main
    02:   integer a, b, c;
    03:   procedure f1(w,x);
    04:     integer w, x;
    05:     f2(w,x);
    06:   end;
    07:   procedure f2(y,z);
    08:     integer a, y, z;
    09:     procedure f3(m,n);
    10:       integer b, m, n;
    11:       c = a * b * m + f3(y,z);
    12:       b = a * (x + 1);
    13:     end;
    14:     f3(c,z);
    15:   end;
    16:   function f4(k) : integer;
    17:     integer k;
    18:     f4 := (k + 1);
    19:   end;
    20:   ...
    21:   f1(a,b);
    22: end;

a) [05 points] Draw the symbol tables for each of the procedures in this code (including main) and show their nesting relationship by linking them via a pointer reference in the structure (or record) used to implement them in memory. Include the entries or fields for the local variables, arguments and any other information you find relevant for the purposes of code generation, such as type and run-time location.

b) [05 points] For the statement in line 12, what are the specific instances of the variables used in this statement that the compiler needs to locate? Explain how the compiler obtains the data corresponding to each of these variables from the symbol tables at compile time.

a) The tables below depict the hierarchical structure of the procedures in this PASCAL program. Each table links to the table of its lexically enclosing procedure.

    main (outermost scope)
        kind     symbol   type      size
        var      a        integer   4
        var      b        integer   4
        var      c        integer   4

    f1 (enclosing scope: main)
        kind     symbol   type      size
        param    w        integer   4
        param    x        integer   4

    f2 (enclosing scope: main)
        kind     symbol   type      size
        var      a        integer   4
        param    y        integer   4
        param    z        integer   4

    f3 (enclosing scope: f2)
        kind     symbol   type      size
        var      b        integer   4
        param    m        integer   4
        param    n        integer   4

    f4 (enclosing scope: main)
        kind     symbol   type      size
        param    k        integer   4

b) For the statement in line 12 we simply follow the symbol table entries to find the specific instance of each of the symbols. Given that the statement is located lexically inside the body of procedure f3, the search for symbols always begins in the symbol table for f3 and proceeds outward through the enclosing scopes.
In the statement "b = a * (x + 1);" the symbol b refers to the local variable of procedure f3, the symbol a refers to the local variable of procedure f2, and the symbol x is undefined: x is declared only in f1, which is not one of the scopes enclosing f3 (f3, then f2, then main). This is a semantic error, which the compiler needs to report, as this symbol is out of scope.
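The lookup described in part (b) can be sketched in Python as a chain of tables linked by a parent pointer. The class and method names are hypothetical, and every entry is assumed to be a 4-byte integer as in the tables of part (a).

```python
class Scope:
    """Symbol table for one procedure, linked to its lexically enclosing scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}

    def declare(self, kind, *names):
        for n in names:
            self.symbols[n] = (kind, "integer", 4)

    def lookup(self, symbol):
        """Walk outward through enclosing scopes; return the owning scope name or None."""
        scope = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.name
            scope = scope.parent
        return None


main = Scope("main")
main.declare("var", "a", "b", "c")
f1 = Scope("f1", main)
f1.declare("param", "w", "x")
f2 = Scope("f2", main)
f2.declare("var", "a")
f2.declare("param", "y", "z")
f3 = Scope("f3", f2)
f3.declare("var", "b")
f3.declare("param", "m", "n")
f4 = Scope("f4", main)
f4.declare("param", "k")

# Resolve the symbols of line 12 starting from the table of f3.
for sym in ("b", "a", "x"):
    print(sym, "->", f3.lookup(sym))
```

The loop prints f3 for b, f2 for a and None for x, matching the analysis above; a result of None is where a compiler would report the out-of-scope symbol.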
Problem 4: Intermediate Code Generation [30 points]

Consider the code generation scheme for expressions described in class. Assume that the grammar allows field-access expressions on a simple identifier, such as a->f1 or b.f2, where the symbol a refers to a reference to (the address of) a C struct and b to the location of a struct in C. Assume for the purpose of this exercise that during a semantic analysis phase you have computed the offset of the first byte of each of the fields f1 and f2 being referenced in these expressions. In this context answer the following:

a) [20 points] Derive an SDT code generation scheme that handles simple pointer references such as the ones above, as well as more complicated ones with multiple pointer indirections, e.g., a->f1->f2. Note that you can combine the "->" and the "." operators. You need to include as part of your answer the relevant productions of the grammar.

b) [10 points] Show your code generation scheme for the simple expression a = b->f1 + c.f2, assuming that b and c are declared in the C programming language as shown below.

    typedef struct { int f1; } B;
    typedef struct { int y; int f2; } C;
    int a;
    B* b;
    C* c;

a) Below we depict a possible parse tree that will help us structure the grammar and the corresponding productions, attributes and semantic rules for this problem. All the attributes are synthesized, which eases integration with a bottom-up parser or, failing that, allows evaluation in a single bottom-up traversal of the tree.
[Figure: parse tree with '=' at the root over the leaves a, f1, b and f2.]

The attributes are:

    type:     symbol
    pointer:  boolean
    place:    temporary name
    code:     list of instructions

Using these attributes we also define a set of simple auxiliary functions, namely:

    type :: getoffset(user-defined type, field_name);
    symboltable :: gettype(symbol_name);
    symboltable :: gettypeoffield(type_name, field_name);

As to the semantic rules, we can define them as follows:

    assign → id '=' exp1
        { t4 = newtemp();
          assign.place = t4;
          assign.code = append(exp1.code, gen('t4 = exp1.place;'));
          assign.pointer = exp1.pointer;
          assign.type = gettype(id.symbol); }

    exp1 → exp2 '+' exp3
        { t3 = newtemp();
          exp1.place = t3;
          exp1.code = append(exp2.code, exp3.code, gen('t3 = exp2.place + exp3.place;'));
          exp1.pointer = exp2.pointer;
          exp1.type = exp2.type; }
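    exp → field
        { exp.code = field.code;
          exp.type = field.type;
          exp.pointer = field.pointer;
          exp.place = field.place; }

    field1 → field2 '->' id
        { offset = getfieldoffset(field2.type, id.symbol);
          t2 = newtemp();
          field1.place = t2;
          field1.type = gettypeoffield(field2.type, id.symbol);
          if (ispointertype(field2.type)) {
              field1.code = append(field2.code, gen('t2 = field2.place + offset;', 't2 = *t2;'));
              field1.pointer = true;
          } else {
              error;
          } }

    field1 → field2 '.' id
        { offset = getfieldoffset(field2.type, id.symbol);
          t2 = newtemp();
          field1.place = t2;
          field1.type = gettypeoffield(field2.type, id.symbol);
          if (ispointertype(field2.type)) {
              error;
          } else {
              field1.code = append(field2.code, gen('t2 = field2.place + offset;', 't2 = *t2;'));
              field1.pointer = ispointertype(field1.type);
          } }

    field1 → id
        { field1.type = gettype(id.symbol);
          field1.pointer = ispointertype(id.symbol);
          t1 = newtemp();
          field1.place = t1;
          field1.code = gen('t1 = id.symbol;'); }

b) Using the attribute grammar outlined above, the annotated parse tree for the expression a = b->f1 + c.f2 carries the following attribute values:

    node              attributes
    field (b)         type = B*; pointer = true; place = t1; code = 't1 = b'
    field (b->f1)     place = t2
    exp (b->f1)       place = t2
    field (c)         type = C*; pointer = true; place = t3; code = 't3 = c'
    field (c.f2)      place = t4; code = t3 = c; t4 = t3 + 4; t4 = *t4;
    exp (c.f2)        place = t4; code = t3 = c; t4 = t3 + 4; t4 = *t4;
    a                 place = a; code = null
    '+'               place = t5; code = t3 = c; t4 = t3 + 4; t4 = *t4; t5 = t2 + t4;
    '='               place = a; code = t3 = c; t4 = t3 + 4; t4 = *t4; t5 = t2 + t4; a = t5;

The code annotations list only the instructions contributed by the c.f2 operand; by the rules above, the instructions that compute b->f1 into t2 (t1 = b; t2 = t1 + 0; t2 = *t2;) precede them.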
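The instruction sequence for Problem 4 can be reproduced with a short Python sketch of the synthesized place and code attributes. Several points are assumptions made here rather than part of the solution: the dictionaries STRUCTS and SYMBOLS stand in for the symbol table, the generator function names are hypothetical, only the '->' rule is implemented, the field of c is reached with '->' because c is declared as a pointer, and the assignment emits a store to the target name.

```python
import itertools

# Assumed layouts: field name -> (offset in bytes, field type).
STRUCTS = {"B": {"f1": (0, "int")}, "C": {"y": (0, "int"), "f2": (4, "int")}}
SYMBOLS = {"a": "int", "b": "B*", "c": "C*"}

_counter = itertools.count(1)


def newtemp():
    return f"t{next(_counter)}"


def gen_id(name):
    """field -> id: load the symbol into a fresh temporary."""
    t = newtemp()
    return {"type": SYMBOLS[name], "place": t, "code": [f"{t} = {name}"]}


def gen_arrow(base, field_name):
    """field1 -> field2 '->' id: the base must have a pointer-to-struct type."""
    if not base["type"].endswith("*"):
        raise TypeError("'->' applied to a non-pointer")
    offset, ftype = STRUCTS[base["type"][:-1]][field_name]
    t = newtemp()
    code = base["code"] + [f"{t} = {base['place']} + {offset}", f"{t} = *{t}"]
    return {"type": ftype, "place": t, "code": code}


def gen_add(left, right):
    """exp1 -> exp2 '+' exp3: concatenate operand code, then add the two places."""
    t = newtemp()
    code = left["code"] + right["code"] + [f"{t} = {left['place']} + {right['place']}"]
    return {"type": left["type"], "place": t, "code": code}


def gen_assign(target, value):
    """Assumption: the assignment stores the value's place into the target name."""
    return value["code"] + [f"{target} = {value['place']}"]


code = gen_assign(
    "a",
    gen_add(gen_arrow(gen_id("b"), "f1"), gen_arrow(gen_id("c"), "f2")),
)
print("\n".join(code))
```

Because every attribute is synthesized, each generator only combines the results of its operands, which is why the calls nest in the same shape as the parse tree. A chain such as a->f1->f2 would be a nested gen_arrow call, provided the intermediate field type is itself a pointer.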
Problem 5: Back-patching of Loop Constructs [20 points]

We have covered in class an SDT scheme to generate code using the back-patching technique for a while loop construct. In this exercise you will develop a similar scheme for the repeat-while construct using the productions below, also taking into account continue and break statements. Argue that your solution works for the case of nested loops and for break and continue statements at different nesting levels.

    (1) S → repeat L while E ;
    (2) S → continue ;
    (3) S → break ;
    (4) L → S ; L
    (5) L → S

Do not forget to show the augmented productions with the marker non-terminal symbols M, and possibly N, along with the corresponding rules for the additional symbols and productions. Argue for the correctness of your solution without necessarily having to show an example.

As we have seen in class, a possible approach to this SDT scheme is to have additional synthesized attributes for the statements, namely a nextlist, a skiplist and a breaklist. The skiplist holds the addresses of unresolved goto instructions that correspond to continue statements, whereas the breaklist holds the addresses of unresolved goto instructions that correspond to break statements. The nextlist holds the addresses of unresolved goto instructions that follow the regular control flow, as is the case for regular instructions or if-then-else constructs. While the skiplist needs to be patched with the address of the first instruction that evaluates the control predicate of the loop at the current nesting level, the breaklist needs to be patched with the first address following the current S construct. This address cannot be immediately determined at this level of the back-patching, and thus the address of the goto in the breaklist is passed up as part of the synthesized attribute nextlist of S.
    (1) S → repeat M1 L while M2 E ;
            { backpatch(L.nextlist, M2.quad);
              backpatch(L.skiplist, M2.quad);
              backpatch(E.truelist, M1.quad);
              S.nextlist = merge(E.falselist, L.breaklist);
              S.breaklist = nil;
              S.skiplist = nil; }

    (2) S → continue ;
            { S.skiplist = newlist(nextaddr());
              emit(goto);
              S.breaklist = nil;
              S.nextlist = nil; }

    (3) S → break ;
            { S.breaklist = newlist(nextaddr());
              emit(goto);
              S.nextlist = nil;
              S.skiplist = nil; }

    (4) L1 → S ; M2 L2
            { backpatch(S.nextlist, M2.quad);
              L1.nextlist = L2.nextlist;
              L1.breaklist = merge(S.breaklist, L2.breaklist);
              L1.skiplist = merge(S.skiplist, L2.skiplist); }

    (5) L → S
            { L.breaklist = S.breaklist;
              L.nextlist = S.nextlist;
              L.skiplist = S.skiplist; }

    (6) M1 → ε   { M1.quad = nextaddr(); }

    (7) M2 → ε   { M2.quad = nextaddr(); }

Regarding the first production, the first two back-patching commands fill in the places where control in L either falls through (L.nextlist) or executes a continue (L.skiplist); both transfer to the evaluation of the conditional E, whose address is given by M2.quad. The third back-patching command links the places where E evaluates to true to M1.quad, that is, to the top of the loop, since a repeat-while loop iterates again while E holds. Next we merge the break goto instructions (L.breaklist) with E.falselist, as both hold addresses of goto instructions that will transfer control to the first instruction following the loop.

The continue statement generates a single entry in the skiplist, whereas the break statement generates a single entry in the breaklist of the corresponding S symbol.

Regarding the sequencing of statements in production (4), we have to link the addresses from continue instructions in S with the first instruction in the predicate of the while construct in which the continue is nested. This is accomplished by merging the skiplist attributes upward until production (1) patches them. The addresses that correspond to break instructions in either S or L2 need to be merged, whereas L1.nextlist is simply the set of locations that need to be filled in with the address after L2, which is only known at the next level up.
Note that in nested loops a break instruction jumps only one nesting level up: see the role of L.breaklist in (5), which becomes part of S.nextlist in (1), and then of that S.nextlist in (4), where it is patched to M2.quad.
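The scheme of Problem 5 can be exercised with a minimal Python sketch. It is an illustration under assumptions: statements are plain strings, a loop is a hypothetical ("repeat", body, condition) tuple, the predicate E is assumed to compile to one conditional jump (its truelist) followed by one unconditional jump (its falselist), and the right-recursive list productions (4) and (5) are processed with a loop.

```python
quads = []  # each quad is [text, target]; a target of None is still unresolved


def emit(text, target=None):
    quads.append([text, target])
    return len(quads) - 1


def backpatch(addresses, target):
    for addr in addresses:
        quads[addr][1] = target


def gen_stmt(stmt):
    """Return (nextlist, skiplist, breaklist) for one statement S."""
    if stmt == "break":
        return [], [], [emit("goto")]
    if stmt == "continue":
        return [], [emit("goto")], []
    if isinstance(stmt, tuple):           # ("repeat", body, cond)
        _, body, cond = stmt
        m1 = len(quads)                   # M1.quad: top of the loop
        nxt, skip, brk = gen_list(body)
        m2 = len(quads)                   # M2.quad: first instruction of E
        true_list = [emit(f"if {cond} goto")]
        false_list = [emit("goto")]
        backpatch(nxt + skip, m2)         # fall-through and continue reach E
        backpatch(true_list, m1)          # E true: iterate again
        return false_list + brk, [], []   # S.nextlist = merge(E.falselist, L.breaklist)
    emit(stmt)                            # ordinary statement
    return [], [], []


def gen_list(stmts):
    """L -> S ; M L | S: patch each S.nextlist to the start of the statement after it."""
    nxt, skip, brk = [], [], []
    for stmt in stmts:
        backpatch(nxt, len(quads))
        nxt, s, b = gen_stmt(stmt)
        skip += s
        brk += b
    return nxt, skip, brk


inner = ("repeat", ["break", "b"], "c2")
outer = ("repeat", ["a", inner, "continue", "d"], "c1")
final_next, _, _ = gen_list([outer])
backpatch(final_next, len(quads))         # exit of the outermost loop

for addr, (text, target) in enumerate(quads):
    print(addr, text, "" if target is None else target)
```

In the printed listing the break inside the inner loop (address 1) targets address 5, the first instruction after the inner loop, while the continue at address 5 targets address 7, the predicate of the outer loop. This is the one-level-up behaviour argued for above.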