The University of Nottingham
SCHOOL OF COMPUTER SCIENCE
A LEVEL 2 MODULE, SPRING SEMESTER 2009-2010
COMPILERS
ANSWERS
Time allowed TWO hours

Candidates may complete the front cover of their answer book and sign their desk card but must NOT write anything else until the start of the examination period is announced.

Answer THREE questions

No calculators are permitted in this examination.

Dictionaries are not allowed with one exception. Those whose first language is not English may use a standard translation dictionary to translate between that language and English provided that neither language is the subject of this examination. Subject-specific translation dictionaries are not permitted.

No electronic devices capable of storing and retrieving text, including electronic dictionaries, may be used.

Note: ANSWERS

Question 1

(a) The following is a valid MiniTriangle program (the line numbers are for reference only). Introduce five compile-time errors into the code by making small changes (like adding, changing, or deleting a lexical symbol or two). There should be two different syntactic errors, two different type errors, and one other contextual error. For each error, explain why it is an error and what kind of an error it is. You can consider each error in isolation: it is not necessary to consider any interaction between the introduced errors. The syntax of MiniTriangle is given in Appendix A. (10)

     1  let
     2      var k : Integer
     3  in
     4      begin
     5          k := 10;
     6          while k > 0 do
     7              let
     8                  const k2 : Integer = k * k
     9              in
    10                  begin
    11                      putint(k2);
    12                      k := k - 1
    13                  end
    14      end

Answer: Many possibilities. For example:

- Change var to varr at line 2. This is a syntax error as the keyword let has to be followed by a declaration, and a declaration must start with either the keyword var or the keyword const according to the MiniTriangle concrete syntax.

- Leave out = k * k from line 8. This is a syntax error as a constant definition must include a defining expression according to the MiniTriangle concrete syntax.

- Declare k to be of type Char. This will result in a type error e.g. at line 5 where an integer literal is being assigned to k.

- Leave out > 0 from line 6. This means the condition of the while loop will no longer be an expression of type Bool, which will cause a type error to be reported.

- Replace k2 by e.g. z at line 11. This will result in a contextual error as no variable named z has been defined.

(b) Draw the abstract syntax tree for the following MiniTriangle fragment:

    let
        var p : Bool := k > 42
    in
        if p then putint(((k+1))) else k := 0

The relevant grammar is given in Appendix A. Start from the production for Command. (5)

Answer: Abstract syntax tree. Variable-spelling terminals have been annotated with their spellings, shown after the colon. Children are indented below their parent.

    CmdLet
        DeclVar
            Name: p
            TDBaseType
                Name: Bool
            ExpApp
                ExpVar
                    Name: >
                ExpVar
                    Name: k
                ExpLitInt
                    IntegerLiteral: 42
        CmdIf
            ExpVar
                Name: p
            CmdCall
                ExpVar
                    Name: putint
                ExpApp
                    ExpVar
                        Name: +
                    ExpVar
                        Name: k
                    ExpLitInt
                        IntegerLiteral: 1
            CmdAssign
                ExpVar
                    Name: k
                ExpLitInt
                    IntegerLiteral: 0

(c) Construct a BNF grammar for expressions and numeric literals according to the following:

- The only forms of basic expressions are numeric literals and the literal null (the empty list).

- The operators are as follows, in decreasing order of precedence, along with their arity (binary or unary) and associativity:

      Operator     Arity     Associativity
      -            unary     N/A
      +            binary    left
      <, =, >      binary    non
      :            binary    right

  Unary operators do not have any associativity, hence N/A (not applicable). Note that the three operators <, =, and > have the same precedence. Non-associative means that e.g. 3 < 5 < 7 should not be allowed: explicit parentheses would have to be used to make the meaning clear.

- Parentheses are used for grouping in the standard way.

- The numeric literals are non-empty sequences of decimal digits (0-9), possibly including a single decimal point. A numeric literal may not start or end with a decimal point.

The entire grammar should be unambiguous and constructed in such a way that the parse tree of an expression reflects operator precedence and associativity. (10)

Answer: For example:

    Exp1   → Exp2 : Exp1 | Exp2
    Exp2   → Exp3 < Exp3 | Exp3 = Exp3 | Exp3 > Exp3 | Exp3
    Exp3   → Exp3 + Exp4 | Exp4
    Exp4   → - Prim | Prim
    Prim   → LitNum | null | ( Exp1 )
    LitNum → Digits | Digits . Digits
    Digits → Digit Digits | Digit
    Digit  → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
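As a way of checking that the grammar in the answer to (c) really encodes the required precedence and associativity, it can be transliterated into a small recursive-descent parser. The following is only a sketch: the AST type Exp, the function names, and the decision to parse directly over characters (with spaces removed) are assumptions made for illustration and are not part of the model answer. The left-recursive production for Exp3 is handled by iteration.

```haskell
import Data.Char (isDigit)

-- Hypothetical AST; the constructor names are illustrative only.
data Exp
  = Num String        -- numeric literal, kept as its spelling
  | Null              -- the literal null
  | Neg Exp           -- unary minus
  | Bin Char Exp Exp  -- binary operator: '+', '<', '=', '>' or ':'
  deriving (Eq, Show)

type Parser a = String -> Maybe (a, String)

-- Exp1 -> Exp2 : Exp1 | Exp2            (right associative)
exp1 :: Parser Exp
exp1 s = do
  (l, r) <- exp2 s
  case r of
    ':' : r' -> do
      (l', r'') <- exp1 r'
      Just (Bin ':' l l', r'')
    _ -> Just (l, r)

-- Exp2 -> Exp3 (< | = | >) Exp3 | Exp3  (non-associative: at most one operator)
exp2 :: Parser Exp
exp2 s = do
  (l, r) <- exp3 s
  case r of
    c : r' | c `elem` "<=>" -> do
      (l', r'') <- exp3 r'
      Just (Bin c l l', r'')
    _ -> Just (l, r)

-- Exp3 -> Exp3 + Exp4 | Exp4            (left associative, parsed iteratively)
exp3 :: Parser Exp
exp3 s = exp4 s >>= uncurry loop
  where
    loop l ('+' : r) = do
      (l', r') <- exp4 r
      loop (Bin '+' l l') r'
    loop l r = Just (l, r)

-- Exp4 -> - Prim | Prim
exp4 :: Parser Exp
exp4 ('-' : r) = do
  (e, r') <- prim r
  Just (Neg e, r')
exp4 s = prim s

-- Prim -> LitNum | null | ( Exp1 )
prim :: Parser Exp
prim ('(' : r) = do
  (e, r') <- exp1 r
  case r' of
    ')' : r'' -> Just (e, r'')
    _         -> Nothing
prim ('n' : 'u' : 'l' : 'l' : r) = Just (Null, r)
prim s = litNum s

-- LitNum -> Digits | Digits . Digits
litNum :: Parser Exp
litNum s = case span isDigit s of
  ([], _)       -> Nothing                  -- may not start with '.'
  (ds, '.' : r) -> case span isDigit r of
    ([], _)  -> Nothing                     -- may not end with '.'
    (fs, r') -> Just (Num (ds ++ "." ++ fs), r')
  (ds, r)       -> Just (Num ds, r)

-- Parse a complete expression; spaces are ignored (an assumption).
parseExp :: String -> Maybe Exp
parseExp s = case exp1 (filter (/= ' ') s) of
  Just (e, "") -> Just e
  _            -> Nothing
```

With these definitions, parseExp "1+2+3" nests to the left, parseExp "1:2:null" nests to the right, and parseExp "3<5<7" is rejected while parseExp "(3<5)<7" is accepted.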

Question 2

(a) Consider the following HTML-like document markup language. The terminals of the language are word and the following element tags:

    Element            Start Tag    End Tag
    Level 1 Heading    <h1>         </h1>
    Level 2 Heading    <h2>         </h2>
    Paragraph          <p>          </p>
    Bold               <b>          </b>

The following BNF grammar specifies the syntax of valid documents. D is the start symbol:

    D  → H1 D1
    D1 → H1 D1 | H2 D1 | P D1 | ε
    H1 → <h1> Ws </h1>
    H2 → <h2> Ws </h2>
    P  → <p> Ws </p>
    Ws → W Ws | W
    W  → word | <b> Ws </b>

Construct recursive-descent parsing functions for the non-terminals D, D1, H1, and W. We are only interested in checking whether the sequence of input tokens conforms to the grammar, not in building any structured representation. Work in a functional style in Haskell, where a parsing function takes a list of tokens (along with any other arguments needed) and, in the case of a successful parse, using an option type like Haskell's Maybe, returns the list of remaining tokens, otherwise returns an indication of failure, like Nothing.

The parsing functions for H2, P, and Ws are assumed to be given with signatures (in Haskell syntax):

    parseh2 :: [Token] -> Maybe [Token]
    parsep  :: [Token] -> Maybe [Token]
    parsews :: [Token] -> Maybe [Token]

The token type is defined as follows (in Haskell syntax):

    data Token = H1S | H1E       -- Heading 1 start and end
               | H2S | H2E       -- Heading 2 start and end
               | PS  | PE        -- Paragraph start and end
               | BS  | BE        -- Bold start and end
               | Word [Char]     -- Word

(10)

Answer: The grammar is not left-recursive, so there is no need to first transform it to eliminate left recursion. The grammar is also trivially suitable for predictive parsing. Thus the grammar can be transliterated straightforwardly into e.g. Haskell as follows (all parsing functions are given for the sake of completeness). Every case analysis ends with an explicit failure branch, so that malformed input yields Nothing rather than a pattern-match failure:

    parsed :: [Token] -> Maybe [Token]
    parsed ts =
        case parseh1 ts of
            Nothing  -> Nothing
            Just ts1 -> parsed1 ts1

    parsed1 :: [Token] -> Maybe [Token]
    parsed1 ts =
        case ts of
            H1S : _ -> case parseh1 ts of
                           Just ts1 -> parsed1 ts1
                           Nothing  -> Nothing
            H2S : _ -> case parseh2 ts of
                           Just ts1 -> parsed1 ts1
                           Nothing  -> Nothing
            PS : _  -> case parsep ts of
                           Just ts1 -> parsed1 ts1
                           Nothing  -> Nothing
            _       -> Just ts    -- Success on epsilon

    parseh1 :: [Token] -> Maybe [Token]
    parseh1 ts =
        case ts of
            H1S : ts1 -> case parsews ts1 of
                             Just (H1E : ts2) -> Just ts2
                             _                -> Nothing
            _         -> Nothing

    parseh2 :: [Token] -> Maybe [Token]
    parseh2 ts =
        case ts of
            H2S : ts1 -> case parsews ts1 of
                             Just (H2E : ts2) -> Just ts2
                             _                -> Nothing
            _         -> Nothing

    parsep :: [Token] -> Maybe [Token]
    parsep ts =
        case ts of
            PS : ts1 -> case parsews ts1 of
                            Just (PE : ts2) -> Just ts2
                            _               -> Nothing
            _        -> Nothing

    parsews :: [Token] -> Maybe [Token]
    parsews ts =
        case parsew ts of
            Just ts1 -> case parsews ts1 of
                            Just ts2 -> Just ts2
                            _        -> Just ts1    -- No further W: stop here
            Nothing  -> Nothing

    parsew :: [Token] -> Maybe [Token]
    parsew ts =
        case ts of
            Word _ : ts1 -> Just ts1
            BS : ts1     -> case parsews ts1 of
                                Just (BE : ts2) -> Just ts2
                                _               -> Nothing
            _            -> Nothing
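The same parsing functions can be written more compactly by relying on the Monad instance of Maybe, so that failure propagates without explicit case analysis. The version below is a self-contained sketch under some assumptions: the helper names tok and element and the type synonym P are introduced here for illustration and do not appear in the question, and the Token type is given deriving clauses so that tokens can be compared.

```haskell
data Token
  = H1S | H1E | H2S | H2E | PS | PE | BS | BE
  | Word String
  deriving (Eq, Show)

-- A parsing function consumes a prefix of the input or fails.
type P = [Token] -> Maybe [Token]

-- Consume exactly the given token, or fail (hypothetical helper).
tok :: Token -> P
tok t (t' : ts) | t == t' = Just ts
tok _ _ = Nothing

-- Start tag, one or more W, end tag (hypothetical helper).
element :: Token -> Token -> P
element start end ts = tok start ts >>= parsews >>= tok end

-- D -> H1 D1
parsed :: P
parsed ts = parseh1 ts >>= parsed1

-- D1 -> H1 D1 | H2 D1 | P D1 | epsilon
parsed1 :: P
parsed1 ts = case ts of
  H1S : _ -> parseh1 ts >>= parsed1
  H2S : _ -> parseh2 ts >>= parsed1
  PS  : _ -> parsep ts  >>= parsed1
  _       -> Just ts                -- success on epsilon

parseh1, parseh2, parsep :: P
parseh1 = element H1S H1E
parseh2 = element H2S H2E
parsep  = element PS PE

-- Ws -> W Ws | W : parse one W, then as many more as possible.
parsews :: P
parsews ts = do
  ts1 <- parsew ts
  case parsews ts1 of
    Just ts2 -> Just ts2
    Nothing  -> Just ts1

-- W -> word | <b> Ws </b>
parsew :: P
parsew (Word _ : ts) = Just ts
parsew ts            = element BS BE ts

-- A small well-formed document: a heading followed by a paragraph.
exampleDoc :: [Token]
exampleDoc =
  [ H1S, Word "Intro", H1E
  , PS, Word "some", BS, Word "bold", BE, PE ]
```

Here parsed exampleDoc succeeds with no tokens left over, whereas a document that does not begin with a level 1 heading, such as [PS, Word "x", PE], is rejected. Note that parsed reports success with the unconsumed tokens; a caller that requires the whole input to be a document would additionally check that the remainder is empty.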

(b) Consider the following context-free grammar (CFG):

    S → aAb | cS | B
    A → eAB | Af | g
    B → d

S, A and B are nonterminal symbols, S is the start symbol, and a, b, c, d, e, f, and g are terminal symbols. Explain how a bottom-up (LR) parser would parse the string ccaeegfddb according to this grammar by reducing it step by step to the start symbol. Also state what the handle is for each step. (7)

Answer: A bottom-up parser traces out a right-most derivation in reverse. It would reduce the word ccaeegfddb as follows, where each row gives the current sentential form, its handle, and the production used to reduce it to the form on the next row:

    Sentential form    Handle           Reduce by
    ccaeegfddb         g                A → g
    ccaeeAfddb         Af               A → Af
    ccaeeAddb          d (the first)    B → d
    ccaeeABdb          eAB              A → eAB
    ccaeAdb            d                B → d
    ccaeABb            eAB              A → eAB
    ccaAb              aAb              S → aAb
    ccS                cS               S → cS
    cS                 cS               S → cS
    S
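The reduction sequence can be checked mechanically by replaying it as string rewriting: at each step the right-hand side of the chosen production must occur at the stated position, and is replaced by the left-hand side. The sketch below is not an LR parser; it only verifies a given sequence of (position, production) pairs. The names Production, reduceAt, steps, and reductions are assumptions introduced for illustration, and positions are 0-based offsets of the handle in the current sentential form.

```haskell
import Data.List (isPrefixOf)

-- A production: left-hand side and right-hand side.
type Production = (Char, String)

-- Replace the handle (the rhs found at the given 0-based position)
-- by the lhs; fail if the rhs does not occur there.
reduceAt :: Int -> Production -> String -> Maybe String
reduceAt pos (lhs, rhs) form
  | pos >= 0 && rhs `isPrefixOf` after =
      Just (before ++ lhs : drop (length rhs) after)
  | otherwise = Nothing
  where
    (before, after) = splitAt pos form

-- The nine reductions for ccaeegfddb, with handle positions.
steps :: [(Int, Production)]
steps =
  [ (5, ('A', "g"))
  , (5, ('A', "Af"))
  , (6, ('B', "d"))
  , (4, ('A', "eAB"))
  , (5, ('B', "d"))
  , (3, ('A', "eAB"))
  , (2, ('S', "aAb"))
  , (1, ('S', "cS"))
  , (0, ('S', "cS"))
  ]

-- All sentential forms visited, starting from the input string.
reductions :: String -> Maybe [String]
reductions = go steps
  where
    go [] form = Just [form]
    go ((p, prod) : rest) form = do
      form' <- reduceAt p prod form
      (form :) <$> go rest form'
```

Applying reductions to "ccaeegfddb" yields the ten sentential forms of the right-most derivation in reverse, ending in "S"; applying it to a string for which some handle is not at the stated position yields Nothing.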

(c) The DFA below recognizes the viable prefixes for the above CFG. Each state is given as its set of LR(0) items, followed by its outgoing transitions:

    I0:   S → · aAb    S → · cS    S → · B    B → · d
          a: I1    c: I7    B: I3    d: I4
    I1:   S → a · Ab   A → · eAB   A → · Af   A → · g
          A: I5    e: I6    g: I2
    I2:   A → g ·
    I3:   S → B ·
    I4:   B → d ·
    I5:   S → aA · b   A → A · f
          b: I8    f: I9
    I6:   A → e · AB   A → · eAB   A → · Af   A → · g
          A: I10   e: I6    g: I2
    I7:   S → c · S    S → · aAb   S → · cS   S → · B    B → · d
          S: I11   a: I1    c: I7    B: I3    d: I4
    I8:   S → aAb ·
    I9:   A → Af ·
    I10:  A → eA · B   A → A · f   B → · d
          B: I12   f: I9    d: I4
    I11:  S → cS ·
    I12:  A → eAB ·

(i) Give five distinct examples of viable prefixes, of which at least one should be at least six symbols long and at least one should contain at least two non-terminals.

(ii) If an LR(0) parser is in state I6, with cae on the parse stack and the remaining input egfddb, what action does the parser take and what is the resulting state, parse stack, and remaining input?

(iii) If an LR(0) parser is in state I8, with caAb on the parse stack and no remaining input, what action does the parser take and what is the resulting state, parse stack, and remaining input?

(8)

Answer:

(i) E.g. ε, a, c, ccaeeg, caeAB. (An LR(0) parser DFA recognises viable prefixes, so just use the given DFA to find prefixes satisfying the requirements.)

(ii) Shift e, stack caee, state I6, and remaining input gfddb.

(iii) Reduce by S → aAb, stack cS, state I11, remaining input ε.
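The DFA can be encoded directly as a partial transition function, which makes it easy to test whether a string of grammar symbols is a viable prefix and which state the parser is in after reading it. In the sketch below the transitions are those obtained from the LR(0) item sets of the grammar in (b), using the state numbering I0 to I12 from the question; the function names goto, stateAfter, and isViablePrefix are assumptions introduced for illustration.

```haskell
import Data.Foldable (foldlM)
import Data.Maybe (isJust)

-- Transition function of the viable-prefix DFA; state In is
-- represented by the number n. Missing entries mean "no transition".
goto :: Int -> Char -> Maybe Int
goto 0  'a' = Just 1
goto 0  'c' = Just 7
goto 0  'B' = Just 3
goto 0  'd' = Just 4
goto 1  'A' = Just 5
goto 1  'e' = Just 6
goto 1  'g' = Just 2
goto 5  'b' = Just 8
goto 5  'f' = Just 9
goto 6  'A' = Just 10
goto 6  'e' = Just 6
goto 6  'g' = Just 2
goto 7  'S' = Just 11
goto 7  'a' = Just 1
goto 7  'c' = Just 7
goto 7  'B' = Just 3
goto 7  'd' = Just 4
goto 10 'B' = Just 12
goto 10 'f' = Just 9
goto 10 'd' = Just 4
goto _  _   = Nothing

-- State reached from I0 after reading the given symbols, if any.
stateAfter :: String -> Maybe Int
stateAfter = foldlM goto 0

-- A string is a viable prefix exactly when the DFA does not get stuck.
isViablePrefix :: String -> Bool
isViablePrefix = isJust . stateAfter
```

For instance, stateAfter "cae" and stateAfter "caee" both give state 6, matching part (ii), and stateAfter "caAb" gives state 8 while stateAfter "cS" gives state 11, matching part (iii).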

Question 3

(a) Given the abstract syntax for MiniTriangle in Appendix A, suggest a suitable Haskell representation (Haskell datatype definitions) for MiniTriangle expressions, declarations, and type denoters; i.e., the nonterminals Expression, Declaration, TypeDenoter. Use the type Name to represent terminals of kind Name, and the type Integer to represent terminals of the kind IntegerLiteral. Explain your construction. (10)

Answer: Each of the nonterminals Expression, Declaration and TypeDenoter is mapped to a Haskell datatype definition with one constructor for each form of the syntactic construct, i.e., one constructor for each production. These constructors are named by the labels given in the rightmost column. Each non-terminal and variable-spelling terminal of the RHS of a production gets mapped to a field of the corresponding constructor, of the type corresponding to the non-terminal or variable-spelling terminal. A sequence of non-terminals (the EBNF *-construct) is translated into a list of elements of the type of the non-terminal, and an optional non-terminal into a Maybe of the type in question. (Additionally, these fields may be named, but here we'll go for the most basic option.)

If we apply this method to the given MiniTriangle abstract syntax grammar, we get:

    data Expression
        = ExpLitInt Integer
        | ExpVar Name
        | ExpApp Expression [Expression]

    data Declaration
        = DeclConst Name TypeDenoter Expression
        | DeclVar Name TypeDenoter (Maybe Expression)

    data TypeDenoter
        = TDBaseType Name

(b) The following is a fragment of a Happy parser specification for MiniTriangle, dealing with expressions and declarations. The semantic actions for constructing an abstract syntax tree as a result of successful parsing have been left out (indicated by a boxed number, shown here as a number in braces, like { 3 }). Complete the fragment by providing suitable semantic actions using the representation of MiniTriangle expressions and declarations you defined in (a) above.

Assume that the semantic value of the non-terminals unary_operator and binary_operator is of type Name, representing the name of a unary or binary function corresponding to the operator in question. Assume further that the type of the semantic value of the terminal LITINT is Integer and the one of ID is Name.

    expressions :: [Expression]
    expressions
        : expression                                       { 1 }
        | expression , expressions                         { 2 }

    expression :: Expression
    expression
        : primary_expression                               { 3 }
        | expression binary_operator primary_expression    { 4 }

    primary_expression :: Expression
    primary_expression
        : LITINT                                           { 5 }
        | ID                                               { 6 }
        | unary_operator primary_expression                { 7 }
        | ( expression )                                   { 8 }

    declaration :: Declaration
    declaration
        : CONST ID : type_denoter = expression             { 9 }
        | VAR ID : type_denoter                            { 10 }
        | VAR ID : type_denoter := expression              { 11 }

(15)

Answer:

     1 = [$1]
     2 = $1 : $3
     3 = $1
     4 = ExpApp (ExpVar $2) [$1, $3]
     5 = ExpLitInt $1
     6 = ExpVar $1
     7 = ExpApp (ExpVar $1) [$2]
     8 = $2
     9 = DeclConst $2 $4 $6
    10 = DeclVar $2 $4 Nothing
    11 = DeclVar $2 $4 (Just $6)

Marking: 2 marks each for 4, 7, 10, 11; 1 mark each for the remaining seven fragments. (4 × 2 + 7 = 15)
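To see what the semantic actions build, the datatypes from (a) can be used directly to construct the trees by hand. The sketch below makes a few assumptions that are not stated in the paper: Name is taken to be a synonym for String, deriving clauses are added so that values can be printed and compared, and the helper functions binApp and unApp (counterparts of actions 4 and 7) are hypothetical names.

```haskell
-- Assumption: the paper leaves Name abstract; String is used here.
type Name = String

data Expression
  = ExpLitInt Integer
  | ExpVar Name
  | ExpApp Expression [Expression]
  deriving (Eq, Show)

data Declaration
  = DeclConst Name TypeDenoter Expression
  | DeclVar Name TypeDenoter (Maybe Expression)
  deriving (Eq, Show)

data TypeDenoter
  = TDBaseType Name
  deriving (Eq, Show)

-- Counterpart of action 4: expression binary_operator primary_expression
binApp :: Expression -> Name -> Expression -> Expression
binApp l op r = ExpApp (ExpVar op) [l, r]

-- Counterpart of action 7: unary_operator primary_expression
unApp :: Name -> Expression -> Expression
unApp op e = ExpApp (ExpVar op) [e]

-- const k2 : Integer = k * k        (action 9)
declK2 :: Declaration
declK2 =
  DeclConst "k2" (TDBaseType "Integer") (binApp (ExpVar "k") "*" (ExpVar "k"))

-- var k : Integer                   (action 10)
declK :: Declaration
declK = DeclVar "k" (TDBaseType "Integer") Nothing

-- k - 1 + 2: the left-recursive production for expression makes
-- the tree nest to the left, i.e. (k - 1) + 2.
leftNested :: Expression
leftNested = binApp (binApp (ExpVar "k") "-" (ExpLitInt 1)) "+" (ExpLitInt 2)
```

Operators are thus represented as ordinary variables applied to argument lists, which is why no separate constructors for unary and binary operators are needed in Expression.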

Question 4

(a) Formally state the progress and preservation theorems, explain what they mean in plain English, and why, when taken together, these theorems mean that well-typed programs do not go wrong. (7)

Answer:

THEOREM [PROGRESS]: Suppose that t is a well-typed term (i.e., t : T). Then either t is a value or else there is some t′ with t → t′.

THEOREM [PRESERVATION]: If t : T and t → t′ then t′ : T.

The progress theorem says that any well-typed term either is a value, i.e. the end result, or it will be possible to evaluate the term one step further. The preservation theorem says that well-typed terms evaluate to well-typed terms. Thus, taken together, if we have a well-typed term which isn't a value, then it can be evaluated one step into another well-typed term. If this is still not a value, then it can be evaluated again, and so on, until we eventually, if the evaluation terminates, get a value. Of course, the evaluation may fail to reach a normal form, but at least it will not get stuck in a normal form which is not a value. This is what it means for a program to not go wrong.

(b) Consider the following expression language:

    e →                          expressions:
          n                      natural numbers, n ∈ ℕ
        | x                      variables, x ∈ Name
        | (e, e)                 pair constructor
        | if e then e else e     conditional

where Name is the set of variable names. The types are given by the following grammar:

    t →                          types:
          Nat                    natural numbers
        | Bool                   Booleans
        | t × t                  pair (product) type

The ternary relation Γ ⊢ e : t says that expression e has type t in the typing context Γ and it is defined by the following typing rules:

    ------------- (T-NAT)
    Γ ⊢ n : Nat

    x : t ∈ Γ
    ----------- (T-VAR)
    Γ ⊢ x : t

    Γ ⊢ e1 : t1    Γ ⊢ e2 : t2
    ---------------------------- (T-PAIR)
    Γ ⊢ (e1, e2) : t1 × t2

    Γ ⊢ e1 : Bool    Γ ⊢ e2 : t    Γ ⊢ e3 : t
    ------------------------------------------- (T-COND)
    Γ ⊢ if e1 then e2 else e3 : t

A typing context, Γ in the rules above, is a comma-separated sequence of variable-name and type pairs, such as

    x : Nat, y : Bool, z : Nat

or empty, denoted ∅. Typing contexts are extended on the right, e.g. Γ, z : Nat, the membership predicate is denoted by ∈, and lookup is from right to left, ensuring recent bindings hide earlier ones.

(i) Use the typing rules given above to formally prove that the expression

    if b then 42 else x

has type Nat in the typing context

    Γ1 = b : Bool, x : Nat

The proof should be given as a proof tree. (5)

Answer:

    b : Bool ∈ Γ1                                    x : Nat ∈ Γ1
    --------------- T-VAR   --------------- T-NAT    -------------- T-VAR
    Γ1 ⊢ b : Bool           Γ1 ⊢ 42 : Nat            Γ1 ⊢ x : Nat
    ---------------------------------------------------------------- T-COND
    Γ1 ⊢ if b then 42 else x : Nat

(ii) The expression language defined above is to be extended with function abstraction and application as follows:

    e →                          expressions:
          ...
        | λx:t.e                 function abstraction
        | e e                    function application

    t →                          types:
          ...
        | t → t                  function (arrow) type

For example, the function abstraction λx:Nat.x is the identity function on Nat, the natural numbers, and (λx:Nat.x) 42 is the application of the identity function on natural numbers to 42.

Provide a typing rule for each of the new expression constructs, in the same style as the existing rules, reflecting the standard notions of function abstraction and function application. (6)

Answer:

    Γ, x : t1 ⊢ e : t2
    --------------------------- (T-ABS)
    Γ ⊢ (λx:t1.e) : t1 → t2

    Γ ⊢ e1 : t2 → t1    Γ ⊢ e2 : t2
    --------------------------------- (T-APP)
    Γ ⊢ e1 e2 : t1

(iii) Use the extended set of typing rules to formally prove that the expression

    fst ((λf:Nat → Nat.(f 3, 11)) dec)

has type Nat in the typing context

    Γ2 = dec : Nat → Nat, fst : Nat × Nat → Nat

The proof should be given as a proof tree. (7)

Answer: Let g = λf:Nat → Nat.(f 3, 11) and let Γ3 = Γ2, f : Nat → Nat. Let us first determine the type of the expression g:

    f : Nat → Nat ∈ Γ3
    -------------------- T-VAR   -------------- T-NAT
    Γ3 ⊢ f : Nat → Nat           Γ3 ⊢ 3 : Nat
    -------------------------------------------- T-APP   --------------- T-NAT
    Γ3 ⊢ f 3 : Nat                                        Γ3 ⊢ 11 : Nat
    ---------------------------------------------------------------------- T-PAIR
    Γ2, f : Nat → Nat ⊢ (f 3, 11) : Nat × Nat
    ---------------------------------------------------------------------- T-ABS
    Γ2 ⊢ g : (Nat → Nat) → (Nat × Nat)

We then check the type of the application g dec:

                                              dec : Nat → Nat ∈ Γ2
    (above)                                   ---------------------- T-VAR
    Γ2 ⊢ g : (Nat → Nat) → (Nat × Nat)        Γ2 ⊢ dec : Nat → Nat
    ----------------------------------------------------------------- T-APP
    Γ2 ⊢ g dec : Nat × Nat

Finally, we turn to the complete expression:

    fst : Nat × Nat → Nat ∈ Γ2
    ---------------------------- T-VAR    (above)
    Γ2 ⊢ fst : Nat × Nat → Nat            Γ2 ⊢ g dec : Nat × Nat
    --------------------------------------------------------------- T-APP
    Γ2 ⊢ fst (g dec) : Nat
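Because each expression form has exactly one typing rule, the rules T-NAT to T-APP can be read off as a recursive type-checking function, and the two proof trees above then correspond to its successful runs. The following is a sketch under stated assumptions: the datatype and constructor names (Type, Exp, TNat, Lit, and so on) and the function typeOf are introduced here for illustration, natural-number literals are represented by Integer without enforcing non-negativity, and a context is a list with the most recent binding first, so that the Prelude function lookup realises right-to-left lookup.

```haskell
data Type
  = TNat
  | TBool
  | TProd Type Type   -- t1 × t2
  | TFun Type Type    -- t1 → t2
  deriving (Eq, Show)

data Exp
  = Lit Integer            -- n
  | Var String             -- x
  | Pair Exp Exp           -- (e1, e2)
  | If Exp Exp Exp         -- if e1 then e2 else e3
  | Lam String Type Exp    -- λx:t.e
  | App Exp Exp            -- e1 e2
  deriving Show

-- Typing context, most recent binding first.
type Ctx = [(String, Type)]

typeOf :: Ctx -> Exp -> Maybe Type
typeOf _   (Lit _)     = Just TNat                               -- T-NAT
typeOf ctx (Var x)     = lookup x ctx                            -- T-VAR
typeOf ctx (Pair a b)  = TProd <$> typeOf ctx a <*> typeOf ctx b -- T-PAIR
typeOf ctx (If c t e)  = do                                      -- T-COND
  tc <- typeOf ctx c
  tt <- typeOf ctx t
  te <- typeOf ctx e
  if tc == TBool && tt == te then Just tt else Nothing
typeOf ctx (Lam x t b) = TFun t <$> typeOf ((x, t) : ctx) b      -- T-ABS
typeOf ctx (App f a)   = do                                      -- T-APP
  tf <- typeOf ctx f
  ta <- typeOf ctx a
  case tf of
    TFun dom cod | dom == ta -> Just cod
    _                        -> Nothing

-- Γ1 = b : Bool, x : Nat
gamma1 :: Ctx
gamma1 = [("x", TNat), ("b", TBool)]

-- Γ2 = dec : Nat → Nat, fst : Nat × Nat → Nat
gamma2 :: Ctx
gamma2 = [("fst", TFun (TProd TNat TNat) TNat), ("dec", TFun TNat TNat)]

-- if b then 42 else x
ex1 :: Exp
ex1 = If (Var "b") (Lit 42) (Var "x")

-- g = λf:Nat → Nat.(f 3, 11)
g :: Exp
g = Lam "f" (TFun TNat TNat) (Pair (App (Var "f") (Lit 3)) (Lit 11))

-- fst (g dec)
ex2 :: Exp
ex2 = App (Var "fst") (App g (Var "dec"))
```

Evaluating typeOf gamma1 ex1 and typeOf gamma2 ex2 both give Just TNat, and typeOf gamma2 g gives the type (Nat → Nat) → (Nat × Nat) derived in the first proof tree of (iii). An ill-typed term such as If (Lit 1) (Lit 2) (Lit 3) yields Nothing, since no rule applies.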

Question 5

(a) Consider the following code skeleton (note: nested procedures):

    var a, b, c: Integer
    proc P
        var x, y, z: Integer
        proc Q
            var u, v: Bool
            ...
        begin
            ... Q() ...
        end
        proc R
            var w: Bool
            ...
        begin
            ... Q() ...
        end
    begin
        ... Q() ... R() ...
    end
    begin
        ... P() ...
    end

The variables a, b, and c are global, the variables x, y, and z are local to procedure P, and procedures Q and R are also local to P. Assume a stack-based memory allocation scheme with dynamic and static links.

(i) Show the layout of the activation records on the stack after the main program has invoked the procedure P. Explain how global and local variables are accessed from P. (4)

(ii) Show the layout of the activation records on the stack once procedure Q has called itself recursively once after having been called from R, which in turn was called from P, which was called from the main program. Explain how global variables, P's variables, and Q's own local variables (and what instance of the latter) are accessed from Q. (6)

Answer:

(i) Activation record layout (stack grows downwards):

    SB ->   a                main program (globals)
            b
            c
    LB ->   ret. addr        P; dynamic link -> main program
            x
            y
            z
    ST ->

SB is Stack Base, LB is Local Base (or Frame Pointer), ST is Stack Top. The activation record of the currently active procedure/function is the one between LB and ST. The dynamic link (a solid arrow in the original figure) from the current activation record refers to the activation record of the caller and is thus equal to the previous value of LB. The static link (a dashed arrow in the original figure) refers to the activation record of the enclosing scope. Global variables are accessed relative to SB. Variables local to P are accessed relative to LB.

(ii) Activation record layout (stack grows downwards):

    SB ->   a, b, c                 main program (globals)
            ret. addr, x, y, z      P;           dynamic link -> main program
            ret. addr, w            R;           dynamic link -> P,          static link -> P
            ret. addr, u, v         Q (first);   dynamic link -> R,          static link -> P
    LB ->   ret. addr, u, v         Q (second);  dynamic link -> Q (first),  static link -> P
    ST ->

SB is Stack Base, LB is Local Base (or Frame Pointer), ST is Stack Top. The dynamic links are the solid arrows of the original figure and the static links are the dashed ones. Global variables are accessed relative to SB. Variables local to Q are accessed relative to LB. Hence it is the second instance of these variables that is being accessed (corresponding to the currently active procedure). P's variables are accessed by following the static links until the activation record corresponding to the correct enclosing scope has been reached. In this case, only one step is needed as P's scope directly encloses Q's scope. So P's variables can be accessed relative to the static link in Q's activation record.

(b) This question concerns register allocation.

(i) Explain what register allocation is and why it is useful. Illustrate by a small example (a code fragment before and after register allocation). (7)

(ii) Outline how the graph-colouring register allocation method works. You need to explain the notions of liveness and interference graph. Again, illustrate with an example or examples. (8)

Answer: See lecture notes.

Appendix A

This appendix contains the grammars for the MiniTriangle lexical, concrete, and abstract syntax. The following typographical conventions are used to distinguish between terminals and non-terminals:

- nonterminals are written like this
- terminals are written like this
- terminals with variable spelling and special symbols are written like this

MiniTriangle Lexical Syntax:

    Program        → (Token | Separator)*
    Token          → Keyword | Identifier | IntegerLiteral | Operator
                   | ; | : | := | = | ( | ) | eot
    Keyword        → begin | const | do | else | end | if | in | let
                   | then | var | while
    Identifier     → Letter | Identifier Letter | Identifier Digit
                     except Keyword
    IntegerLiteral → Digit | IntegerLiteral Digit
    Operator       → ^ | * | / | + | - | < | <= | == | != | >= | >
                   | && | || | !
    Letter         → A | B | ... | Z | a | b | ... | z
    Digit          → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    Separator      → Comment | space | eol
    Comment        → // (any character except eol)* eol

MiniTriangle Concrete Syntax:

Note: According to this version of the grammar, all binary operators are left associative and have the same precedence.

    Program           → Command
    Commands          → Command
                      | Command ; Commands
    Command           → VarExpression := Expression
                      | VarExpression ( Expressions )
                      | if Expression then Command else Command
                      | while Expression do Command
                      | let Declarations in Command
                      | begin Commands end
    Expressions       → Expression
                      | Expression , Expressions
    Expression        → PrimaryExpression
                      | Expression BinaryOperator PrimaryExpression
    PrimaryExpression → IntegerLiteral
                      | VarExpression
                      | UnaryOperator PrimaryExpression
                      | ( Expression )
    VarExpression     → Identifier
    BinaryOperator    → ^ | * | / | + | - | < | <= | == | != | >= | >
                      | && | ||
    UnaryOperator     → - | !
    Declarations      → Declaration
                      | Declaration ; Declarations
    Declaration       → const Identifier : TypeDenoter = Expression
                      | var Identifier : TypeDenoter
                      | var Identifier : TypeDenoter := Expression
    TypeDenoter       → Identifier

MiniTriangle Abstract Syntax:

Name includes both Identifier and Operator; i.e., Identifier ⊆ Name and Operator ⊆ Name. The label of each production is given in the rightmost column.

    Program     → Command                                    Program
    Command     → Expression := Expression                   CmdAssign
                | Expression ( Expression* )                 CmdCall
                | if Expression then Command else Command    CmdIf
                | while Expression do Command                CmdWhile
                | let Declaration* in Command                CmdLet
                | begin Command* end                         CmdSeq
    Expression  → IntegerLiteral                             ExpLitInt
                | Name                                       ExpVar
                | Expression ( Expression* )                 ExpApp
    Declaration → const Name : TypeDenoter = Expression      DeclConst
                | var Name : TypeDenoter (:= Expression | ε) DeclVar
    TypeDenoter → Name                                       TDBaseType

End