A10401W1

SECOND PUBLIC EXAMINATION

Honour School of Computer Science
Honour School of Mathematics and Computer Science
Honour School of Computer Science and Philosophy

Compilers

TRINITY TERM 2016
Thursday 2nd June, 2:30pm - 4:00pm

Candidates should answer no more than two questions.
Please start the answer to each question on a new page.
Do not turn over until told that you may do so.

Question 1

The following context-free grammar describes expressions expr, built up from identifiers Ident and opening and closing parentheses Lpar and Rpar.

    expr → Ident | Lpar expr Rpar | expr expr

An example string is the following:

    equal (fac n) (if (equal n zero) one (times n (fac (minus n one))))

As you can see, this expression must end in exactly four right parentheses. An expression expr1 expr2 denotes application of a function expr1 to an argument expr2. An implementation of the language uses abstract syntax trees with type tree, defined by

    type tree = Var of string | Apply of tree * tree

As in the languages Haskell and OCaml, application associates to the left, so that the subexpression equal n zero corresponds to the abstract syntax tree

    Apply (Apply (Var "equal", Var "n"), Var "zero")

the same as the expression (equal n) zero.

(a) Show that the grammar given above is ambiguous, and suggest an unambiguous grammar for the same language, respecting the interpretation that application associates to the left. (8 marks)

(b) Using your grammar, write rules for an ocamlyacc script that parses expressions and builds the corresponding abstract syntax tree. (5 marks)

Experience shows that programmers find it laborious to count the many right parentheses that may be needed at the end of an expression, and a new language feature is proposed that allows an opening parenthesis to be replaced by a left square bracket Lbrack, to be matched by a subsequent right square bracket Rbrack. The closing bracket implies as many closing parentheses as are needed to make the expression balanced. Like parentheses, the square brackets may be nested arbitrarily deep. Thus, the expression shown earlier may be replaced with the following.

    equal (fac n) (if (equal n zero) one [times n (fac (minus n one])

The closing bracket implies the two parentheses needed to finish the expressions beginning (fac and (minus; one more closing parenthesis is needed to match (if.

(c) Write an unambiguous grammar for this extended language, expressing it in the form of an ocamlyacc script that produces the corresponding abstract syntax tree. (12 marks)
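For reference, the tree type and the left-associative reading of application described in the question can be sketched in OCaml as follows. The helper apply_all is a hypothetical name introduced only for this illustration; it is not part of the question and is not a parser.

```ocaml
(* The abstract syntax tree type given in the question. *)
type tree = Var of string | Apply of tree * tree

(* Hypothetical helper (not in the question): apply f to each argument
   in turn, nesting the Apply nodes to the left. *)
let apply_all (f : tree) (args : tree list) : tree =
  List.fold_left (fun acc arg -> Apply (acc, arg)) f args

(* The subexpression "equal n zero" from the question. *)
let example : tree = apply_all (Var "equal") [Var "n"; Var "zero"]
```

Here example is the tree Apply (Apply (Var "equal", Var "n"), Var "zero"), which is also the tree for (equal n) zero.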

Question 2

The following program is written in a Pascal-like language, with keywords in upper case. Procedure one has a parameter x that is passed by reference; this parameter is accessed from the body of the nested procedure two. The main program calls procedure one, passing it a global variable y; the output is 37.

    PROC one(VAR x: integer);
      PROC two();
      BEGIN
        x := x+1
      END;
    BEGIN
      two()
    END;

    VAR y: integer;

    BEGIN (* main program *)
      y := 36;
      one(y);
      print_num(y);
      newline()
    END.

There are two implementations of the programming language. One of them is based on postfix code for a simple stack machine, where variable access is implemented using arithmetic on addresses. This implementation generates code by a simple syntax-directed translation, and uses little optimisation. The second implementation generates machine code for a RISC machine, and employs Common Subexpression Elimination (CSE) within basic blocks. In both implementations, access to non-local variables uses static links. On the RISC machine, both procedure arguments and static links are passed in registers but saved in the stack frame as part of the procedure prologue.

(a) Describe the run-time structures needed to represent activations of the procedures one and two, showing a possible layout of their stack frames, together with sample postfix code for the procedures one and two and for the call one(y) in the main program. (8 marks)

(b) Explain how the statement x := x+1 could be represented internally in the second compiler in a form suitable as the input to CSE, and show how the result of CSE would be represented. (9 marks)

(c) Suggest rules for selecting instructions for elements of the internal representation used in part (b), and give a translation of the optimised statement into code for a typical RISC machine. (8 marks)

In each part of the question, you need not use code that corresponds to any particular machine, provided the meaning of each operation is clear and your assumptions are stated.
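The source-level behaviour of the program (output 37) can be modelled by the following OCaml sketch. It assumes that a by-reference parameter may be modelled as an int ref and the nested procedure as a closure; the function run is a hypothetical stand-in for the main program. It says nothing about stack frames, static links or generated code.

```ocaml
(* Procedure one: the by-reference parameter x is modelled as an int ref. *)
let one (x : int ref) : unit =
  (* Nested procedure two updates x, which belongs to the enclosing procedure. *)
  let two () = x := !x + 1 in
  two ()

(* Hypothetical stand-in for the main program: returns the final value of y
   instead of printing it. *)
let run () : int =
  let y = ref 0 in
  y := 36;
  one y;
  !y
```

Under these assumptions run () evaluates to 37, agreeing with the output stated in the question.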

Question 3

Two robotics researchers, Jack and Tom, have proposed a new control structure, designed to help in the description of finite state machines. An example is shown in Figure 1, with keywords in upper case. This machine prints the number of groups of consecutive 1 bits in the binary representation of a positive integer val; thus if initially val is 819 = 1100110011₂, then the program prints 3.

This machine has four states, init, done, flop and flip. It starts in the state that appears first in the construct, in this case init. The action of the machine in each state is described by ordinary statements of the language, except that embedded statements of the form NEXT flop are allowed, and cause execution of the code for the state to be abandoned, and the code for the state flop to start running, until another NEXT statement activates a different state, and so on. If the code for any state finishes without the machine encountering a NEXT statement, then the whole construct terminates; in the example, this happens after state done is entered.

The construct is to be added to an existing compiler that implements the other statement forms shown in the example. The compiler builds and annotates an abstract syntax tree, then generates code for a typical stack-based abstract machine.

(a) Suggest an abstract syntax for this construct. (3 marks)

(b) Describe the information that should be associated by the semantic analyser of a compiler with the name of each state, and what annotations need to be added to the abstract syntax tree to support code generation. (5 marks)

(c) Identify two errors that could be detected during semantic analysis, and one warning that the semantic analyser could helpfully give. Explain how these conditions can be detected during semantic analysis. (4 marks)

(d) Using appropriate conventions, show how the new construct can be implemented in a code generator. (7 marks)

(e) A direct translation of the construct may contain many jumps and labels and may be untidy and inefficient. Referring to the example in Figure 1, discuss the sorts of inefficiency that may arise, and the extent to which the code can be improved by a simple optimiser. (6 marks)

    MACHINE
      STATE init:
        count := 0;
        NEXT flop
      STATE done:
        print_num(count); newline()
      STATE flop:
        IF val MOD 2 = 1 THEN
          count := count+1; NEXT flip
        END;
        val := val DIV 2;
        IF val = 0 THEN NEXT done ELSE NEXT flop END
      STATE flip:
        IF val MOD 2 = 0 THEN NEXT flop END;
        val := val DIV 2;
        NEXT flip
    END

Figure 1: Program text for Question 3
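The behaviour of the machine in Figure 1 can be checked with a small OCaml model. This is a sketch under stated assumptions, not a translation scheme: each state is a constructor of a hypothetical type state, NEXT s is modelled by returning Some s, and falling off the end of a state is modelled by None. The function count_groups and the name value (val is a reserved word in OCaml) are also assumptions of the sketch, and state done returns the count rather than printing it.

```ocaml
(* Hypothetical model of the four states in Figure 1. *)
type state = Init | Done | Flop | Flip

(* Run the machine on v; return the count that state done would print. *)
let count_groups (v : int) : int =
  let value = ref v and count = ref 0 in
  (* step s runs the code for state s.  Some s' models NEXT s';
     None models reaching the end of the state without a NEXT. *)
  let step = function
    | Init -> count := 0; Some Flop
    | Done -> None
    | Flop ->
        if !value mod 2 = 1 then begin
          incr count; Some Flip
        end else begin
          value := !value / 2;
          if !value = 0 then Some Done else Some Flop
        end
    | Flip ->
        if !value mod 2 = 0 then Some Flop
        else begin value := !value / 2; Some Flip end
  in
  (* Keep running states until one finishes without a NEXT. *)
  let rec run s = match step s with Some s' -> run s' | None -> () in
  run Init;
  !count
```

With this model, count_groups 819 evaluates to 3, agreeing with the example given in the question.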