CPSC 411, 2015W Term 2 Midterm Exam Date: February 25, 2016; Instructor: Ron Garcia

This is a closed book exam; no notes; no calculators. Answer in the space provided. There are 8 questions on 14 pages, totaling 100 marks. You have 80 minutes to complete the exam.

NAME:
STUDENT NUMBER:
SIGNATURE:

The following are the rules governing formal examinations:

1. Each candidate must be prepared to produce, upon request, a UBCcard for identification;
2. Candidates are not permitted to ask questions of the invigilators, except in cases of supposed errors or ambiguities in examination questions;
3. No candidate shall be permitted to enter the examination room after the expiration of one-half hour from the scheduled starting time, or to leave during the first half hour of the examination;
4. Candidates suspected of any of the following, or similar, dishonest practices shall be immediately dismissed from the examination and shall be liable to disciplinary action:
   - Having at the place of writing any books, papers or memoranda, calculators, computers, sound or image players/recorders/transmitters (including telephones), or other memory aid devices, other than those authorized by the examiners;
   - Speaking or communicating with other candidates;
   - Purposely exposing written papers to the view of other candidates or imaging devices. The plea of accident or forgetfulness shall not be received;
5. Candidates must not destroy or mutilate any examination material; must hand in all examination papers; and must not take any examination material from the examination room without permission of the invigilator; and
6. Candidates must follow any additional examination rules or directions communicated by the instructor or invigilator.

Question   Marks
1          /15
2          /20
3          /4
4          /12
5          /14
6          /10
7          /10
8          /15
Total      /100


1. (15 marks) High Level Concepts

So far you have implemented 5 compiler stages. The initial input is a program represented as ASCII text characters. The final output is Linearized Intermediate Representation (Linearized IR). In the space below, in order from first to last, write the name of each stage and briefly describe what it does.

Stage 1
Name:
Input: String of ASCII characters
Output:
Description:

Stage 2
Name:
Input:
Output:
Description:

Stage 3
Name:
Input:
Output:
Description:

Stage 4
Name:
Input:
Output:
Description:

Stage 5
Name:
Input:
Output: Linearized Intermediate Representation (Linearized IR)
Description:

2. (20 marks) Visitors

In lectures and in the book, we discussed at least two options for implementing routines that operate on recursive tree data structures (abstract syntax trees, intermediate representation trees, and similar representations) in an object-oriented language like Java. In all cases, we represent trees by using a class hierarchy of tree elements. In the method-based approach, operations are implemented using methods, and the code for a routine is spread throughout the class hierarchy. In the visitor-based approach, the routine is encapsulated in a subclass of a Visitor class that is designed to interact with all of the tree elements.

Consider the following code for a small language of arithmetic and boolean expressions. It implements a type checker for the language using the method-based approach. Your job is to reimplement from scratch the class hierarchy and type checker using the visitor-based approach. The abstract Visitor class and the Type enumeration have been provided for you.

Method-Based Approach:

    public enum Type { INT, BOOL }

    public abstract class Expr {
        public abstract Type typeof();
    }

    public class Num extends Expr {
        public final int n;
        Num(int n) { this.n = n; }
        public Type typeof() { return Type.INT; }
    }

    public class Plus extends Expr {
        public final Expr lhs;
        public final Expr rhs;
        Plus(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
        public Type typeof() {
            if (lhs.typeof() == Type.INT && rhs.typeof() == Type.INT) {
                return Type.INT;
            } else {
                throw new Error("Type Error");
            }
        }
    }

    public class Equal extends Expr {
        public final Expr lhs;
        public final Expr rhs;
        Equal(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
        public Type typeof() {
            if (lhs.typeof() == Type.INT && rhs.typeof() == Type.INT) {
                return Type.BOOL;
            } else {
                throw new Error("Type Error");
            }
        }
    }
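For reference, the method-based checker above can be exercised as follows. This is only an illustrative sketch, not part of the question: the wrapper class MethodBasedDemo and its main method are hypothetical names introduced so that the code compiles as a single file, and the expression classes are nested (and non-public) for the same reason.

```java
// Hypothetical wrapper class (not from the exam) so the sketch is one compilable file.
public class MethodBasedDemo {
    enum Type { INT, BOOL }

    // Every tree element carries its own piece of the type-checking routine.
    static abstract class Expr {
        abstract Type typeof();
    }

    static class Num extends Expr {
        final int n;
        Num(int n) { this.n = n; }
        Type typeof() { return Type.INT; }
    }

    static class Plus extends Expr {
        final Expr lhs, rhs;
        Plus(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
        Type typeof() {
            if (lhs.typeof() == Type.INT && rhs.typeof() == Type.INT) {
                return Type.INT;
            }
            throw new Error("Type Error");
        }
    }

    static class Equal extends Expr {
        final Expr lhs, rhs;
        Equal(Expr lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
        Type typeof() {
            if (lhs.typeof() == Type.INT && rhs.typeof() == Type.INT) {
                return Type.BOOL;
            }
            throw new Error("Type Error");
        }
    }

    public static void main(String[] args) {
        // (1 + 2) == 3 is well typed and has type BOOL.
        Expr ok = new Equal(new Plus(new Num(1), new Num(2)), new Num(3));
        System.out.println(ok.typeof());

        // 1 + (2 == 3) adds an INT to a BOOL, so typeof() throws.
        Expr bad = new Plus(new Num(1), new Equal(new Num(2), new Num(3)));
        try {
            bad.typeof();
        } catch (Error e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Each call to typeof() is resolved on the runtime class of the receiver, which is why the checking logic ends up distributed across Num, Plus, and Equal.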

Visitor-Based Approach:

    public enum Type { INT, BOOL }

    public abstract class Visitor<R> {
        public abstract R visit(Plus e);
        public abstract R visit(Equal e);
        public abstract R visit(Num e);
    }

Solution:


3. (4 marks) Dispatch

Circle one answer for each of the following questions:

The Visitor pattern uses ________ to determine which implementation of the visit method to call within the concrete visitor class.

    Dynamic Dispatch        Overloading

The Visitor pattern uses ________ to determine which implementation of the accept method to call among the expression classes.

    Dynamic Dispatch        Overloading

4. (12 marks) Parsing

Consider the following grammar for a super-simple expression language:

    Exp        ::= Exp "+" Term | Term
    Term       ::= Term "*" Factor | Factor
    Factor     ::= Identifier | Number | "(" Exp ")"
    Identifier ::= "a" | "b" | ... | "z"
    Number     ::= "0" | "1" | ... | "9"

For simplicity, we ignore whitespace, and treat single characters as the tokens of our language. This grammar represents all strings of tokens that count as legal programs in our language. Unfortunately, this grammar cannot directly be used as the basis for a predictive, top-down, recursive-descent parser (i.e. an LL(1) parser).

Briefly explain what the problem is with this grammar, explain how to fix it, and provide a new grammar that represents the same strings and is an acceptable LL(1) grammar. If needed, use /* empty */ to mark any empty productions in your grammar.

Problem:

5. (14 marks) Translation to IR

Using the IR syntax provided in the appendix, write out plausible IR code that a hypothetical translator might produce for each of the following. You may assume that the code is from a valid program and that any variables in the code are allocated to a TEMP of the same name (e.g. x is allocated to TEMP(x)).

1. sum = sum + i;

2. i = i + 1;

3. i = j < 7 ? (i + 1) : (i + 2);
   (with a completely strict evaluation strategy that evaluates both branches)

4. i = j < 7 ? (i + 1) : (i + 2);
   (with a lazy evaluation strategy that only evaluates what it must)

6. (10 marks) Canonicalization

We have seen a technique to transform Intermediate Representation (IR) terms into a more manageable canonical form, where all sequencing is at the top level and all function calls contain only simple arguments. The algorithm conceptually implements a set of rewriting rules that, among other things, eliminate ESEQ nodes from an IR tree. The rewriting rules in the book are a subset of the rules necessary to eliminate all ESEQs from expressions.

Provide rewrites for each of the following IR expressions to eliminate the ESEQ expressions. In some cases, you can improve the rewrite if you know that underlying terms can commute. If that's the case, provide rewrites for both the commuting and non-commuting case and say explicitly which terms must commute to justify the rewrite.

1. MOVE(TEMP t, ESEQ(s, e))

2. EXP(CALL(e0, ESEQ(s, e1), e2))

7. (10 marks) True and false

For each of the following statements, indicate whether it is true or false. Explain your answer briefly.

7a. (2 marks) Every language that can be defined by a grammar (in BNF notation) can also be defined by a regular expression.

7b. (2 marks) In a statically typed language commands do not need to be type checked.

7c. (2 marks) The intermediate representation we are using (IR trees) would be strictly less expressive if it did not have the ESEQ primitive.

7d. (2 marks) Whether one or two passes over the parse tree is required to determine whether identifiers are being properly used is primarily determined by the compiler, and not by the language being compiled.

7e. (2 marks) In the majority of statically typed languages, the type of an expression is determined both by the type of its sub-expressions and the context in which it appears.

8. (15 marks) Liveness Analysis

Consider the following list of X86-like instructions, extended with temporaries. To help us with register allocation, we would like to determine the liveness of variables along the control flow. Recall that X86 instructions have the source location as the first argument and the destination location (which may also be a source location depending on the instruction) as the second argument.

Compute the liveness of temporaries (and registers) along this list of instructions. Between each instruction, write the temporaries that are live along that edge.

     1: movq $1, t1

     2: movq $46, t2

     3: movq t1, t3

     4: addq $7, t3

     5: movq t3, t4

     6: addq $4, t4

     7: movq t3, t5

     8: addq t2, t5

     9: movq t5, %rax

    10: subq t4, %rax
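As a reminder of the general technique, liveness on straight-line code can be computed in one backward pass using live-in = use ∪ (live-out − def). The sketch below applies that rule to a different, shorter instruction list than the one in the question. The class LivenessSketch, the Instr helper, the example() program, and the assumption that only %rax is live at the end are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: backward liveness for a straight-line instruction list.
public class LivenessSketch {
    // An instruction together with the locations it reads (use) and writes (def).
    static final class Instr {
        final String text;
        final Set<String> use;
        final Set<String> def;
        Instr(String text, String[] use, String[] def) {
            this.text = text;
            this.use = new TreeSet<>(Arrays.asList(use));
            this.def = new TreeSet<>(Arrays.asList(def));
        }
    }

    // Returns n + 1 sets: entry i is the live set on the edge entering
    // instruction i; entry n is the live set at the end of the list.
    static List<Set<String>> liveness(List<Instr> code, Set<String> liveAtExit) {
        int n = code.size();
        List<Set<String>> live = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            live.add(new TreeSet<>());
        }
        live.get(n).addAll(liveAtExit);
        for (int i = n - 1; i >= 0; i--) {
            Set<String> in = new TreeSet<>(live.get(i + 1)); // start from live-out
            in.removeAll(code.get(i).def);                   // minus def
            in.addAll(code.get(i).use);                      // plus use
            live.set(i, in);
        }
        return live;
    }

    // A small made-up program; addq reads and writes its destination.
    static List<Instr> example() {
        return Arrays.asList(
            new Instr("movq $1, a",   new String[] {},         new String[] {"a"}),
            new Instr("movq a, b",    new String[] {"a"},      new String[] {"b"}),
            new Instr("addq a, b",    new String[] {"a", "b"}, new String[] {"b"}),
            new Instr("movq b, %rax", new String[] {"b"},      new String[] {"%rax"}));
    }

    public static void main(String[] args) {
        List<Instr> code = example();
        // Assumption: only %rax is live after the last instruction.
        Set<String> exit = new TreeSet<>(Arrays.asList("%rax"));
        List<Set<String>> live = liveness(code, exit);
        for (int i = 0; i < code.size(); i++) {
            System.out.println("    live: " + live.get(i));
            System.out.println(code.get(i).text);
        }
        System.out.println("    live: " + live.get(code.size()));
    }
}
```

For this made-up program the pass yields the empty set before the first instruction, then {a}, {a, b}, {b}, and finally {%rax}. A single pass suffices only because there are no branches; code with loops needs the usual fixed-point iteration.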

APPENDIX: IR Tree Syntax

Items between /*..*/ are comments indicating the purpose of an element. They are not part of the actual syntax. E.g., MOVE contains two subexpressions; the first one is the destination. Exp,* should be taken to mean a list of expressions separated by commas.

    Exp   ::= CONST( int )
            | NAME( Label )
            | TEMP( Temp )
            | BINOP( Op, Exp, Exp )
            | MEM( Exp )
            | CALL( Exp /*fun*/, Exp,* /*args*/ )
            | ESEQ( Stm, Exp )

    Stm   ::= MOVE( Exp /*dst*/, Exp /*src*/ )
            | EXP( Exp )
            | JUMP( Label )
            | CJUMP( RelOp, Exp, Exp, Label /*thn*/, Label /*els*/ )
            | SEQ( Stm, Stm )
            | LABEL( Label )

    Op    ::= PLUS, MINUS, MUL, DIV, AND, OR, LSHIFT, ...
    RelOp ::= EQ, NE, LT, GT, LE, GE, ULT, ULE, ...
    Label ::= <IDENTIFIER>
    Temp  ::= <IDENTIFIER>

To make IR code more readable, it is acceptable to use SEQ(s1, s2, s3, ..., sn) as shorthand for SEQ(s1, SEQ(s2, SEQ(s3, ..., sn)))).

Use meaningful formatting and indentation conventions to make the tree/nesting structure of the IR clear. For example:

    ESEQ(
      SEQ(
        MOVE(TEMP(x), CONST(0)),
        MOVE(TEMP(y),
             CALL(NAME(dothings),
                  TEMP(x), CONST(5)))),
      TEMP(y))
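The n-ary SEQ shorthand described above can be read as a right-nested fold over binary SEQ nodes. The sketch below shows one way to expand it. The class SeqShorthand, its minimal Stm, Label, and Seq types, and the seq helper are hypothetical and cover only the two statement forms needed for the illustration.

```java
// Hypothetical sketch: expanding SEQ(s1, ..., sn) into nested binary SEQ nodes.
public class SeqShorthand {
    interface Stm {}

    // Only LABEL and SEQ are modelled here; the other Stm forms are omitted.
    static final class Label implements Stm {
        final String name;
        Label(String name) { this.name = name; }
        @Override public String toString() { return "LABEL(" + name + ")"; }
    }

    static final class Seq implements Stm {
        final Stm left, right;
        Seq(Stm left, Stm right) { this.left = left; this.right = right; }
        @Override public String toString() { return "SEQ(" + left + ", " + right + ")"; }
    }

    // Builds SEQ(s1, SEQ(s2, ... sn)) by folding from the last statement backwards.
    static Stm seq(Stm... stms) {
        if (stms.length == 0) {
            throw new IllegalArgumentException("need at least one statement");
        }
        Stm result = stms[stms.length - 1];
        for (int i = stms.length - 2; i >= 0; i--) {
            result = new Seq(stms[i], result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints SEQ(LABEL(a), SEQ(LABEL(b), LABEL(c)))
        System.out.println(seq(new Label("a"), new Label("b"), new Label("c")));
    }
}
```

With a single argument the helper returns that statement unchanged, which matches the grammar: a binary SEQ always needs two statements.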