Examination in Compilers, EDAN65
Department of Computer Science, Lund University
2016-10-28, 08.00-13.00

Note! Your exam will be marked only if you have completed all six programming lab assignments in advance.

Start each solution (1, 2, 3, 4) on a separate sheet of paper. Write your personal identifier [1] on every sheet of paper.

Write clearly and legibly. Try to find clear, readable solutions with meaningful names. Unnecessary complexity will result in point reduction.

The following documents may be used during the exam:
- Reference manual for JastAdd2
- x86 Cheat Sheet

You may also use a dictionary from English to your native language.

Max points: 60
For grade 3: Min 30
For grade 4: Min 40
For grade 5: Min 50

[1] The personal identifier is a short phrase, a code or a brief sentence of your choice. It can be anything, but not something that can reveal your identity. The purpose of this identifier is to make it possible for you to identify your exam in case something goes wrong with the anonymous code on the exam cover (such as if it is confused with another code due to sloppy writing).
1 Lexical analysis

The following token definitions are all part of a compiler for a programming language:

    ARROW = "->"
    MINUS = "-"
    GT = ">"
    DECREMENT = "--"

a) Consider the following string:

    "-->"

Suppose no disambiguation rules are used. List all token sequences that this string could be interpreted as. (3p)

b) There are two common rules for disambiguating lexical rules. Which of them is relevant for disambiguating the string "-->", and which token sequence will be the result when using that rule? (2p)

c) Whitespace in the language consists of arbitrarily long non-empty sequences of blanks, tabs, newlines and return characters. Give a regular expression WHITESPACE for such whitespace sequences. (2p)

d) Draw a combined DFA covering ARROW, MINUS, GT, DECREMENT, and WHITESPACE that has as few states as possible. Mark the final states with the appropriate tokens. (5p)
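Illustration (not part of the exam questions): a minimal sketch of how a scanner can apply the longest-match rule to the four token definitions above. The function name scan, the dictionary TOKENS, and the decision to drop whitespace silently are assumptions made for this sketch. They are not taken from the course labs.

```python
# Token lexemes copied from the definitions above; everything else is illustrative.
TOKENS = {"ARROW": "->", "MINUS": "-", "GT": ">", "DECREMENT": "--"}
WHITESPACE_CHARS = " \t\n\r"  # blank, tab, newline, return


def scan(text):
    """Return the token names found in text, using the longest-match rule."""
    result = []
    pos = 0
    while pos < len(text):
        if text[pos] in WHITESPACE_CHARS:
            pos += 1  # assumption: whitespace is skipped, not reported
            continue
        # Collect every token whose lexeme matches at the current position.
        candidates = [(len(lexeme), name)
                      for name, lexeme in TOKENS.items()
                      if text.startswith(lexeme, pos)]
        if not candidates:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        # Longest match: prefer the candidate that consumes the most characters.
        length, name = max(candidates)
        result.append(name)
        pos += length
    return result
```

With this sketch, scan("->") yields a single ARROW token, whereas scan("- >") yields MINUS followed by GT, because the blank separates the two characters.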
2 Context-Free Grammars

Consider the following program in a language that has a simple form of higher-order functions, where a function can take another function as a parameter. The integrate function is an example of such a higher-order function. It has a function parameter f, and computes an approximation of the integral of f over an interval low..hi. The main function shows an example of calling integrate with the function g as the third argument.

The type of f is declared as (float) -> float, meaning that f takes a float parameter and returns a float. As we can see, this matches the definition of g. In general, a function type in this language is written as (t1, t2, ..., tn) -> t for n ≥ 0, where t1, t2, ..., tn are the parameter types and t is the return type of the function.

The function example shows other examples of function parameters: The function parameter f1 takes an int and a float and returns nothing (void). The function parameter f2 takes no parameters and returns an int. The function parameter f3 takes an (int)->int function as its parameter and returns an int.

    float integrate(float low, float hi, (float) -> float f) {
      return (f(low) + 4*(f((low+hi)/2)) + f(hi)) * (hi-low) / 6;
    }
    float g(float x) {
      return 3*x*x*x + 2*x + 7;
    }
    void main() {
      print(integrate(0, 10, g));
    }
    void example((int, float) -> void f1, () -> int f2, ((int)->int) -> int f3) {
      ...
    }

Below, parts of the abstract grammar for the language are shown.

    Program ::= FunDecl*;
    FunDecl ::= Type IdDecl Param* Body;
    abstract Type;
    FloatType : Type;
    IntType : Type;
    VoidType : Type;
    FunType : Type ::= ParamType:Type* ReturnType:Type;
    IdDecl ::= <ID>;
    Param ::= Type IdDecl;
    Body ::= Stmt*;
    abstract Stmt;
    ...
a) Construct an unambiguous context-free grammar for the part of the language described by the abstract grammar above. The grammar should be in EBNF (Extended Backus-Naur Form), i.e., allowing optionals and lists. Your EBNF grammar should be as similar as possible to the abstract grammar: for each class in the abstract grammar, there should be a corresponding nonterminal with the same name, and you should avoid using additional nonterminals. Your grammar should cover the example program above, except for the statements inside functions. You may assume there is a predefined token ID for identifiers. (6p)

b) Below, some more parts of the abstract grammar are shown.

    ReturnStmt : Stmt ::= Expr;
    CallStmt : Stmt ::= Call;
    abstract Expr;
    Call : Expr ::= IdUse Arg:Expr*;
    IdUse : Expr ::= <ID>;
    abstract BinExpr : Expr ::= Left:Expr Right:Expr;
    Add : BinExpr;
    Sub : BinExpr;
    Mul : BinExpr;
    Div : BinExpr;
    FloatLit : Expr ::= <FLOAT>;
    IntLit : Expr ::= <INT>;

Construct an unambiguous context-free grammar for this part of the language, in BNF (canonical form), i.e., without using optionals or lists. The parse trees should reflect the usual associativity and precedence rules: binary operators are left-associative, and multiplication and division have higher precedence than addition and subtraction. Make sure your grammar covers the statements used in the example program. You may assume there are predefined tokens INT and FLOAT for integer and float literals. If you construct an ambiguous grammar that is otherwise correct, you will get half of the points. (8p)

c) Prove that the following expression can be derived from the nonterminal Expr of your BNF grammar, by drawing a parse tree for it. Make sure to include all nonterminals and terminals in the tree so that it matches the grammar exactly. The root of the tree should be an Expr nonterminal. It is fine to abbreviate the nonterminals in the tree, e.g., to write E instead of Expr, as long as the abbreviations are obvious.

    x + f(x) * 2

If you did not solve the previous task completely, and your grammar is ambiguous for this expression, provide two different parse trees for the expression, to get full points on this task. (4p)
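Illustration (not part of the exam questions): a minimal sketch of how left associativity and operator precedence shape the tree for an expression, written as a small recursive-descent parser. It builds nested tuples (an abstract tree), not the full parse tree asked for in c). All names here (tokenize, parse, parse_expr, parse_term, parse_factor) and the tuple representation are assumptions made for this sketch.

```python
import re


def tokenize(text):
    # Float literals, int literals, identifiers, and single-character symbols.
    return re.findall(r"\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/(),]", text)


def parse_expr(tokens):
    """Additive level: + and - are left-associative and bind loosest."""
    left = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        right = parse_term(tokens)
        left = (op, left, right)  # fold to the left: (a - b) - c
    return left


def parse_term(tokens):
    """Multiplicative level: * and / bind tighter than + and -."""
    left = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        right = parse_factor(tokens)
        left = (op, left, right)
    return left


def parse_factor(tokens):
    """Literals, identifiers, calls, and parenthesized expressions."""
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_expr(tokens)
        tokens.pop(0)  # consume ")"
        return inner
    is_identifier = tok[0].isalpha() or tok[0] == "_"
    if is_identifier and tokens and tokens[0] == "(":
        tokens.pop(0)  # consume "("
        args = []
        if tokens[0] != ")":
            args.append(parse_expr(tokens))
            while tokens[0] == ",":
                tokens.pop(0)
                args.append(parse_expr(tokens))
        tokens.pop(0)  # consume ")"
        return ("call", tok, args)
    return tok


def parse(text):
    tokens = tokenize(text)
    tree = parse_expr(tokens)
    if tokens:
        raise ValueError(f"unexpected token {tokens[0]!r}")
    return tree
```

For example, parse("a - b - c") gives ("-", ("-", "a", "b"), "c"), showing left associativity, and parse("a + b * c") gives ("+", "a", ("*", "b", "c")), showing that multiplication binds tighter than addition.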
3 Program analysis

We will continue to work with the language with higher-order functions introduced in problem 2. The language does not permit the use of the void type for parameters, but since this is not prohibited by the abstract grammar, we will instead check this using attributes. Whereas the example program in problem 2 shows legal uses of void types, the example below shows a program with illegal uses:

    int f(void x) {            // Line 1, col 7: Illegal use of "void".
      return 0;
    }
    int g((void) -> int p) {   // Line 4, col 8: Illegal use of "void".
      return 0;
    }

We would like to compute a set of error messages for illegal uses of void types. For the above example program, the set should contain the two error message strings above. For the example program in problem 2, the set should be empty. To compute the line and column numbers, you can assume that each ASTNode has int attributes getline() and getcol().

a) Solve this problem by using attribute grammars, and without using Java's instanceof keyword. The result of the computation should be an attribute of type Set in the Program node.

Hint! Use a collection attribute. (8p)

b) Solve the problem by using a visitor. The visitor should have a static method

    static Set result(Program node) { ... }

that computes the result. No attributes or inter-type methods may be used in this solution. You may assume that there is an interface Visitor, with a method

    void visit(C node);

for each concrete class C in the abstract grammar. Assume also that there is an abstract method

    void accept(Visitor v);

for the general class ASTNode, and an implementation of accept for each concrete class C in the abstract grammar. Each such implementation forwards the call to the appropriate visit method in v as follows:

    void accept(Visitor v) {
      v.visit(this);
    }

Hint! You don't have to visit all nodes. (7p)
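Illustration (not part of the exam questions): a minimal sketch of the accept/visit forwarding described in b), applied to a different and simpler task, namely collecting parameter names. It is written in Python, which has no method overloading, so the visit methods carry the class name (visit_program, visit_fun_decl, visit_param). The node classes are simplified stand-ins for the abstract grammar, and ParamNameCollector is a hypothetical visitor invented for this sketch.

```python
class Node:
    def __init__(self, *children):
        self.children = list(children)


class Program(Node):
    def accept(self, visitor):
        visitor.visit_program(self)  # forward to the matching visit method


class FunDecl(Node):
    def __init__(self, name, *params):
        super().__init__(*params)
        self.name = name

    def accept(self, visitor):
        visitor.visit_fun_decl(self)


class Param(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def accept(self, visitor):
        visitor.visit_param(self)


class ParamNameCollector:
    """Hypothetical visitor: gathers the names of all parameters."""

    def __init__(self):
        self.names = []

    def visit_program(self, node):
        for child in node.children:
            child.accept(self)

    def visit_fun_decl(self, node):
        # Only the parameters are traversed; other parts are never visited.
        for child in node.children:
            child.accept(self)

    def visit_param(self, node):
        self.names.append(node.name)

    @staticmethod
    def result(program):
        collector = ParamNameCollector()
        program.accept(collector)
        return collector.names
```

The static method result mirrors the shape requested in b): it creates the visitor, starts the traversal at the root, and returns what the visitor accumulated.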
4 Code generation and run-time systems

We will continue to work with the language with higher-order functions. Consider the following program:

    int h(int a, (int) -> int f) {
      return 3*(a + f(2*a));
    }
    int g(int x) {
      (** PC **)
      return 4 * x + 5;
    }
    void print(int x) { ... }
    void main() {
      print(h(1, g)); // Will print 42
    }

The same calling convention is to be used as in the labs, i.e., parameters are passed on the stack, and the return value is passed in rax. For a function parameter, it is the address of the function that should be passed. For example, when main calls h with the parameter g, it is the address of g that should be passed. This can be done with the instruction leaq, which moves an address into a register. This is in contrast to the instruction movq, which moves the content at an address into a register.

In subproblem a) you will make a drawing and in subproblems b) and c) you will write x86 code. The code should be consistent with your drawing. Use only the instructions on the x86 Cheat Sheet. Use rbp as frame pointer and rsp as stack pointer. You are encouraged to comment your code to help us understand your intention. For simplicity and readability, you may leave out the characters q, $, %, and , (comma) in the code. For example, you may write add 8 rax instead of addq $8, %rax.

a) Draw the situation on the stack at the location **PC**. Your drawing should include stack pointer, frame pointer, dynamic links, and parameters. You may leave out possible temporaries from the drawing. Include the actual values as far as possible, including dynamic links, and mark which frame is which. (5p)

b) Translate the statement print(h(1, g)) to unoptimized x86 code. (5p)

c) Translate the function h to unoptimized x86 code. Also, draw a table showing the addresses of the parameters. (5p)
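Illustration (not part of the exam questions): a minimal sketch that mirrors the example program in Python, to check the arithmetic behind the comment "Will print 42" and to see which argument g receives. Passing the Python function object g stands in for passing the address of g. The list calls is a hypothetical helper added only for this sketch.

```python
calls = []  # hypothetical helper: records the argument of every call to g


def g(x):
    calls.append(x)
    return 4 * x + 5


def h(a, f):
    # f is passed as a value, analogous to passing the address of g.
    return 3 * (a + f(2 * a))


result = h(1, g)
```

Here h calls f with 2*a = 2, so g returns 4*2 + 5 = 13, and h returns 3*(1 + 13) = 42.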