
SFU CMPT 379 Compilers Spring 2015 Assignment 4

Assignment due Thursday, April 9, by 11:59pm.

For this assignment, you are to expand your Bunting-3 compiler from assignment 3 to handle Bunting-4. Project submission format is the same as for assignment 3 (an archive of your src directory inside a folder named with your sfu username).

Bunting-4 is backwards compatible with Bunting-3. All restrictions and specifications from Bunting-3 are still in force, unless specifically noted. I also omit unchanged productions in the Tokens and Grammar sections. If "..." is listed as a possibility for a production, it means "all possibilities for this production from Bunting-3".

Language Bunting-4

Grammar:

    S                  → globaldefinition* main body
    globaldefinition   → functiondefinition
                         typedefinition
                         classdefinition
    classdefinition    → class identifier ( parameterlist ) classbody
    classbody          → { innerdefinition* main body functiondefinition* }
    innerdefinition    → typedefinition
                         declaration
    statement          → ...
                         break ;
                         continue ;
                         release expressionlist ;
    expression         → ...
                         create user? identifier ( expressionlist )    // object creation. no immutability annotation
                         create user? mutarraytype ( expression )
                         [ user? mutannotation? expressionlist ]
                         user? mutannotation? stringconstant
                         copy user? mutannotation? expression
                         [+ user? mutannotation? expression, expression (, expression)* +]
                         this                                          // the current object
    functioninvocation → ...
                         expression . identifier ( expressionlist )    // call of an inner function
    type               → ...
                         identifier                                    // identifier can be a class name

1. Classes

Classes are a new type in Bunting-4.

1.1. Class definitions

Classes are limited-functionality objects (they can perform information hiding, but one cannot subclass them or use polymorphism on them). The identifier following the keyword class is the name of the class (i.e. the class name). Class names may not be overloaded (there may not be two classes with the same name, even if the number of parameters is different). Class names are in the same namespace as all other names, so one cannot declare a class and a global function of the same name.

1.2. The parts of a class

The body of the class (body child of classdeclaration) consists of three distinct parts:

(1) innerdefinitions. These are definitions for the class variables (fields) and types that will be used inside the class. We sometimes call a field an inner variable and a type declared here an inner type(def).
(2) main block. This is the constructor, called whenever an object of this class is created.
(3) functiondefinitions. We call these inner functions.

The class definition's parameterlist lists the parameters to be used when a create is issued for this class. They are to be declared as immutable variables in a new scope placed on the class declaration node. That is, they may be overridden in the main body scope (or in any nested scopes).

The record for the class should include all parameters and inner variables, but not local variables for the main block or any functions. To implement this, creation parameters should be passed on the frame stack to the constructor but then copied to the class's record at the start of that constructor.

1.3. Class visibility

All classes (and inner functions) are visible throughout the program, even before their definition. Collect the names and creation code signatures in a separate pass as was done for function names. You may combine this pass with the function-name collector, or make a separate pass.

1.4. Variable and inner function visibility

Inner variables and inner types are visible only within the class definition, and only to code that is past the point of declaration of these variables or types. This is like Java private fields. All inner functions are visible throughout the program (like Java public), even before their definition. Variables and types declared in a scope nested anywhere inside the class body are visible only within that nested scope. This includes variables and parameters declared inside the inner functiondeclarations.

1.5. Class creation

The only way to create a class instance is with a class creation expression:

    create user? identifier ( expressionlist )

The identifier must be a class name. The expressionlist must be type-compatible with the corresponding class definition's parameterlist. This creation expression allocates memory for the class's record and runs the class's main block with the given arguments. The values of all creation parameters and inner variables are kept in the record until the record is released. The visibility of inner variables is discussed above.

We will simply use the class name (identifier in the creation expression) as the indicator for the type of object generated by the creation. For example, in

    imm chessie := create Cat(9);

the type of the variable chessie is Cat.

1.6. Class records

A class creation expression results in the allocation of a new record (block of memory obtained from the memory manager). The record for a class type Name has the following format:

    Type identifier (4 bytes)
    Status (4 bytes)
    Variables for instance of Name (size of the scope on Name's definition)

The type identifier for a class is an integer that identifies the class type (i.e. the class). Start allocating class type identifiers with the integer 128. That is, give the integer 128 to the first class declared in the program, give 129 to the next class declared, etc.

The entire record takes 8 + size-of-Name's-definition-scope bytes. Note that the definition scope size includes the sizes of any creation parameters and inner variables. The variables in the main block or in inner functions are only visible while that block or function is executing, so you should use a stack frame for them. Use a subscope for the scope attached to the class's body node; it will (as subscopes do) use its parent's allocator, which in this case is the allocator given to the class definition node's scope (which is where I said to place the creation parameters).

Allocation of offsets for variables inside a class definition scope is done in a positive sense, starting with the offset 8. Thus, if a class has a parameter x and inner variables f and g, with x being an integer (4 bytes) and both f and g being floats (8 bytes apiece), x will get offset 8, f will get offset 12, and g will get offset 20. You may have to create a new type of scope (with the proper MemoryAccessMethod) to handle these class definition scopes.

The status flags used in arrays are also defined for classes:

Bit 0 (mask 1) contains the immutability status of the class, which is currently unused, so set it to zero. This bit is unused because the class provides immutability designation separately for each variable it contains.
Bit 1 (mask 2) contains the subtype-is-reference status, which is also not used for classes (so set it to zero). (As we shall see later, classes have a more complex method for handling subelements that are reference variables.)
Bit 2 (mask 4) contains the user-managed status, which is set to 1 if the creation expression contains the optional keyword user, and set to 0 otherwise.

1.7. Class variables

If Name is the name of a class, then a variable of type Name is a pointer to a class record; it thus occupies 4 bytes. It does not occupy 8 + size-of-Name's-body-scope bytes. In other words, class types are all reference (a.k.a. pointer) types.

Class variables obey semantics much like Java objects (which are themselves reference types): class variables declared with imm do not change their pointer; however, the contents of the record they point at may change. Imagine that the setcolor call in the following sets a variable inside the Cow's record:

    imm bessie := create Cow(1);
    call bessie.setcolor(4);

This is valid Bunting-4 and results in bessie pointing at a record that now has color 4.

Assignment or initialization of classes with other classes is simply a pointer copy.

    imm bessie := create Cow(2);
    imm freddie := bessie;

is valid Bunting-4 and results in bessie and freddie being two pointers to the same record. If this is followed by

    call bessie.setcolor(7);

then not only will bessie's color be 7, but freddie's color will be 7 as well.

Class variables declared with mut may change their pointer as well as the record contents.

    mut bessie := create Cow(3);
    call bessie.setcolor(2);
    imm nextcolor := 15;
    imm tempcow := create user Cow;
    call tempcow.setcolor(nextcolor);
    mutate bessie := tempcow;

is valid Bunting-4. However, class variables may not be assigned a class of a different type.

    imm bessie := create Cow(4);
    imm jessie := create Unicorn(5);
    mut pet := bessie;
    mutate pet := jessie;

is not valid Bunting-4. The mut declaration sets pet's type as Cow, and the mutate tries to update it with a Unicorn, so this should generate a typechecking error.

1.8. Class printing

If an expression in a print statement is of class type, then Bunting-4 will print the name of the class.

    class main {
    imm buzz := create Bee;
    print a& buzz& c& nl&;
    print a, buzz, c& nl&;

Will produce:

    aBeec
    a Bee c

2. Class types

Each class definition in a Bunting-4 program creates a new type, known as a class type. You will need to create a new subclass of Type to handle class types, as you did for array types. Each class definition you encounter will lead to the creation of a new instance of ClassType (or whatever you call it).

Each class type has a name, which is the name given in the class definition. No two class types (or two class definitions) may have the same name.

Like arrays, classes are reference types (a.k.a. pointer types): the value of a class variable is a pointer to the class record.

The user can define types based on class types (just like every other type).
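As a rough illustration of the bookkeeping described in sections 1.6 and 2, the following Python sketch models class types with unique names, type identifiers that start at 128, and record offsets that start at 8. It is only a model under stated assumptions, not compiler code: the names ClassType, ClassRegistry, allocate and status_word are hypothetical, and the byte sizes come from the x, f, g example in section 1.6.

```python
from dataclasses import dataclass, field

FIRST_CLASS_TYPE_ID = 128   # section 1.6: class type identifiers start at 128
RECORD_HEADER_BYTES = 8     # 4-byte type identifier + 4-byte status word
USER_MANAGED_MASK = 4       # status bit 2

# Byte sizes from the example in section 1.6 (hypothetical table name).
SIZES = {"int": 4, "float": 8}


@dataclass
class ClassType:
    name: str
    type_id: int
    offsets: dict = field(default_factory=dict)   # variable name -> record offset
    next_offset: int = RECORD_HEADER_BYTES

    def allocate(self, var_name, type_name, known_classes):
        """Give the next creation parameter or inner variable its record offset."""
        # A variable of class type is a 4-byte pointer (section 1.7).
        size = 4 if type_name in known_classes else SIZES[type_name]
        self.offsets[var_name] = self.next_offset
        self.next_offset += size

    def record_size(self):
        """Header plus everything allocated in the class definition scope."""
        return self.next_offset


class ClassRegistry:
    def __init__(self):
        self.classes = {}

    def declare(self, name):
        """Register a class; names may not be overloaded."""
        if name in self.classes:
            raise ValueError(f"class {name} is already defined")
        class_type = ClassType(name, FIRST_CLASS_TYPE_ID + len(self.classes))
        self.classes[name] = class_type
        return class_type


def status_word(user_managed):
    """Bits 0 and 1 stay zero for classes; bit 2 records the user keyword."""
    return USER_MANAGED_MASK if user_managed else 0


registry = ClassRegistry()
cat = registry.declare("Cat")
dog = registry.declare("Dog")
for var_name, type_name in [("x", "int"), ("f", "float"), ("g", "float")]:
    cat.allocate(var_name, type_name, registry.classes)

print(cat.type_id, dog.type_id)   # 128 129
print(cat.offsets)                # {'x': 8, 'f': 12, 'g': 20}
print(cat.record_size())          # 28
```

In a real compiler the offsets would presumably come from the scope's allocator rather than a counter on the type object; the counter is used here only to keep the sketch self-contained.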

3. The expression this

The lexeme this is now a keyword. When used as an expression, it denotes the instance of the class that contains the function (or creation code) that is currently running. Note that it cannot be used explicitly by the Bunting programmer for accessing variables in a class, but it can be used as a function argument or as a qualifier in a functioninvocation.

To implement this, we will use a technique known as lambda lifting. In this technique, the compiler adds a parameter to each function (including the constructor) and passes in the correct value of this as the argument. In Bunting, we will put this new parameter at the end of the parameter list. For example, in:

    class Sheep(int numbaas) {
        mut wool := 2;
        main {
        }
        def int makenoise(int n) {
            imm scaledwool := 3 * wool;
        }
    }

The inner function makenoise is treated as if it were declared:

    def int makenoise(int n, Sheep thisptr) {

Then, in the following sequence:

    imm garry := create Sheep(21);
    imm x := garry.makenoise(14);

The second statement is treated as:

    imm x := Sheep.makenoise(14, garry);

Here Sheep.makenoise is not legal Bunting; it just indicates which makenoise to call. In general,

    expression.functionname(args)

is treated as

    (Type of expression).functionname(args, expression)

The last argument to a function is always at offset 0 from the frame pointer, so we can rely on the this pointer being located at the frame pointer.

If an inner function accesses any variables or creation parameters of the class, it uses the this pointer to access the correct instance's version of the variable. In the example above, the statement

    imm scaledwool := 3 * wool;

uses the variable wool from the creation code of the class Sheep, and so this can be thought of as

    imm scaledwool := 3 * thisptr.wool;

(but note that the syntax thisptr.wool is not legal Bunting-4.) No explicit conversion to this form is necessary, however.
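The lambda-lifting rewrite can be pictured with a small Python model. This is a sketch only: Invocation, LiftedCall, lift_parameters and lift_invocation are hypothetical names, expressions are represented as plain strings, and the receiver's type is assumed to have been computed already by the type checker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """A call of the form expression.functionname(args)."""
    receiver: str        # the expression before the dot
    receiver_type: str   # its class type, assumed known from type checking
    function_name: str
    args: tuple


@dataclass(frozen=True)
class LiftedCall:
    """The lifted form (Type of expression).functionname(args, expression)."""
    class_name: str
    function_name: str
    args: tuple


def lift_parameters(params, class_name):
    """Append the hidden this-pointer parameter to an inner function's parameters."""
    return params + ((class_name, "thisptr"),)


def lift_invocation(call):
    """Move the receiver to the end of the argument list."""
    return LiftedCall(call.receiver_type, call.function_name,
                      call.args + (call.receiver,))


params = lift_parameters((("int", "n"),), "Sheep")
lifted = lift_invocation(Invocation("garry", "Sheep", "makenoise", ("14",)))
print(params)   # (('int', 'n'), ('Sheep', 'thisptr'))
print(lifted)
```

Because the receiver always ends up last, the callee can find it at a fixed place in its frame regardless of how many declared parameters the function has.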
You can simply create a new memory access method that uses double indirection from the frame pointer to calculate the base address. The code for this looks like:

    PushD $FRAME_POINTER
    LoadI
    LoadI

(the two LoadI's give it double indirection). The first LoadI gets the value of the frame pointer, and the second LoadI gets the value of thisptr. Use your new memory access method for variables declared in the class body scope, and in nested scopes that are not within functions.

this is not a targetable expression. Translate the expression this as thisptr (i.e., the last argument); in other words, address code for this is the first four instructions of the sequence above, or:

    PushD $FRAME_POINTER
    LoadI
    PushI 4
    Add

The type for this is the class type for the class that encloses it.

4. Handling create, the constructor, and its parameters

The main block in a class definition can be thought of as a function that modifies the class being created, but has no return statements. We can pass the record for the class being created to the creation code as its this pointer, if we allocate the record before the call. Then the call can return the this pointer using the usual return mechanism, providing the value that the creation expression expects (the pointer to the new class).

So implement create for class instances as a modified function call. This can be done (for example, for create Horse(expr1, expr2)) by:

1) calculating the argument expressions, in order, and pushing them on the frame stack,
2) allocating a record (of size 8+Horse's-scope-size) and pushing the pointer to that record on the frame stack, so that it becomes the this-pointer argument to the constructor function,
3) calling the constructor, which should return its this pointer,
4) performing the usual caller's-exit handshaking.

Note that steps 1 to 3 are just a modified version of the caller's-function-call handshaking.

The constructor is made by turning the main block of the class into a function. It should:

1) do the usual callee-enter handshake,
2) copy the parameters (not including the thisptr) from the frame stack into the record (thisptr) at the proper parameter offsets,
3) execute the main block's body code,
4) return the this pointer,
5) do the usual callee-exit handshake.

Since the creation parameters are available to all functions nested in the class, we do step 2) above to ensure that these arguments are stored in the record for the class. This means that during semantic analysis we must allocate space for them in both the parameter scope for the function call to the main body, and in the class scope that gets made into the class record. The parameter scope need not enclose the class body scope, as these named variables can be referenced as part of the class record. Thus, we may place the parameter scope at any convenient node in the AST, perhaps at a ParameterListNode or something similar. Then, you can start a class scope (whatever you decide to call it) at the ClassDeclaration node, and enter the parameter definitions into both the parameter scope and the class scope. The class body scope then can be a subscope placed on the class's Body node.
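The caller and constructor steps above can be simulated in a few lines of Python. This is a sketch under assumptions, not generated code: the heap is a dictionary standing in for the memory manager, the frame is a Python list, records are dictionaries keyed by byte offset, the enter and exit handshakes are omitted, and every name here (allocate_record, constructor, create, horse_main) is hypothetical.

```python
FIRST_CLASS_TYPE_ID = 128
USER_MANAGED_MASK = 4

heap = {}              # address -> record; stands in for the memory manager
next_address = [1000]  # arbitrary starting address for the simulation


def allocate_record(type_id, user_managed):
    """Allocate a record with its type identifier at offset 0 and status at offset 4."""
    address = next_address[0]
    next_address[0] += 1
    heap[address] = {0: type_id, 4: USER_MANAGED_MASK if user_managed else 0}
    return address


def constructor(frame, param_offsets, main_body):
    """Callee side: the class's main block turned into a function."""
    *params, thisptr = frame                  # thisptr is the last argument
    record = heap[thisptr]
    for value, offset in zip(params, param_offsets):
        record[offset] = value                # callee step 2: copy parameters into the record
    main_body(record)                         # callee step 3: run the main block's body
    return thisptr                            # callee step 4: return the this pointer


def create(type_id, args, param_offsets, main_body, user_managed=False):
    """Caller side of a creation expression such as create Horse(expr1, expr2)."""
    frame = list(args)                                       # 1) push evaluated arguments in order
    frame.append(allocate_record(type_id, user_managed))     # 2) push the new record's pointer
    return constructor(frame, param_offsets, main_body)      # 3) call the constructor


def horse_main(record):
    # Made-up main block: one inner variable at offset 16 built from the two parameters.
    record[16] = record[8] + record[12]


horse = create(FIRST_CLASS_TYPE_ID, [3, 4], [8, 12], horse_main, user_managed=True)
print(heap[horse])   # {0: 128, 4: 4, 8: 3, 12: 4, 16: 7}
```

The value returned by create is the record pointer, which is what the creation expression evaluates to.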

The above is only one method that works; you are free to use others if you so desire. For instance, you could essentially return void from the creation code and use a saved value of the allocated record as the value for the creation expression. Or maybe you could allocate the new class record inside the creation-code call, in the callee-entrance handshake.

5. Class attribute tables

Bunting-4 requires us to have information about each class type available at runtime. The two pieces of information we need are (1) a string name for the class (to use when printing the class), and (2) a list of all offsets in the class that contain a reference variable.

The easiest way to do this is to have either one combined table (each element being a string pointer followed by a list pointer), or to have one table for the string pointers, and one for the list pointers. Either way is fine, but the combined table requires an extra offset to access (say) the list pointers. I'll give a few details of the separate-table idea.

First, somewhere in RunTime, we need to place labelled strings in data memory, one for each class type. Suppose your program has three class types: Cat, Dog, and Rabbit, which have been allocated the typecodes 128, 129, and 130, respectively. Then you might issue the following ASM instructions to allocate the strings:

    DLabel $class-cat-string
    DataS Cat
    DLabel $class-dog-string
    DataS Dog
    DLabel $class-rabbit-string
    DataS Rabbit

This would be done from a loop that loops over the defined class types, issuing a DLabel and a DataS for each. To create the table, you would issue something like:

    DLabel $class-names-table
    DataD $class-cat-string
    DataD $class-dog-string
    DataD $class-rabbit-string

Then, to get the pointer to the name of the type of a record, take the record's typecode, and calculate the address

    $class-names-table + (typecode-128)*4

where the 4 is the size of an entry in the table. This address contains the desired pointer.
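The loop that issues these instructions, and the address arithmetic for a lookup, might look like the following Python sketch. The function names emit_name_tables and name_entry_offset are hypothetical, the instructions are produced as plain text lines rather than through the compiler's own code-fragment classes, and the operand of DataS is written exactly as it appears in the listing above.

```python
FIRST_CLASS_TYPE_ID = 128
ENTRY_BYTES = 4   # each table entry is one string pointer


def emit_name_tables(class_names):
    """Return ASM lines for the per-class strings followed by the names table."""
    lines = []
    for name in class_names:
        lines.append(f"DLabel $class-{name.lower()}-string")
        lines.append(f"DataS {name}")
    lines.append("DLabel $class-names-table")
    for name in class_names:
        lines.append(f"DataD $class-{name.lower()}-string")
    return lines


def name_entry_offset(typecode):
    """Byte offset of a class's entry, measured from $class-names-table."""
    return (typecode - FIRST_CLASS_TYPE_ID) * ENTRY_BYTES


asm = emit_name_tables(["Cat", "Dog", "Rabbit"])
print("\n".join(asm))
print(name_entry_offset(130))   # 8
```

The table entries must be emitted in typecode order for the offset formula to land on the right entry, which is why both loops walk the same list.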
The idea for the list pointers is similar, where each class's list could be a negative-number terminated array of integers. For example, if Cat's list were to contain the numbers 4, 12, and 20, you'd want something like:

    DLabel $class-cat-referenceoffsets
    DataI 4
    DataI 12
    DataI 20
    DataI -1

We use a negative number here rather than zero because zero is a valid element for the offset list. You can use any other method you please to store these lists; you could for instance prefix the list with the number of elements rather than having an end sentinel:

    DLabel $class-cat-referenceoffsets
    DataI 3
    DataI 4
    DataI 12
    DataI 20

6. Function invocations

In the function invocation

    expression . identifier ( expressionlist )

the expression must have a class type SomeClass that is a defined class, and identifier must be an inner function in SomeClass, with the argument types in the invocation matching the parameter types in the inner function. As stated above, we lambda-lift this invocation, treating it as if it were

    identifier ( expressionlist, expression )

in order to implement the this pointer.

Any function invocation without a member operator

    identifier ( expressionlist )

is treated as if it were a call to an inner function on this, viz.:

    this.identifier ( expressionlist )

and then lambda-lifting is applied.

7. The release statement, the user annotation, and hard release

We now can use the keyword user to annotate any expression that creates a record. Using this notation will set the user-managed status bit of the created record to 1; otherwise it will be set to 0. The only way to release a record with the user-managed bit set is by having that record as one of the expressions in the release statement:

    release expressionlist ;

Any expression in this list is immediately subjected to hard release (or explicit release). This is exactly like soft release, except that the user-managed bit is not checked in hard release, and the recursive releasing is hard.

The recursive release is a little harder to implement, though. It involves going through the class's list of offsets that contain reference variables, and releasing each. Pseudocode for the release operation (hard or soft) becomes:

    if(classid == 0) done;
    if(classid == 9) {
        if(subtype-is-reference) {
            loop through array elements, releasing each
        }
        done;
    }
    assert classid >= 128;
    classindex = classid-128;
    offsettable = offsettables[classindex];
    for each offset in offsettable {
        release the element at that offset
    }

Aside from this complication in the recursion, the release mechanism for class records is the same as it was for arrays. Most of the release code that you already have should work with class records.

8. Break and continue statements

These statements are only allowed inside the body of a while or for loop. This includes being nested inside statements in the loop body. A break statement immediately jumps to the code immediately after the closest (most deeply nested) loop that contains it. A continue immediately jumps to the code for checking the condition on the closest loop.

9. Operator precedence

The precedence of operators is

    Highest precedence (prefix unary operators are right-associative)

        parentheses                               ( )
        populated array creation                  [ ]
        empty array creation                      create[ ]( )
        class creation                            create ( )
        function invocation                       ( )
        concatenation                             [+ +]

        inner function access, array indexing     .  [ ]

        not, copy, length, stringprinting         !  copy  length  $

        casting                                   :

        multiplicative operators                  *  /

        additive operators                        +  -

        comparisons                               <  >  <=  >=  ==  !=

        and                                       &&

        or                                        ||

    Lowest precedence

These are all left-associative operators, except as noted.
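Returning to the release pseudocode in section 7, the recursion over a class's reference offsets can be modelled in Python as below. This is a sketch under several assumptions: records are dictionaries in a simulated heap, an address that is not in the heap is treated as null or already released, soft release is taken to skip any record whose user-managed bit is set, and the rest of the Bunting-3 release checks are left out. The function name release and the record layout used here are hypothetical.

```python
SUBTYPE_IS_REFERENCE_MASK = 2
USER_MANAGED_MASK = 4
ARRAY_TYPE_ID = 9
FIRST_CLASS_TYPE_ID = 128


def release(address, heap, offset_tables, hard=False):
    """Release the record at address and, recursively, the records it refers to."""
    record = heap.get(address)
    if record is None:                        # null pointer or already released
        return
    if not hard and record["status"] & USER_MANAGED_MASK:
        return                                # soft release leaves user-managed records alone
    del heap[address]                         # free first so cyclic references terminate
    if record["typeid"] == ARRAY_TYPE_ID:
        if record["status"] & SUBTYPE_IS_REFERENCE_MASK:
            for element in record["elements"]:
                release(element, heap, offset_tables, hard)
        return
    assert record["typeid"] >= FIRST_CLASS_TYPE_ID
    # Walk the class's list of offsets that hold reference variables.
    for offset in offset_tables[record["typeid"] - FIRST_CLASS_TYPE_ID]:
        release(record["fields"][offset], heap, offset_tables, hard)


offset_tables = [[12]]                        # class 128 keeps one reference at offset 12
heap = {
    1: {"typeid": 128, "status": 0, "fields": {8: 5, 12: 2}},
    2: {"typeid": 128, "status": USER_MANAGED_MASK, "fields": {8: 6, 12: 0}},
}
release(1, heap, offset_tables)               # soft: frees record 1, keeps user-managed record 2
print(sorted(heap))                           # [2]
release(2, heap, offset_tables, hard=True)    # hard: frees record 2 as well
print(sorted(heap))                           # []
```

The hard flag is passed down unchanged, which matches the statement that the recursive releasing done by a hard release is itself hard.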