Compilers Project 3: Semantic Analyzer
CSE 40243
Due April 11, 2006
Updated March 14, 2006

Overview

Your compiler is halfway done. It can now both recognize individual elements of the language (scan) and determine the structure of a string of the language (parse). The next step is to determine the meaning of the string based on its parse, which is what we call semantic analysis.

The main goal of semantic analysis is to build an Abstract Syntax Tree (AST), which the next project can use to generate code. In order for the code generator to do its job correctly, you also need to resolve names in the AST to symbols so that operations are consistent in the generated code. Finally, you will need to make sure types and their associated operations match up; otherwise your code generator may not be able to make sense of what it is given.

Abstract Syntax Tree

You will once again use Bison for this part of the assignment. For each grammar rule, you will add an action for Bison to perform whenever it matches that rule, placing the necessary elements of the parse into the abstract syntax tree. As discussed in class, for each element of the AST (decl, stmt, expr, etc.), you should create a function that generates the element: decl_make, stmt_make, expr_make, and so on. For example, a parser rule matching an if statement should create a statement structure like this:

    stmt : TOKEN_IF TOKEN_LPAREN expr TOKEN_RPAREN stmt
            { $$ = stmt_make( STMT_KIND_IF, $3, $5, 0, 0, 0 ); }

Remember that yyparse() only returns an integer indicating success or failure. Thus, for the top-level rule, you must save the root of the abstract syntax tree in a global pointer to be examined later. Do something like this:

    %{
    struct decl * ast_root;
    %}

    %%

    program : decl_list
            { ast_root = $1; }
        ;
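
The constructor functions named above can be quite small. As a minimal sketch, assuming a statement structure with one field per argument of the stmt_make call in the if rule, stmt_make might look like the following. The field names, the meaning of the last three arguments (else branch, declaration, next statement), and the set of statement kinds are assumptions made for illustration; the provided header files define the real layout.

```c
#include <stdlib.h>

/* Hypothetical statement kinds; the provided headers define the real set. */
typedef enum { STMT_KIND_IF, STMT_KIND_WHILE, STMT_KIND_RETURN } stmt_kind_t;

/* Opaque here; the real definitions live in the AST headers. */
struct expr;
struct decl;

/* Assumed layout: one field per argument of stmt_make. */
struct stmt {
    stmt_kind_t  kind;
    struct expr *expr;       /* controlling or returned expression */
    struct stmt *body;       /* statement run when expr is true */
    struct stmt *else_body;  /* optional else branch (assumed) */
    struct decl *decl;       /* declaration statement payload (assumed) */
    struct stmt *next;       /* next statement in the enclosing list (assumed) */
};

/* Allocate a statement node and fill in every field from the arguments. */
struct stmt *stmt_make(stmt_kind_t kind, struct expr *expr, struct stmt *body,
                       struct stmt *else_body, struct decl *decl,
                       struct stmt *next)
{
    struct stmt *s = malloc(sizeof(*s));
    if (!s) {
        return NULL;  /* caller decides how to report out-of-memory */
    }
    s->kind = kind;
    s->expr = expr;
    s->body = body;
    s->else_body = else_body;
    s->decl = decl;
    s->next = next;
    return s;
}
```

Under these assumptions, the if rule above yields a node whose expr field is $3 and whose body field is $5, with the remaining fields left null.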

By default, Bison assumes that the result of each rule production is an integer. Although this makes sense for constructing a calculator interpreter, it is no good for an AST; each rule produces a pointer to a structure. Thus, we must tell Bison that each rule is going to produce a pointer to an arbitrary object like so:

    #define YYSTYPE void *

Before embarking on the remaining steps, it is very important to ensure that you have generated the AST correctly. The easiest way to do this is to traverse the AST and print the program back out: you should see a program that is equivalent to the input, although missing comments and some whitespace. To do this, you should implement for each element of the AST a routine that prints the element on the standard output stream. Of course, these routines should traverse the entire AST: decl_display should use stmt_display, which should use expr_display, and so forth. Then to display the entire AST, simply invoke

    decl_display(ast_root);

Name Resolution

Once you have constructed the AST, you must then perform the name resolution algorithm described in class. Name resolution does two things while traversing the AST: it enters symbols into the scope table using scope_bind, and it matches names in expressions using scope_lookup. As each use of a name in an expression is matched, the symbol field of the expression should be changed to point to the symbol. Note that we will provide you with a working scope table to use. Like the other routines, name resolution is also recursive: decl_resolve should call stmt_resolve, and so forth. To help debug this step, your compiler should output a line every time that it resolves a name to a symbol.

Typechecking

Once the AST is constructed and names are bound to symbols, we are ready to begin checking types. As you might expect, this routine traverses down the AST, starting with decl_typecheck, then stmt_typecheck, expr_typecheck, and so forth. Let's talk about these from the bottom up.

As you know, every expression has a return type: (1+(a+b)) has type integer, while (a==b) has type boolean. The function expr_typecheck must return the type of an expression in the form of a struct type *. This is a recursive action: expr_typecheck(1+(a+b)) must result in calls to expr_typecheck(1), expr_typecheck(a+b), expr_typecheck(a), and expr_typecheck(b) in order to check the entire expression down to its leaves.

For constants, the result of expr_typecheck is obvious: it must return the type corresponding to the constant. For names, expr_typecheck should return the type of the symbol corresponding to the name, as determined by the name resolution step. For operators, expr_typecheck should evaluate the left and right sides of the expression, and then return the type that results from the operator: arithmetic expressions return type integer, comparison operators return type boolean.

Once you have implemented typechecking for expressions, then implement decl_typecheck and stmt_typecheck, which traverse the tree like all of the other operations.
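
The recursion described above can be sketched as follows. This is only an illustration under stated assumptions: the structure layouts, the enum names, and the type_name helper are hypothetical stand-ins for whatever the provided header files define, and only a few expression kinds are shown. For the sake of a small testable example, the sketch reports an error and returns NULL, whereas the real compiler is expected to stop immediately on a type error.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical type and expression representations (not the course headers). */
typedef enum { TYPE_INTEGER, TYPE_BOOLEAN, TYPE_CHAR, TYPE_STRING } type_kind_t;
struct type { type_kind_t kind; };

struct symbol { const char *name; struct type *type; };

typedef enum { EXPR_INT_CONST, EXPR_NAME, EXPR_ADD, EXPR_MUL, EXPR_EQ } expr_kind_t;
struct expr {
    expr_kind_t    kind;
    struct expr   *left;
    struct expr   *right;
    struct symbol *symbol;  /* filled in by name resolution for EXPR_NAME */
    int            value;   /* literal value for EXPR_INT_CONST */
};

static struct type type_integer = { TYPE_INTEGER };
static struct type type_boolean = { TYPE_BOOLEAN };

/* Hypothetical helper: printable name of a type for error messages. */
static const char *type_name(const struct type *t)
{
    switch (t->kind) {
    case TYPE_INTEGER: return "int";
    case TYPE_BOOLEAN: return "boolean";
    case TYPE_CHAR:    return "char";
    case TYPE_STRING:  return "string";
    }
    return "unknown";
}

/* Return the type of e, or NULL if e (or a subexpression) has a type error. */
struct type *expr_typecheck(struct expr *e)
{
    struct type *l, *r;

    if (!e) return NULL;

    switch (e->kind) {
    case EXPR_INT_CONST:
        return &type_integer;              /* constants carry their own type */
    case EXPR_NAME:
        return e->symbol ? e->symbol->type : NULL;  /* type of resolved symbol */
    case EXPR_ADD:
    case EXPR_MUL:
        l = expr_typecheck(e->left);       /* check both sides down to leaves */
        r = expr_typecheck(e->right);
        if (!l || !r) return NULL;
        if (l->kind != TYPE_INTEGER || r->kind != TYPE_INTEGER) {
            fprintf(stderr, "type error: arithmetic on %s and %s\n",
                    type_name(l), type_name(r));
            return NULL;
        }
        return &type_integer;              /* arithmetic yields integer */
    case EXPR_EQ:
        l = expr_typecheck(e->left);
        r = expr_typecheck(e->right);
        if (!l || !r) return NULL;
        if (l->kind != r->kind) {
            fprintf(stderr, "type error: cannot compare %s to %s\n",
                    type_name(l), type_name(r));
            return NULL;
        }
        return &type_boolean;              /* comparison yields boolean */
    }
    return NULL;
}
```

In this sketch, checking 1+a with a bound to an integer symbol yields the integer type, while adding an integer to a char yields NULL after printing a message. A stmt_typecheck built on top of it would, for instance, call expr_typecheck on the controlling expression of an if statement and verify that the result is boolean.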

Working together, these routines should enforce the following type rules:

1. Arithmetic operations must only accept integer operands.
2. Comparison operations may operate on any type, but both sides of the comparison must have the same type.
3. An assignment must have a name on the left side and an expression with the same type on the right side.
4. The arguments to a function call must match the types of the function parameters.
5. The controlling expression of an if or while statement must be boolean.
6. The type of an expression in a return statement must match the return type of the function it is in.

In addition, the following semantic rules must be checked:

1. In a function or a code block, all variable declarations must come before any code statements.
2. A function may only appear in a global declaration.

Required Output

Your compiler must produce the following output:

- The input program, printed out from the AST.
- One line for each time a name is resolved to a symbol.
- If an undefined name is used, the compiler should stop immediately and print

      X is undefined.

  where X is the invalid name.
- If a type error is detected, the compiler should stop immediately and print a meaningful error message, e.g.

      type error: cannot assign string to char

- If there are no type errors, then display

      No type errors.

For example, consider the following program:

    /* here is a comment */
    int b=6;
    int main()
    {
        int x=3;
        int y;
        y = x*b + 'c';
    }

A compiler processing this program would produce:

    int b=6;
    int main()
    {
    int x=3;
    int y;
    y=x*b+'c';
    }
    y is local 1
    x is local 0
    b is global
    type error: cannot add int to char

Starting Files

As usual, the solution to the previous assignment will be provided as a starter in case your own code was not entirely correct. We will also provide the header files which define the C-flat abstract syntax tree and symbol table. All these files can be found at the following URLs:

    file:///afs/nd.edu/coursesp.06/cse/cse40243.01/projects/semantics/code
    http:///www.nd.edu/courses/cse/cse40243.01/projects/semantics/code

These files should be copied into your project's source code directory in order to complete your assignment.

Grading

The assignment will be graded out of 100 points and is worth 10 percent of the course grade. Grading will be based on functionality, but partial credit will be assigned if, due to clearly formatted and commented code, the grader can determine what is going on.

As usual, up to 5 points of extra credit will be given for any significant improvement to the compilation process or extension to the language, and will be based logarithmically on the difficulty of the addition. Meaningful error handling would be a good example.

Deliverable

This project is due at 9:30 am on Tuesday, April 11. Project source should be archived into a tar file (compression optional) and placed in the /afs/nd.edu/coursesp.06/cse/cse40243.01/dropbox/netid/semantics directory, where netid is your Notre Dame NetID.

An extra credit attempt for this project should be documented in a file called EXTRA.semantics at the root of your submitted project package. The documentation should explain what has been done to extend or enhance the project and how to go about testing this claim. The submitted package should also contain any test code required by the nature of the extension or improvement.

Coding

Your code should conform to the coding conventions mentioned in class. Failure to do so will result in inoperable code. For more information on naming functions and other coding guidelines, see the C AST handout available on the website and the provided header files.