CS 91.406/534 Compiler Construction
University of Massachusetts Lowell
Professor Li Xu
Fall 2004

Lab Project 2: Parser and Type Checker for NOTHING
Due: Sunday, November 14, 2004, 11:59 PM

Parts of this document are based on the teaching materials of Prof. Keith Cooper, Prof. Ken Kennedy, and Dr. Linda Torczon at Rice University. All rights reserved.

1 Introduction

This project is intended to give you experience building a parser and conducting context-sensitive analysis. You will build a parser for the demonstration language NOTHING, either by using an automatic parser generator tool or by hand-coding a recursive-descent parser. In either case, you will get experience manipulating a programming language grammar and performing some limited context-sensitive analysis. Read this document and the accompanying NOTHING document completely before embarking on this adventure.

2 Project Requirements

Your task is to construct a parser that accepts the NOTHING programming language. The parser will obtain input by calling a lexical analyzer (scanner) on a word-by-word basis. The parser will perform the following functions:

1. It will determine whether the input NOTHING program presented for compilation is, in fact, a syntactically correct NOTHING program. If the program is not syntactically correct, a listing of the syntax errors should be printed to the standard error file (stderr).
2. It will perform some limited context-sensitive analysis to discover whether the alleged NOTHING program presented for compilation adheres to the context-sensitive rules for NOTHING programs. If the program has context-sensitive errors, they should be reported to the user on the stderr file.
3. It will build an abstract syntax tree (AST) for the input program if the program can be successfully parsed. The parser then performs a tree walk on the AST tree and prints out the following statistics about the input program:
   - the number of INTEGER, FLOAT, CHARACTER and ARRAY variables that have been declared in both the main program and subprograms.
   - the number of user functions and procedures that have been defined.
   - the number of Invocation and WhileStmt statements in the program. For WhileStmt, the parser should also print out an itemized breakdown ordered by the nesting levels of the statements.

   The statistics data should be printed to the stdout file.

Operator                 Precedence
*, MOD                   5
+, -                     4
<, <=, >, >=, =, <>      3
NOT                      2
AND, OR                  1

Figure 1: Operator precedences in NOTHING

3 The Parser

You are to construct a parser. Either use an automatic parser generator like ANTLR or yacc/bison, or hand-code a recursive-descent parser. The directions in this section assume that you are using ANTLR. Similar steps apply to other parser generators. For a hand-coded recursive-descent parser, you will follow different, but analogous steps. You should proceed in the following steps:

1. Read the ANTLR documentation and familiarize yourself with the tool. In the course directory ~cs406/lab2/parser example, we have installed a sample arithmetic expression parser along with the AST tree generator. A brief description of the sample parser and how to build and run the parser can be found on the course web page. You may also look at other examples from the ANTLR source package.
2. Convert the grammar to a form suitable for use with ANTLR. This includes translating it into the ANTLR input format, massaging it to ensure that the proper associativity and precedence are enforced, and ensuring that the resulting grammar is accepted by ANTLR with no errors. (Note: the NOTHING grammar, as written, is ambiguous.) Check the robustness of your parser against the test files provided in ~cs406/lab2/test files.
3. Extend the simple parser by adding actions to be performed on the various productions. The code in these actions should check for compliance with the context-sensitive rules described later in this document and in the NOTHING document.
4. Improve the context-free error handling of your parser.
   ANTLR provides an exception-based error handling mechanism. You need to define parsing exception handlers to recover from common syntax errors and provide meaningful diagnosis of possible errors. The parser should be able to detect as many syntax errors as possible during a single pass over the input program.
5. Build the AST tree representation of the input program. You can either annotate the ANTLR grammar and use the built-in AST tree generator, or generate the tree nodes from scratch in the action code for the grammar rules. Write a separate AST tree walker to traverse the tree in post-order and print out the required statistics of the input program.

4 NOTHING Specification

The syntax of NOTHING is specified in a document titled NOTHING: A Language For Practice Implementation. The grammar given in that document will need to be massaged to create one that is acceptable for input to ANTLR. The following additional information may be of use.
1. All the operators in NOTHING should be left associative.
2. Operator precedences in NOTHING are specified in Figure 1. Multiplication and MOD have the highest precedence; AND and OR have the lowest, with NOT one level above them.
3. The scope of a name is the region of the program in which the name can be used. Variables declared in a subprogram are only visible within that subprogram. They obscure an identically named variable or procedure in the surrounding scope. The scopes of distinct subprograms are disjoint: a name declared in one subprogram is not visible inside another subprogram. The scope defined by the main program is called the global scope. Names declared in the global scope are accessible from the point of declaration to the end of the program, including the body of the main program and any subprograms that do not redeclare the same name. All variables declared in the main program are in the global scope.
4. Function names are in the global scope. This has two important implications. First, a function can be called from any other function, provided that the calling function has not redefined that name. (This is true even if the function being called appears later in the source text.) Second, any function can be called recursively.
5. The lifetime of a variable is limited to the execution of the procedure (main program or function) that defines the scope in which it is declared. Thus, the value of a variable x declared in function f ceases to exist after the function returns.

5 Type Checking and Context-Sensitive Analysis

Compiling a NOTHING program requires a large amount of knowledge that cannot be encoded into a context-free grammar. Thus, you must augment the standard ANTLR parser to compute the required information. For this lab, your context-sensitive analysis should detect the following errors:

1. a variable is referenced but not declared
2. a variable is declared multiple times in a single scope
3. a variable is declared but never referenced (defined or used)
4. any type mismatch (illegal mixed type expression); see Section 6
5. incorrect subscript type in array variable reference
6. a constant-valued subscript that is outside the declared bounds of an array
7. a procedure call that invokes a function (incorrectly discarding the return value)
8. a function call that invokes a procedure (incorrectly using a non-existent return value)
9. any type mismatch between actual parameters at a call site and the definition of the called procedure

This list should not be considered exhaustive. As you discover other errors, add them to your project's list. For each one, write a test program that exposes the error and include it in the materials that you submit as your final report.

6 Mixed Type Expressions

NOTHING supports three basic data types: integer, floating point, and character. Each expression and subexpression has a type that can be determined at compile time. Your lab should determine (1) the type of each subexpression, (2) where coercions must be inserted, and (3) where invalid type combinations exist. Figure 2 gives type conversion tables for several of the NOTHING operators. Several operators are idiosyncratic; their rules are given in the numbered notes that accompany Figure 2.
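The conversion tables of Figure 2 lend themselves to a direct table lookup. The following is a minimal sketch in Python, offered as an illustration rather than a required design. The names `result_type`, `needs_conversion`, and `TABLES` are hypothetical. Treating any operand pair that a table does not list (for example, a character operand of +) as an error is an assumption, not something Figure 2 states. MOD is left out because Figure 2 does not cover it.

```python
from typing import Dict, Tuple

INT, FLOAT, CHAR, ERROR = "int", "float", "char", "error"

Table = Dict[Tuple[str, str], str]

# Table for +, -, * (Figure 2 lists only int and float operands).
ARITHMETIC: Table = {
    (INT, INT): INT, (INT, FLOAT): FLOAT,
    (FLOAT, INT): FLOAT, (FLOAT, FLOAT): FLOAT,
}

# Table for AND, OR. Entries marked "error" in Figure 2 are simply omitted.
LOGICAL: Table = {
    (CHAR, CHAR): CHAR,
    (INT, INT): INT, (INT, FLOAT): INT,
    (FLOAT, INT): INT, (FLOAT, FLOAT): FLOAT,
}

# Table for <, <=, >, >=, =, <>. Every legal comparison yields an integer.
RELATIONAL: Table = {
    (CHAR, CHAR): INT,
    (INT, INT): INT, (INT, FLOAT): INT,
    (FLOAT, INT): INT, (FLOAT, FLOAT): INT,
}

# Map each operator spelling to the table that governs it.
TABLES: Dict[str, Table] = {}
for op in ("+", "-", "*"):
    TABLES[op] = ARITHMETIC
for op in ("AND", "OR"):
    TABLES[op] = LOGICAL
for op in ("<", "<=", ">", ">=", "=", "<>"):
    TABLES[op] = RELATIONAL


def result_type(op: str, left: str, right: str) -> str:
    """Return the type of `left op right`, or ERROR for an illegal pair.

    Assumption: a pair missing from the table is illegal.
    Raises KeyError for operators that Figure 2 does not cover (e.g. MOD).
    """
    return TABLES[op].get((left, right), ERROR)


def needs_conversion(op: str, left: str, right: str) -> bool:
    """True when the expression is legal but its operand types differ."""
    return left != right and result_type(op, left, right) != ERROR
```

A type checker built this way would call `result_type` once per binary operator node and report an error whenever the result is `ERROR`. A call such as `needs_conversion("<", INT, FLOAT)` flags the places where a coercion has to be inserted.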
Table for +, -, *
           int      float
int        int      float
float      float    float

Table for AND, OR
           char     int      float
char       char     error    error
int        error    int      int
float      error    int      float

Table for <, <=, >, >=, =, <>
           char     int      float
char       int      error    error
int        error    int      int
float      error    int      int

Figure 2: Conversion tables for mixed mode expressions

1. The type of a subscripted name is wholly determined by the array's type declaration; it is independent of the type of the subscript expression. The type of the subscript expression must match the type in the array's dimension declaration.
2. Similarly, the type of a function call is determined by the function's definition rather than by the types of any actual parameters at the call site.
3. Assignment uses an idiosyncratic and asymmetric rule. For a left-hand side of type integer or float, the right-hand side is converted to the type of the left-hand side. If the left-hand side is of type character, the right-hand side must have type character.
4. The result of a NOT always has type integer.
5. Relational operators always produce an integer. Comparisons between characters and numbers make no sense; they are illegal. Comparisons between integers and floats produce integer results. To perform the comparison, the integer is converted to a float.

Your lab will need to recognize when conversions are required and report any expressions that would require an illegal coercion. It may be useful during debugging to have the parser report all coercions, both legal and illegal. This is a good use for a command-line debugging flag.

7 Debugging Advice

It is next to impossible to debug your parser by entering grammar rules and action statements for the whole NOTHING language and attempting to debug it all at once. Instead, enter actions for a few rules and test them, then continue to add more rules and actions, testing as you go.

ANTLR provides several useful debugging aids. You can turn on the -traceParser option to trace the entry and exit of matching production rules.
You may also use the more advanced parse-tree debugging feature to print out the derivations during parsing. Refer to the ANTLR manual for more details.

8 Error Handling

Your parser should recover from errors found during parsing by printing an appropriate error message and continuing to process the input. ANTLR provides an exception-based error recovery and handling mechanism. You can define exception handlers for a specific production rule or non-terminal symbol rule set. ANTLR will also generate an exception handler if no explicit handler is defined. The default exception handler will report an error, synchronize to the follow set of the rule, and return from that rule. You should extend this relatively simple error handling to provide more informative diagnosis of parsing errors.

9 Electronic Turn-in and Due Date

The due date of lab2 is Sunday, November 14, 2004, 11:59 PM (including documentation). We will run and test your code on the CS network machines. You should make sure your code can compile and run on mercury.cs.uml.edu or similar CS machines.

Your turn-in package should include your lab implementation with all supporting files: source code, make/build file, test files, etc. Your turn-in should also include a README file and a brief lab report (3 to 5 pages). The README file should list and describe the source files in your directory and give directions on how to build and run your parser. Your lab report should provide a brief discussion of your implementation, a summary of your testing procedures (and pointers to any test files you created on your own), and a discussion of your parser's error handling capabilities. Your lab report should be in either PostScript or PDF format. The files in your turn-in package, including the documentation, should have a last modification time no later than 11:59 PM on the due date.

To turn in the lab, leave all the code and the documentation in a directory named Lab2 on your CS account and send email to cs406@cs.uml.edu, indicating the directory location in the message body. Be sure to set the permissions so that the professor can read the directory and execute your code.

10 Grading Criteria

The criteria for grading this lab are as follows: 50% of your grade will be based on the functionality and error detection capability of your parser; 20% will be based on your testing procedures and the coverage of your test cases; 15% will be based on the AST tree generation and tree walker functionality; and 15% will be based on documentation.
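The tree walker named in the grading criteria, and described in Sections 2 and 3, could be organized along the following lines. This is a minimal Python sketch over a hand-built tree, not ANTLR's AST classes. The `Node` class and the node kinds `VarDecl`, `FunctionDef`, and `ProcedureDef` are hypothetical; `Invocation` and `WhileStmt` are the names used in Section 2. The sketch also assumes that each declaration node carries a single category label (INTEGER, FLOAT, CHARACTER, or ARRAY) and that an outermost WHILE loop is at nesting level 1.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                       # e.g. "WhileStmt", "Invocation", "VarDecl"
    text: str = ""                  # for VarDecl: the declared category label
    children: List["Node"] = field(default_factory=list)


def collect_stats(root: Node) -> Tuple[Counter, Counter]:
    """Walk the tree in post-order and gather the required statistics.

    Returns (counts, while_by_level): `counts` maps a variable category or
    node kind to its number of occurrences; `while_by_level` maps a nesting
    level (1 = outermost) to the number of WhileStmt nodes at that level.
    """
    counts: Counter = Counter()
    while_by_level: Counter = Counter()

    def walk(node: Node, depth: int) -> None:
        # `depth` is the number of WhileStmt nodes enclosing `node`.
        level = depth + 1 if node.kind == "WhileStmt" else depth
        for child in node.children:
            walk(child, level)
        # Post-order: the node itself is processed after its children.
        if node.kind == "VarDecl":
            counts[node.text] += 1
        elif node.kind in ("FunctionDef", "ProcedureDef", "Invocation"):
            counts[node.kind] += 1
        elif node.kind == "WhileStmt":
            counts["WhileStmt"] += 1
            while_by_level[level] += 1

    walk(root, 0)
    return counts, while_by_level
```

Passing the nesting depth down as a parameter keeps the walker free of global state, and the two `Counter` objects can be printed to stdout in whatever layout the report requires.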