CS 91.406/534 Compiler Construction
University of Massachusetts Lowell
Professor Li Xu
Fall 2004

NOTHING: A Language for Practice Implementation

1 Introduction

NOTHING is a programming language designed for practice implementation. It is a simplified version of Pascal that supports simple integer and floating point calculations, as well as simple string manipulation. It is not intended as a replacement for Pascal, C, or other more full-fledged languages. By design, it has few of the features that programmers find useful in writing programs. It is intended to be simple enough to implement in a single semester, but powerful enough to illustrate many common programming language features. It avoids complications, like arbitrarily deep nesting of blocks, that have little instructional value, while retaining some features, like recursion, that illuminate fundamental problems of compiler design. Each feature included in the language was added specifically to illustrate some problem that arises in the design and implementation of a compiler.

NOTHING supports three basic data types: integer, floating point, and character. Each of these types may be aggregated into one-dimensional arrays. A number of operators are defined for each type. You can assume that the underlying hardware supports integers with 32-bit, two's-complement arithmetic and floating point with a 32-bit implementation of the IEEE floating point standard.

Control structures in NOTHING are limited. It has an if statement, a while statement, a case statement, and a compound statement. Procedures may be recursive. They can be declared only within the main program, not within other procedures. (NOTHING supports only one level of lexical nesting.)

The notion of separate compilation is foreign to NOTHING. The entire program is presented to the compiler in a single compilation step. This simplifies some kinds of context-sensitive analysis that are otherwise difficult.
For example, the compiler can check argument lists at call sites against the definitions of the corresponding formal parameters.

The language has a trivial type system: the type of each expression can be determined at compile time. Some coercions from one type to another are permitted. Since there is no boolean data type, integers are used as logicals in a manner similar to C.

Parts of this document are based on the teaching materials of Prof. Keith Cooper, Prof. Ken Kennedy, and Dr. Linda Torczon at Rice University. All rights reserved.

2 Lexical Properties of NOTHING

1. In NOTHING, blanks are significant.

2. NOTHING is case sensitive; that is, X and x are distinct names. Keywords are written in capital letters. All keywords are reserved: the programmer cannot use a NOTHING keyword as the name of a variable. The valid keywords are:

       AND, ARRAY, BEGIN, CASE, CHARACTER, DO, ELSE, END, FLOAT, FUNCTION,
       IF, INTEGER, MOD, NOT, OF, OR, PROCEDURE, PROGRAM, READ, RETURN,
       THEN, VAR, WHILE, WRITE

   Thus, END is a keyword, but end can be a variable name.

3. The following special characters have meaning in a NOTHING program:

       { } < > = + - * [ ] ( ) . , ; :

   The grammar and section notes provide the details.

4. Comments are delimited by the characters { and }. A { begins a comment; it is valid in no other context. A } ends a comment; it cannot appear inside a comment. Comments cannot be nested. A { may appear inside a comment, but the first } closes the comment. Comments may appear before or after any other token.

5. Identifiers are written with uppercase and lowercase letters and digits, beginning with a letter. They are defined as follows:

       Letter     → a | b | c | ... | z | A | B | ... | Z
       Digit      → 0 | 1 | 2 | ... | 9
       Identifier → Letter ( Letter | Digit )*

   The implementor may restrict the length of identifiers to any number larger than 31 characters.

6. Constants are defined as follows:

       Constant       → IntNum | FloatNum | CharConst
       Positive       → 1 | 2 | 3 | ... | 9
       Sign           → + | - | ε
       IntNum         → Positive Digit* | 0
       FloatNum       → IntNum .
                      | IntNum . IntNum
                      | IntNum . E Sign IntNum
                      | IntNum . IntNum E Sign IntNum
       CharConst      → 'Letter'
       StringConstant → 'Letter*'

   Multi-letter string constants are acceptable in WRITE statements.

7. Operators:

       RelOp → < | <= | >= | > | = | <>
       AddOp → + | -
       MulOp → * | MOD
       LogOp → OR | AND

   Note: <> denotes inequality.

3 NOTHING Syntax

The syntax of NOTHING is described using a modernized BNF grammar. (See Section 3.2 of Cooper & Torczon, or the lecture notes.) Following the BNF are implementation notes for the various parts of the grammar.

The grammar, as stated, defines the language. It may require some massaging before implementation with any particular parser generator system. For example, you may need to remove left recursion if you use ANTLR or another top-down parser generator.

The following grammar describes the context-free syntax of NOTHING:
    Program    → PROGRAM Identifier ; DeclSet SubProgs Block
    DeclSet    → VAR Decls
               | ε
    Decls      → Idents : Type ;
               | Decls Idents : Type ;
    Idents     → Identifier
               | Idents , Identifier
    Type       → StdType
               | ArrayType
    StdType    → INTEGER
               | FLOAT
               | CHARACTER
    ArrayType  → ARRAY [ Dim ] OF StdType
    Dim        → IntNum .. IntNum
               | CharConst .. CharConst
    SubProgs   → SubProgs SubProg ;
               | ε
    SubProg    → Head DeclSet Block
    Head       → FUNCTION Identifier Args : StdType ;
               | PROCEDURE Identifier Args ;
    Args       → ( Params )
    Params     → Idents : Type
               | Params ; Idents : Type
    Stmt       → Assignment
               | IfStmt
               | WhileStmt
               | CaseStmt
               | Invocation
               | IOStmt
               | Block
               | Return
    Assignment → Variable := Expr
    WhileStmt  → WHILE Expr DO Stmt
    CaseStmt   → CASE Expr OF Cases END
    Cases      → CaseElt
               | Cases ; CaseElt
    CaseElt    → CaseLabels : Stmt
               | ε
    CaseLabels → Constant
               | CaseLabels , Constant
    IfStmt     → IF Expr THEN Stmt ELSE Stmt
               | IF Expr THEN Stmt
    IOStmt     → READ ( Variable )
               | WRITE ( Expr )
               | WRITE ( StringConstant )
    Invocation → Identifier ( )
               | Identifier ( Exprs )
    Block      → BEGIN Stmts END
    Stmts      → Stmt
               | Stmts ; Stmt
    Return     → RETURN Expr
               | RETURN
    Exprs      → Expr
               | Exprs , Expr
    Expr       → Expr Op Expr
               | NOT Expr
               | Factor
    Op         → LogOp
               | RelOp
               | AddOp
               | MulOp
    Factor     → Variable
               | Constant
               | ( Expr )
               | Function
    Variable   → Identifier
               | Identifier [ Expr ]
    Function   → Identifier ( )
               | Identifier ( Exprs )

4 NOTHING Specification Notes

4.1 Declarations

NOTHING supports three standard data types: INTEGER, FLOAT, and CHARACTER. Integers and floats each occupy a single machine word, while a character is stored in a single byte. These standard types may be composed into the structured ARRAY type. An identifier may represent one of four types of objects:

1. an integer variable or array
2. a floating point variable or array
3. a character variable or array
4. a procedure or function name
Identifiers are declared to be variables or arrays by a VAR declaration. They are declared to be procedure names by PROCEDURE and FUNCTION declarations. Only singly dimensioned arrays are permitted in NOTHING, but arbitrary upper and lower index bounds are permitted. Arrays may be indexed by characters.

As mentioned earlier, NOTHING is case sensitive. Procedure names are drawn from the same set as variable names. Thus, foo can be either a variable or a procedure. No single name scope can contain both a procedure named foo and a variable named foo, but both FOO and Foo can be declared in the same scope as foo.

Example:

    VAR x, y : INTEGER;
        c1, c2, c3 : CHARACTER;
        a : ARRAY [ 1 .. 15 ] OF INTEGER;
        s1, s2 : ARRAY [ 0 .. 79 ] OF CHARACTER;
        table : ARRAY [ 'a' .. 'z' ] OF INTEGER;

4.2 Procedure Declarations

The distinction between a function and a procedure lies in the mechanism for returning a value to the calling procedure. A FUNCTION returns a value; a PROCEDURE does not. A FUNCTION returns the value of the expression specified in the RETURN statement that it executes. The RETURN statement also transfers control back to the calling procedure, to the point immediately after the FUNCTION's invocation. A PROCEDURE returns control to the calling routine either by executing the last statement in the PROCEDURE or by executing a bare RETURN statement, which has no return value.

Example:

    FUNCTION max ( a, b : INTEGER ) : INTEGER;
    BEGIN
        IF a < b                { return the larger value }
        THEN RETURN b
        ELSE RETURN a           { tie goes to 1st value }
    END;

4.3 Assignment Statement

The assignment statement requires that its left-hand side (the Variable) and its right-hand side (the Expr) evaluate to the same type. If they have different types, either coercion is required or a context-sensitive error has occurred. The coercion rules for assignment are simple. If both sides are numeric (of type INTEGER or FLOAT), the right-hand side is converted to the type of the left-hand side.
If either side is of type CHARACTER, both sides must be CHARACTER (or the program contains a context-sensitive error).

4.4 If Statement

The grammar for the IF-THEN-ELSE construct embodies one of the classic examples of a context-free ambiguity: the dangling else problem. You should rewrite that portion of the grammar to resolve the ambiguity. The language designer intends that an ELSE be bound to the nearest unbound THEN.

To evaluate an IF statement, the controlling expression is evaluated first. If the expression's type is CHARACTER, the procedure contains a context-sensitive error. If its type is FLOAT, it should be converted to an INTEGER. For an integer value, NOTHING defines 0 as false; any other value is equivalent to true.

Examples:

    IF c=d THEN d := a
    IF b=0 THEN b := 2*a ELSE b := b/2
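The typing rules of Sections 4.3 and 4.4 can be sketched as a pair of small checks. The following Python is only an illustration, not part of the NOTHING specification: the names `Type`, `ContextSensitiveError`, `assignment_coercion`, and `condition_is_true` are hypothetical, and the FLOAT-to-INTEGER conversion is assumed to truncate toward zero, which the text above does not specify.

```python
from enum import Enum


class Type(Enum):
    INTEGER = "INTEGER"
    FLOAT = "FLOAT"
    CHARACTER = "CHARACTER"


class ContextSensitiveError(Exception):
    """Raised when a program violates one of the typing rules."""


NUMERIC = {Type.INTEGER, Type.FLOAT}


def assignment_coercion(lhs: Type, rhs: Type):
    """Return the type the right-hand side must be converted to, or None."""
    if lhs == rhs:
        return None  # same type on both sides: no coercion needed
    if lhs in NUMERIC and rhs in NUMERIC:
        return lhs  # numeric mismatch: convert the RHS to the LHS type
    # A CHARACTER on one side only is a context-sensitive error.
    raise ContextSensitiveError(f"cannot assign {rhs.value} to {lhs.value}")


def condition_is_true(value, value_type: Type) -> bool:
    """Apply the IF truth rule: 0 is false, any other value is true."""
    if value_type == Type.CHARACTER:
        raise ContextSensitiveError("CHARACTER used as a condition")
    if value_type == Type.FLOAT:
        value = int(value)  # assumption: conversion truncates toward zero
    return value != 0
```

Under the truncation assumption, a FLOAT condition of 0.5 counts as false, because it converts to the integer 0. A compiler that rounds instead would treat it differently, so the conversion rule is worth pinning down before generating code.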
4.5 While Statement

The WHILE statement provides a simple mechanism for iteration. The WHILE statement executes the statement under its control, sometimes called the loop body, until the controlling expression becomes false. Again, 0 is treated as false while any other value is treated as true. The controlling expression is treated as a boolean value encoded into an INTEGER expression. If the expression is not of type INTEGER, the same coercion rules apply as in the IF statement.

4.6 Case Statement

The expression in the CASE statement must evaluate to the same type as the case label constants. (Of course, this implies that all the label constants must be of the same type within a single CASE statement.)

Example:

    CASE i OF
        1: x := a;
        2: x := b;
        3,4: x := c
    END

4.7 Procedure Invocation

NOTHING uses parentheses to indicate invocation and square brackets to indicate subscripting of an array. Any procedure invocation has parentheses even if it has no arguments. (These are called niladic parentheses.) This avoids an ambiguity that occurs if (1) zero-argument functions have no parentheses, or (2) parentheses are used for both arrays and procedure calls.

NOTHING passes all parameters as call-by-reference formal parameters. At execution, actual parameters are evaluated left to right. The compiler must create unique storage copies of any literal constants passed as actual parameters.

4.8 Input-Output Statements

NOTHING provides two primitives for input and output. The READ and WRITE statements are intended to provide direct access to primitives implemented in the target abstract machine.

Examples:

    READ (x)
    WRITE (x+y)
    WRITE ('error')

4.9 Expressions

NOTHING expressions compute simple values of type INTEGER, FLOAT, or CHARACTER. Addition, subtraction, multiplication, and comparison are defined for both integer and floating point numbers. (Division is omitted deliberately.) For characters, comparison is the only defined operation. The standard ASCII collating sequence is assumed.
Coercion

If an expression contains operands of only one type, evaluation is straightforward. When an expression contains operands of mixed types, the situation is more complex.

Characters cannot appear as operands of any addop or mulop. Such usage constitutes a context-sensitive error.

If an addop or mulop has an INTEGER operand and a FLOAT operand, the INTEGER operand should be converted to a FLOAT before the operation is performed.

The relational operators are defined only when both operands have the same type. For numbers, comparison is based on both sign and magnitude. For characters, comparison is based on the standard ASCII collating sequence. Any comparison between unlike types constitutes a context-sensitive error.
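The coercion rules above can be sketched as a single type-checking function for binary operators. This Python is a minimal illustration under stated assumptions: the names `check_binary` and `ContextSensitiveError` are hypothetical, types are represented as plain strings, and the INTEGER result type for relational operators is taken from the Booleans discussion elsewhere in this document.

```python
ADD_OPS = {"+", "-"}
MUL_OPS = {"*", "MOD"}
REL_OPS = {"<", "<=", ">=", ">", "=", "<>"}


class ContextSensitiveError(Exception):
    """Raised when operand types violate the coercion rules."""


def check_binary(op: str, left: str, right: str):
    """Return (result type, side to convert to FLOAT or None).

    Operand types are assumed to be 'INTEGER', 'FLOAT', or 'CHARACTER'.
    """
    if op in ADD_OPS or op in MUL_OPS:
        if "CHARACTER" in (left, right):
            raise ContextSensitiveError(f"CHARACTER operand of {op}")
        if left == right:
            return left, None  # both operands already agree
        # One INTEGER and one FLOAT: widen the INTEGER operand first.
        return "FLOAT", ("left" if left == "INTEGER" else "right")
    if op in REL_OPS:
        if left != right:
            raise ContextSensitiveError(f"{op} compares {left} with {right}")
        # Assumption: a relation yields an INTEGER holding 0 or 1.
        return "INTEGER", None
    raise ValueError(f"unknown operator {op!r}")
```

For instance, `check_binary("+", "INTEGER", "FLOAT")` reports a FLOAT result with the left operand marked for conversion, while `check_binary("=", "INTEGER", "FLOAT")` raises an error because relational operators perform no coercion.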
Note

In an assignment, the value of a numeric expression is converted to match the type of the variable that appears on its left-hand side. (See Section 4.3.)

Booleans

Because NOTHING has no boolean values, relational expressions are defined to yield integer results. Thus, a relational expression of the form a = b is considered to be an arithmetic expression whose value is 1 if the relation holds and 0 otherwise. Hence, both the IF-THEN-ELSE and WHILE statements test integer values; the expression is considered false if it evaluates to 0 and true if it evaluates to anything else. Consider the following example, which tests for either of two conditions being true:

    READ (a); READ (b); READ (c); READ (d);
    IF (a = b) + (c < d) THEN WRITE ('error')

Note that relational expressions must be enclosed in parentheses because they have very low precedence. In the above example, a, b, c, and d may be variables of any type, provided that the two operands of each comparison have the same type.

In the above example, the OR operator could have been used instead of +. The OR operator takes two integer operands and produces the result 0 if both operands evaluate to 0; otherwise, it produces 1. The operator AND evaluates to 1 if both operands are nonzero; otherwise, it evaluates to 0. The unary logical operator NOT evaluates to 1 if its argument is zero and to 0 otherwise. Notice that using OR would make the parentheses redundant.

Unary Minus

NOTHING does not include either a unary minus operator or an optional negative sign on the front of a numeric constant. If you finish your parser early, consider adding a unary minus to the grammar. Of course, it should have the highest precedence.

5 An Example Program

The following is a simple example program written in NOTHING. It repeatedly reads pairs of integers from the input and prints their greatest common divisor.
    PROGRAM example;
    VAR x, y : INTEGER;
    FUNCTION gcd (a, b : INTEGER) : INTEGER;
    BEGIN
        IF b=0 THEN RETURN a
        ELSE RETURN gcd(b, a MOD b)
    END;
    BEGIN
        READ (x); READ (y);
        WHILE (x <> 0) OR (y <> 0) DO
        BEGIN
            WRITE (gcd (x,y));
            READ (x); READ (y)
        END
    END
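When testing a compiler on the example program, it can help to have expected outputs from an independent source. The Python sketch below mirrors the program under a few assumptions that this document does not settle: the WRITE and the two READs after DO together form the loop body, inputs are nonnegative (so that Python's `%` agrees with any reasonable definition of MOD), and input and output are modeled as lists. The name `run_example` is hypothetical.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm, written recursively like the NOTHING function."""
    if b == 0:
        return a
    return gcd(b, a % b)


def run_example(inputs):
    """Mirror the main program: READ takes the next input, WRITE appends."""
    values = iter(inputs)
    outputs = []
    x = next(values)
    y = next(values)
    # The loop ends only when both values read are zero.
    while x != 0 or y != 0:
        outputs.append(gcd(x, y))
        x = next(values)
        y = next(values)
    return outputs
```

With the input sequence 12, 18, 7, 0, 0, 0, `run_example` returns `[6, 7]`: the pair (12, 18) yields 6, the pair (7, 0) yields 7, and the pair (0, 0) ends the loop.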