CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation


CS 91.406/534 Compiler Construction
University of Massachusetts Lowell
Professor Li Xu
Fall 2004

NOTHING: A Language for Practice Implementation

1 Introduction

NOTHING is a programming language designed for practice implementation. It is a simplified version of Pascal that supports simple integer and floating point calculations, as well as simple string manipulation. It is not intended as a replacement for Pascal, C, or other more full-fledged languages. By design, it has few of the features that programmers find useful in writing programs. It is intended to be simple enough to implement in a single semester, but powerful enough to illustrate many common programming language features. It avoids complications, like arbitrarily deep nesting of blocks, that have little instructional value, while retaining some features, like recursion, that illuminate fundamental problems of compiler design. Each feature in the language was included specifically to illustrate some problem that arises in the design and implementation of a compiler.

NOTHING supports three basic data types: integer, floating point, and character. Each of these types may be aggregated into one-dimensional arrays. A number of operators are defined for each type. You can assume that the underlying hardware supports integers with 32-bit, two's-complement arithmetic and floating point with a 32-bit implementation of the IEEE floating point standard.

Control structures in NOTHING are limited. It has an if statement, a while statement, a case statement, and a compound statement. Procedures may be recursive. They can be declared only within the main program, not within other procedures. (NOTHING supports only one level of lexical nesting.)

The notion of separate compilation is foreign to NOTHING. The entire program is presented to the compiler in a single compilation step. This simplifies some kinds of context-sensitive analysis that are otherwise difficult.
For example, the compiler can check argument lists at call sites against the definitions of the corresponding formal parameters.

The language has a trivial type system: the type of each expression can be determined at compile time. Some coercions from one type to another are permitted. Since there is no boolean data type, integers are used as logicals in a manner similar to C.

Note: Parts of this document are based on teaching materials by Prof. Keith Cooper, Prof. Ken Kennedy and Dr. Linda Torczon at Rice University. All rights reserved.

2 Lexical Properties of NOTHING

1. In NOTHING, blanks are significant.

2. NOTHING is case sensitive, that is, X and x are distinct names. Keywords are written in capital letters. All keywords are reserved: the programmer cannot use a NOTHING keyword as the name of a variable. The valid keywords are: AND, ARRAY, BEGIN, CASE, CHARACTER, DO, ELSE, END, FLOAT, FUNCTION, IF, INTEGER, MOD, NOT, OF, OR, PROCEDURE, PROGRAM, READ, RETURN, THEN, VAR, WHILE, WRITE. Thus, END is a keyword, but end can be a variable name.

3. The following special characters have meaning in a NOTHING program:

    { } < > = + - * [ ] ( ) . , ; :

   The grammar and section notes provide the details.

4. Comments are delimited by the characters { and }. Outside a comment, a { begins a comment; it is valid in no other context. A } ends a comment; it cannot appear inside a comment. Comments cannot be nested: a { may appear inside a comment, but it has no special meaning there, and the first } closes the comment. Comments may appear before or after any other token.

5. Identifiers are written with upper and lowercase letters and are defined as follows:

    Letter     → a | b | c | ... | z | A | B | ... | Z
    Digit      → 0 | 1 | 2 | ... | 9
    Identifier → Letter ( Letter | Digit )*

   The implementor may restrict the length of identifiers to any limit larger than 31 characters.

6. Constants are defined as follows:

    Constant       → IntNum | FloatNum | CharConst
    Positive       → 1 | 2 | 3 | ... | 9
    Sign           → + | - | ε
    IntNum         → Positive Digit* | 0
    FloatNum       → IntNum .
                   | IntNum . IntNum
                   | IntNum . E Sign IntNum
                   | IntNum . IntNum E Sign IntNum
    CharConst      → Letter
    StringConstant → Letter

   Multi-letter string constants are acceptable in WRITE statements.

7. Operators:

    RelOp → < | <= | >= | > | = | <>
    AddOp → + | -
    MulOp → * | MOD
    LogOp → OR | AND

   Note: <> denotes inequality.

3 NOTHING Syntax

The syntax of NOTHING is described using a modernized BNF grammar. (See Section 3.2 of Cooper & Torczon, or the lecture notes.) Following the BNF are implementation notes for the various parts of the grammar. The grammar, as stated, defines the language. It may require some massaging before implementation with any particular parser generator system. For example, you may need to remove left recursion if you use ANTLR or another top-down parser generator.

The following grammar describes the context-free syntax of NOTHING:

    Program    → PROGRAM Identifier ; DeclSet SubProgs Block
    DeclSet    → VAR Decls
               | ε
    Decls      → Idents : Type ;
               | Decls Idents : Type ;
    Idents     → Identifier
               | Idents , Identifier
    Type       → StdType
               | ArrayType
    StdType    → INTEGER
               | FLOAT
               | CHARACTER
    ArrayType  → ARRAY [ Dim ] OF StdType
    Dim        → IntNum .. IntNum
               | CharConst .. CharConst
    SubProgs   → SubProgs SubProg ;
    SubProg    → Head DeclSet Block
    Head       → FUNCTION Identifier Args : StdType
               | PROCEDURE Identifier Args ;
    Args       → ( Params )
    Params     → Idents : Type
               | Params ; Idents : Type
    Stmt       → Assignment
               | IfStmt
               | WhileStmt
               | CaseStmt
               | Invocation
               | IOStmt
               | Block
               | Return
    Assignment → Variable := Expr
    WhileStmt  → WHILE Expr DO Stmt
    CaseStmt   → CASE Expr OF Cases END
    Cases      → CaseElt
               | Cases ; CaseElt
    CaseElt    → CaseLabels : Stmt
               | ε
    CaseLabels → Constant
               | CaseLabels , Constant
    IfStmt     → IF Expr THEN Stmt ELSE Stmt
               | IF Expr THEN Stmt
    IOStmt     → READ ( Variable )
               | WRITE ( Expr )
               | WRITE ( StringConstant )
    Invocation → Identifier ()
               | Identifier ( Exprs )
    Block      → BEGIN Stmts END
    Stmts      → Stmt
               | Stmts ; Stmt
    Return     → RETURN Expr
               | RETURN
    Exprs      → Expr
               | Exprs , Expr
    Expr       → Expr Op Expr
               | NOT Expr
               | Factor
    Op         → LogOp
               | RelOp
               | AddOp
               | MulOp
    Factor     → Variable
               | Constant
               | ( Expr )
               | Function
    Variable   → Identifier
               | Identifier [ Expr ]
    Function   → Identifier ()
               | Identifier ( Exprs )

4 NOTHING Specification Notes

4.1 Declarations

NOTHING supports three standard data types: INTEGER, FLOAT and CHARACTER. Integers and floats each occupy a single machine word, while a character is stored in a single byte. These standard types may be composed into the structured ARRAY type. An identifier may represent one of four types of objects:

1. an integer variable or array
2. a floating point variable or array
3. a character variable or array
4. a procedure or function name

Identifiers are declared to be variables or arrays by a VAR declaration. They are declared to be procedure names by PROCEDURE and FUNCTION declarations. Only singly dimensioned arrays are permitted in NOTHING, but arbitrary upper and lower index bounds are permitted. Arrays may be indexed by characters.

As mentioned earlier, NOTHING is case-sensitive. Procedure names are drawn from the same set as variable names. Thus, foo can be either a variable or a procedure. No single name scope can contain both a procedure named foo and a variable named foo, but both FOO and Foo can be declared in the same scope as foo.

Example:

    VAR x,y : INTEGER;
        c1, c2, c3 : CHARACTER;
        a : ARRAY [ 1 .. 15 ] OF INTEGER;
        s1, s2 : ARRAY [ 0 .. 79 ] OF CHARACTER;
        table : ARRAY [ a .. z ] OF INTEGER;

4.2 Procedure Declarations

The distinction between a function and a procedure lies in the mechanism for returning a value to the calling procedure. A FUNCTION returns a value; a PROCEDURE does not. A FUNCTION returns the value of the expression specified in the RETURN statement that it executes. The RETURN statement also transfers control back to the calling procedure, to the point immediately after the FUNCTION's invocation. A PROCEDURE returns control to the calling routine either by executing its last statement or by executing a blank RETURN statement, which has no return value.

Example:

    FUNCTION max ( a, b: INTEGER ) : INTEGER;
    BEGIN
      IF a < b { return the larger value }
      THEN RETURN b
      ELSE RETURN a { tie goes to 1st value }
    END;

4.3 Assignment Statement

The assignment statement requires that its left-hand side (the Variable) and its right-hand side (the Expr) evaluate to the same type. If they have different types, either coercion is required or a context-sensitive error has occurred. The coercion rules for assignment are simple. If both sides are numeric (of type INTEGER or FLOAT), the right-hand side is converted to the type of the left-hand side.
If either side is of type CHARACTER, both sides must be CHARACTER (or the program contains a context-sensitive error).

4.4 If Statement

The grammar for the IF-THEN-ELSE construct embodies one of the classic examples of a context-free ambiguity: the dangling else problem. You should rewrite that portion of the grammar to resolve the ambiguity. The language designer intends that an ELSE be bound to the nearest unbound THEN.

To evaluate an IF statement, the expression is evaluated first. If the expression's type is CHARACTER, the procedure contains a context-sensitive error. If its type is FLOAT, it should be converted to an INTEGER. For an integer value, NOTHING defines 0 as false; any other value is equivalent to true.

Examples:

    IF c=d THEN d := a
    IF b=0 THEN b := 2*a ELSE b := b/2
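The rule for evaluating an IF condition can be sketched in Python. This is only an illustration under stated assumptions: the names `condition_is_true` and `ContextSensitiveError` are hypothetical, and because the text does not say how a FLOAT is converted to an INTEGER, truncation toward zero is assumed here.

```python
class ContextSensitiveError(Exception):
    """Hypothetical error type for violations found during semantic analysis."""


def condition_is_true(value, type_name):
    """Decide whether an IF condition counts as true.

    type_name is one of "INTEGER", "FLOAT", or "CHARACTER".
    """
    if type_name == "CHARACTER":
        # A CHARACTER-typed condition is a context-sensitive error.
        raise ContextSensitiveError("condition may not be of type CHARACTER")
    if type_name == "FLOAT":
        # Assumption: FLOAT -> INTEGER conversion truncates toward zero.
        value = int(value)
    elif type_name != "INTEGER":
        raise ValueError("unknown type: " + type_name)
    # 0 is false; any other integer value is true.
    return value != 0
```

Under the truncation assumption, a FLOAT condition such as 0.4 is treated as false, since it converts to 0. A compiler that rounds instead would treat some of these values differently, so the conversion policy is worth fixing early.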

4.5 While Statement

The WHILE statement provides a simple mechanism for iteration. The WHILE statement executes the statement under its control, sometimes called the loop body, until the controlling expression becomes false. Again, 0 is treated as false while any other value is treated as true. The controlling expression is treated as a boolean value encoded into an INTEGER expression. If the expression is not of type INTEGER, the same coercion rules apply as in the IF statement.

4.6 Case Statement

The expression in the CASE statement must evaluate to the same type as the case label constants. (Of course, this implies that all the label constants must be of the same type within a single CASE statement.)

Example:

    CASE i OF
      1: x:=a;
      2: x:=b;
      3,4: x:= c
    END

4.7 Procedure Invocation

NOTHING uses parentheses to indicate invocation and square brackets to indicate subscripting of an array. Any procedure invocation has parentheses even if it has no arguments. (These are called niladic parentheses.) This avoids an ambiguity that occurs if (1) zero-argument functions have no parentheses, or (2) parentheses are used for both arrays and procedure calls.

NOTHING passes all parameters as call-by-reference formal parameters. At execution, actual parameters are evaluated left-to-right. The compiler must create unique storage copies of any literal constants passed as actual parameters.

4.8 Input-Output Statements

NOTHING provides two primitives for input and output. The READ and WRITE statements are intended to provide direct access to primitives implemented in the target abstract machine.

Examples:

    READ (x)
    WRITE (x+y)
    WRITE (error)

4.9 Expressions

NOTHING expressions compute simple values of type INTEGER, FLOAT, or CHARACTER. Addition, subtraction, multiplication, and comparison are defined for both integer and floating point numbers. (Division is omitted deliberately.) For characters, comparison is the only defined operation. The standard ASCII collating sequence is assumed.

Coercion

If an expression contains operands of only one type, evaluation is straightforward. When an expression contains operands of mixed types, the situation is more complex. Characters cannot appear as operands of any addop or mulop. Such usage constitutes a context-sensitive error. If an addop or mulop has an INTEGER operand and a FLOAT operand, the INTEGER operand should be converted to a FLOAT before the operation is performed.

The relational operators are only defined when both operands have the same type. For numbers, comparison is based on both sign and magnitude. For characters, comparison is based on the standard ASCII collating sequence. Any comparison between unlike types constitutes a context-sensitive error.
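
The coercion rules for binary operators can be sketched as a small type-checking helper in Python. The names `binary_result_type` and `ContextSensitiveError` are hypothetical, and the logical operators are left out. Two points are assumptions rather than statements from this section: MOD is handled by the general mulop rule, and relational operators are given the result type INTEGER, following the specification's statement that relational expressions yield integer results.

```python
ADD_OPS = {"+", "-"}
MUL_OPS = {"*", "MOD"}
REL_OPS = {"<", "<=", ">=", ">", "=", "<>"}
NUMERIC = {"INTEGER", "FLOAT"}
TYPES = NUMERIC | {"CHARACTER"}


class ContextSensitiveError(Exception):
    """Hypothetical error type for violations found during semantic analysis."""


def binary_result_type(op, left, right):
    """Return the result type of `left op right`, or raise on a type error."""
    if left not in TYPES or right not in TYPES:
        raise ValueError("unknown operand type")
    if op in ADD_OPS or op in MUL_OPS:
        # Characters may not be operands of any addop or mulop.
        if left not in NUMERIC or right not in NUMERIC:
            raise ContextSensitiveError("CHARACTER operand of " + op)
        # Mixed INTEGER/FLOAT: the INTEGER operand is converted to FLOAT.
        return "FLOAT" if "FLOAT" in (left, right) else "INTEGER"
    if op in REL_OPS:
        # Comparisons between unlike types are errors.
        if left != right:
            raise ContextSensitiveError("comparison between unlike types")
        return "INTEGER"
    raise ValueError("unsupported operator: " + op)
```

For instance, `binary_result_type("+", "INTEGER", "FLOAT")` yields `"FLOAT"`, while comparing an INTEGER with a FLOAT raises the error, because relational operators require operands of the same type.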


Note

In an assignment, the value of a numeric expression is converted to match the type of the variable that appears on its left-hand side. (See Section 4.3.)

Booleans

Because NOTHING has no boolean values, relational expressions are defined to yield integer results. Thus, a relational expression of the form a = b is considered to be an arithmetic expression whose value is 1 if the relation holds and 0 otherwise. Hence, both the IF-THEN-ELSE and WHILE statements test integer values; the expression is considered false if it evaluates to 0 and true if it evaluates to anything else. Consider the following example, which tests for either of two conditions being true:

    READ (a); READ (b); READ (c); READ (d);
    IF (a = b) + (c < d) THEN WRITE ( error )

Note that relational expressions must be enclosed in parentheses because they have very low precedence. In the above example, a, b, c, and d may be variables of any type.

In the above example, the OR operator could have been used instead of +. The OR operator takes two integer operands and produces the result 0 if both operands evaluate to 0; otherwise, it produces 1. The operator AND evaluates to 1 if both operands are nonzero; otherwise it evaluates to 0. The unary logical operator NOT evaluates to 1 if its argument is zero and to 0 otherwise. Notice that using OR would make the parentheses redundant.

Unary Minus

NOTHING does not include either a unary minus operator or an optional negative sign on the front of a numeric constant. If you finish your parser early, consider adding a unary minus to the grammar. Of course, it should have highest precedence.

5 An Example Program

The following is a simple example program written in NOTHING. It successively reads pairs of integers from the input and prints out their greatest common divisor.

    PROGRAM example;
    VAR x, y : INTEGER;
    FUNCTION gcd (a,b: INTEGER):INTEGER;
    BEGIN
      IF b=0
      THEN RETURN a
      ELSE RETURN gcd(b, a MOD b)
    END;
    BEGIN
      READ (x);
      READ (y);
      WHILE (x <> 0) OR (y <> 0) DO
      BEGIN
        WRITE (gcd (x,y));
        READ (x);
        READ (y)
      END
    END
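
To make the intended behavior of the example concrete, here is a rough Python transliteration. It is a sketch under assumptions, not part of the language definition: `run_example` is a hypothetical name, READ is modelled by pulling values from a sequence, WRITE by collecting results in a list, the loop body is taken to cover the WRITE and both READ statements, and MOD is assumed to behave like Python's `%` on non-negative integers.

```python
def gcd(a, b):
    """Mirror of the NOTHING function gcd: recursive Euclidean algorithm."""
    if b == 0:
        return a
    return gcd(b, a % b)


def run_example(inputs):
    """Simulate the main program on a sequence of integers.

    The sequence must end with the pair 0, 0, which stops the loop.
    """
    values = iter(inputs)
    written = []          # stands in for the output of WRITE
    x = next(values)      # READ (x)
    y = next(values)      # READ (y)
    while (x != 0) or (y != 0):
        written.append(gcd(x, y))
        x = next(values)
        y = next(values)
    return written
```

Calling `run_example([12, 18, 7, 0, 0, 0])` returns `[6, 7]`: the pair (12, 18) gives 6, the pair (7, 0) gives 7, and the final pair (0, 0) ends the loop.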