DEMO A Language for Practice Implementation Comp 506, Spring 2018

Similar documents
CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation

1 Lexical Considerations

Lexical Considerations

Lexical Considerations

IPCoreL. Phillip Duane Douglas, Jr. 11/3/2010

EECS 6083 Intro to Parsing Context Free Grammars

This book is licensed under a Creative Commons Attribution 3.0 License

Part 5 Program Analysis Principles and Techniques

CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

The SPL Programming Language Reference Manual

Syntax. A. Bellaachia Page: 1

Parsing II Top-down parsing. Comp 412

Lexical Analysis (ASU Ch 3, Fig 3.1)

The ILOC Simulator User Documentation

Chapter 3. Describing Syntax and Semantics

The PCAT Programming Language Reference Manual

Habanero Extreme Scale Software Research Project

The ILOC Simulator User Documentation

Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Decaf Language Reference

The ILOC Simulator User Documentation

FRAC: Language Reference Manual

The Decaf Language. 1 Lexical considerations

Formal Languages and Grammars. Chapter 2: Sections 2.1 and 2.2

Building a Parser Part III

Typescript on LLVM Language Reference Manual

Objectives. Chapter 2: Basic Elements of C++ Introduction. Objectives (cont d.) A C++ Program (cont d.) A C++ Program

JME Language Reference Manual

Chapter 2: Basic Elements of C++

Chapter 2: Basic Elements of C++ Objectives. Objectives (cont d.) A C++ Program. Introduction

c) Comments do not cause any machine language object code to be generated. d) Lengthy comments can cause poor execution-time performance.

Full file at C How to Program, 6/e Multiple Choice Test Bank

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

IC Language Specification

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

Where We Are. CMSC 330: Organization of Programming Languages. This Lecture. Programming Languages. Motivation for Grammars

Decaf Language Reference Manual

Principles of Programming Languages COMP251: Syntax and Grammars

3. DESCRIBING SYNTAX AND SEMANTICS

A Simple Syntax-Directed Translator

The Decaf language 1

CMSC 330: Organization of Programming Languages

Syntax Analysis Check syntax and construct abstract syntax tree

CPS 506 Comparative Programming Languages. Syntax Specification

Computer Programming : C++

CSC 467 Lecture 3: Regular Expressions

CMSC 330: Organization of Programming Languages

Handling Assignment Comp 412

TML Language Reference Manual

Template Language and Syntax Reference

Programming Languages Third Edition

Project 2 Interpreter for Snail. 2 The Snail Programming Language

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CMSC 330: Organization of Programming Languages. Context Free Grammars

Describing Syntax and Semantics

CS415 Compilers. Lexical Analysis

UNIT I Programming Language Syntax and semantics. Kainjan Sanghavi

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CMSC 330: Organization of Programming Languages. Architecture of Compilers, Interpreters

Syntax Analysis Part IV

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CMPS Programming Languages. Dr. Chengwei Lei CEECS California State University, Bakersfield

2.2 Syntax Definition

Fundamental of Programming (C)

CMSC 330: Organization of Programming Languages

SMURF Language Reference Manual Serial MUsic Represented as Functions

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

CS143 Handout 03 Summer 2012 June 27, 2012 Decaf Specification

ASML Language Reference Manual

CSE P 501 Compilers. Parsing & Context-Free Grammars Hal Perkins Spring UW CSE P 501 Spring 2018 C-1

Reference Grammar Meta-notation: hfooi means foo is a nonterminal. foo (in bold font) means that foo is a terminal i.e., a token or a part of a token.

Chapter 3. Describing Syntax and Semantics ISBN

Object oriented programming. Instructor: Masoud Asghari Web page: Ch: 3

Architecture of Compilers, Interpreters. CMSC 330: Organization of Programming Languages. Front End Scanner and Parser. Implementing the Front End

UNIT - I. Introduction to C Programming. BY A. Vijay Bharath

Compiler Techniques MN1 The nano-c Language

Syntax. In Text: Chapter 3

2 nd Week Lecture Notes

Chapter 2 Basic Elements of C++

Introduction to Parsing

Chapter 3. Describing Syntax and Semantics ISBN

CMSC 330: Organization of Programming Languages. Context Free Grammars

CSE 401 Midterm Exam Sample Solution 2/11/15

Language Reference Manual

Time : 1 Hour Max Marks : 30

Programming for Engineers Introduction to C

Full file at

Objectives. In this chapter, you will:

Defining syntax using CFGs

Parsing Part II. (Ambiguity, Top-down parsing, Left-recursion Removal)

Lexical Analysis. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

ECE 2400 Computer Systems Programming Fall 2018 Topic 1: Introduction to C

LESSON 1. A C program is constructed as a sequence of characters. Among the characters that can be used in a program are:

Syntax Intro and Overview. Syntax

A Short Summary of Javali

IT 374 C# and Applications/ IT695 C# Data Structures

Transcription:

DEMO A Language for Practice Implementation Comp 506, Spring 2018 1 Purpose This document describes the Demo programming language. Demo was invented for instructional purposes; it has no real use aside from programming assignments in a compilerconstruction course. In different years, we expect to use Demo in different ways; thus, this document is separate from the handout that describes your assignment, and the document contains information that may seem irrelevant to your current assignment. It is, however, the definitive definition of Demo for your current assignment. 2 The DEMO Language Demo is a simple, Algol-like language. It is not intended as a replacement for any known programming language. In fact, by design it has few of the features that programmers find useful in writing large programs. It is intended to be simple enough to implement in a single semester, but powerful enough to illustrate many of the common features of Algollike languages. It avoids many complications; each of its features was added to illustrate a specific problem that arises in the design and implementation of a compiler. Demo supports two data types: integer and character. Each of these types can be aggregated into vectors and arrays. You can assume that the underlying hardware supports integers with 32 bit, twos-complement arithmetic, and that four characters fit into a word. The type system is trivial; coercions from integer to character and back are not allowed. Demo supports a number of simple control structures: a compound statement, while and for loops, and an if-then-else construct. Limited input and output primitives are provided. At the moment, procedure calls are not defined in the language, although they are likely to be added in the future. Thus, a Demo program is written as a single function.

2.1 Syntax The following the context-free grammar describes the syntax of Demo. It is written in a modified Backus-Naur Form [1, p. 87]. Nonterminal symbols are typeset in slanted roman text, while TERMINAL symbols are underlined and typeset in CAPITAL LETTERS. Procedure PROCEDURE NAME { Decls Stmts Bool NOT OrTerm OrTerm Decls Decls Decl ; Decl Type Decl ; Type SpecList INT CHAR SpecList SpecList, Spec Spec Spec NAME NAME [ Bounds ] OrTerm OrTerm OR AndTerm AndTerm AndTerm AndTerm AND RelExpr RelExpr RelExpr RelExpr LT Expr RelExpr LE Expr RelExpr EQ Expr RelExpr NE Expr RelExpr GE Expr RelExpr GT Expr Expr Bounds Bound Bounds, Bound Bound NUMBER : NUMBER Expr Expr + Term Expr Term Term Stmts Stmts Stmt Stmt Term Term * Factor Term / Factor Factor Stmt Reference = Expr ; { Stmts WHILE ( Bool ) { Stmts FOR NAME = Expr TO Expr BY Expr { Stmts IF ( Bool ) THEN Stmt IF ( Bool ) THEN Stmt ELSE Stmt Factor ( Expr ) Reference NUMBER CHARCONST Reference NAME Exprs NAME [ Exprs ] Expr, Exprs Expr READ Reference ; WRITE Expr ; This grammar is ambiguous. You will need to transform the grammar to make it suitable for the parsing technique that you use. You should consult reference materials for an explanation of how to transform a grammar for either ll(1) or lr(1) parser generators [1, Ch.3]. DEMO 2 Spring 2018

2.2 Microsyntax The following table specifies the spelling of the TERMINAL SYMBOLS in Demo, except for NAME, NUMBER, and CHARCONST. Terminal Spelling AND and BY by CHAR char ELSE else FOR for IF if INT int NOT not OR or PROCEDURE procedure READ read THEN then TO to WHILE while WRITE write Reserved Words Additional Notes Terminal Spelling + + - * * / / LT < LE <= EQ == NE!= GT > GE >= Operators Terminal Spelling : : ; ;,, = = { { [ [ ] ] ( ( ) ) Punctuation 1. NAME is specified by the regular expression Letter (Letter Digit), where Letter is any uppercase or lowercase English letter, from a to z and Digit is a digit between 0 and 9. In lex or flex, we could write the rule for NAME as [A-Za-z][A-Za-z0-9] 2. NUMBER is specified as the positive closure of Digit. In lex of flex notation, we could write NUMBER as [0-9] +, or [0-9][0-9]. 3. CHARCONST contains a single character, surrounded by single quotes. Any valid ascii character can appear between the quotes. Examples might be a, b, 6, and?. Since neither the forward single quote,, nor the backward single quote,, have any other use in Demo, the scanner should treat them as interchangeable. 4. In Demo blanks are significant. Thus, andor is a valid NAME, while and or contains the terminals AND and OR. Reserved words, as shown in the table, cannot be used for any other purpose. 5. Thecharacters // denote thestart ofacomment thatruns until theendofthecurrent input line. 6. Note that Demo inherits the unfortunate spelling of the boolean equality operator (EQ) from c. EQ is spelled ==. DEMO 3 Spring 2018

2.3 The Meaning of a DEMO Program Program Structure A Demo program consists of a header, followed by a set of declarations, followed by a set of executable statements. The header consists of the PROCEDURE keyword, followed by a procedure name. The declarations and executable statements are enclosed in curly braces ( { and ). All statements, whether declaration or executable, end with a semicolon ( ; ). In the current incarnation of Demo, a program consists of a single procedure. Demo has no provision for either multiple procedures or procedure calls. 1 Declarations Demo supports two data types: integer and character. Integers occupy a single word in the iloc virtual machine, while characters occupy 1 of a word. Either integers 4 or characters can be aggregated into arrays. Thus, an NAME may represent (1) an integer variable, (2) an integer array, (3) a character variable 2, or (4) a character array. NAMEs must be declared in a Decl statement. A given NAME can only occur in one Decl statement. For example, an integer array A, with dimensions of [1,10] and [2,8] would be declared as int A[1:10,2:8]; Demo is case sensitive; that is, a and A are distinct NAMEs. Given the declaration shown above, A cannot appear in another declaration, while a would be legal. Example int a, b[1:10], c[1:10,1:100]; char d, e[1:10], f[1:10,1:100]; declares three integer variables, a, b, and c, as well as three character variables, d, e, and f. a and d are scalar values. b and e are both vectors with ten elements, starting at index 1. c and f are both two-dimensional arrays, with the given bounds. Storage Assignment Because the current version of Demo only supports a single procedure, a Demo compiler has unusual freedom in storage layout. All variables have the same lifetime as the current procedure they are local variables in the sense that term is used in Algol-like languages. Scalarvariables canbeassigned either alocationindatamemory orinavirtual register. Aggregates (e.g., vectors and multi-dimensional arrays) must be assigned locations in data memory. The Iloc virtual machine has separate data and program memory. It requires that integer references be word-aligned, but places no other restrictions on the use of data memory. Executable Statements Demo supports an assignment statement, a compound construct, three control structures, and two statements for handling input and output (I/O). 1 As experience grows with comp 506, we may add provisions for multiple procedures and procedure calls. 2 A character variable holds exactly one character DEMO 4 Spring 2018

Assignment The assignment statement requires that its left-hand side and its right-hand side evaluate to the same type. Demo does not support coercions, so a character LHS requires a character RHS and an integer LHS requires an integer LHS. Similarly, the RHSmust evaluate to a single value; if the LHS refers to an aggregate (vector or array), it must evaluate to a single element of that aggregate. Compound Statement The compound statement consists of a list of statements surrounded by curly braces, { and. From the perspective of any surrounding control-flow construct, a compound statement is treated as a single statement; that is, if any statement in the compound statement executes, they all execute in an order equivalent to the order given in the source program. The grammar allows compound statements to be nested, as in { a = 0; { b = 0; { c = 0; { d = 0; e = 1; None of the blanks in this example compound statement are required. The blanks are, however, legal and were added to improve readability. Demo does not allow an empty statement or statement list. Thus, any of x = 1; ; // second semicolon is anempty statement { // empty statement list { ; // empty statement in a list should produce a syntax error. Control Structures Demo supports three control-flow constructs: a for loop, a while loop, and an if-then-else construct. Boolean Values The while loop and the if-then-else require Boolean expressions, a Bool in the grammar. In Demo, a Boolean expression is evaluated as an integer expression, with the value zero representing false and any non-zero value representing true. For Loop The for loop is a standard iterative loop, with fairly typical semantics. The loop header reads as: for iv = ex 1 to ex 2 by ex 3 The loop s index variable, iv takes on a series of values from ex 1 to ex 2 in increments of ex 3. The index variable must be an integer, and all of ex 1, ex 2, and ex 3 must evaluate to integer values. If ex 1 > ex 2, the statement list under loop s control does not execute. While Loop The while loop provides a simple mechanism for iteration. It executes the statement list under its control until the controlling expression evaluates to false (see the earlier discussion of Boolean values). The order of execution of a while loop is always: (1) evaluate the controlling expression; (2) if the result is true, then execute the controlled statement and go back to step (1). DEMO 5 Spring 2018

If-Then-Else The if-then-else construct allows conditional control of a statement. An if-then-else first evaluates the controlling expression to a Boolean value (see the earlier discussion of Boolean values). If the result is true, the code executes the statement under the then clause. If the result is false and the then clause has a matching else clause, the statement under the else clause executes. The grammar for the if-then-else construct is ambiguous. It embodies the classic example of an ambiguous grammar. You should rewrite that portion of the grammar to resolve the dangling else ambiguity so that each else clause is matched with the most recent unmatched then clause. Input/Output Demo provides two simple I/O statements: Read The read statement reads a single data item from the system s standard input stream (stdin in C terminology). The type of the value read is determined by the type of the Reference. Each value read from stdin must be separated from surrounding values by either blanks or a newline. Write The write statement writes a single data item to the system s standard output stream (stdout in C terminology). The type of the value written is determined by the type of the Expr. Each value printed with a write statement appears on its own line; that is, the value is printed, followed by a newline. Expression Evaluation Demo expressions compute simple values of type integer or character. Since Demo currently has no character operators, the only meaningful character expression is a single literal character, a CHARCONST. No coercions are allowed, so only integer subexpressions can occur in an integer expression. In Demo, all operators are left associative. The precedence of operators is specified by the following table, where larger numbers mean a higher priority. Operator Precedence ( ) 6 *, / 5 +, - 4 <, <=, ==, >, >=,!= 3 not 2 and, or 1 The relational operators, <, <=, ==, >, >=, and!= can only be applied to operands of the same type. Thus, both a <= b and 13 > 2 are legal, while a!= 1 is not. The latter example exhibits a type mismatch an error. The other operators, +, -, *, /, not, and, or can only be applied to operands of type integer. The use of integer values to represent Boolean values creates some unusual behavior. For example, (a == b) or (c < d) is equivalent to (a == b) + (c < d), although the parentheses are required in the expression that uses + and are unneeded in the expression that uses or. DEMO 6 Spring 2018

3 Example Program // Example program in DEMO // A simple bubblesort // (Assumes data initialized by simulator) procedure bsort { int DATA[0:10000]; int i, upper, flag, temp; read upper; // # elements to sort if (upper > 10000) then { upper = 0; // a primitive quit write 1; // a primitive message for i = 0 to upper - 1 by 1 { read DATA[i]; flag = 1; while( flag == 1) { flag = 0; for i = 0 to upper - 2 by 1 { if (DATA[i] > DATA[i+1]) then { flag = 1; temp = DATA[i]; DATA[i] = DATA[i+1]; DATA[i+1] = temp; for i = 0 to upper - 1 by 1 { write DATA[i]; References [1] Keith D. Cooper and Linda Torczon. Engineering a Compiler, 2 nd Edition. Elsevier, Morgan Kaufmann, 2012. DEMO 7 Spring 2018