COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015

Similar documents
CS 2210 Programming Project (Part I)

Yacc: A Syntactic Analysers Generator

LECTURE 11. Semantic Analysis and Yacc

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

A simple syntax-directed

Using an LALR(1) Parser Generator

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Understanding main() function Input/Output Streams

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Programming Project II

CS 6353 Compiler Construction Project Assignments

Project 2 Interpreter for Snail. 2 The Snail Programming Language

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Programming Assignment II

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Compiler Lab. Introduction to tools Lex and Yacc

TDDD55 - Compilers and Interpreters Lesson 3


COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Typescript on LLVM Language Reference Manual

Syntax-Directed Translation

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Syntactic Analysis. CS345H: Programming Languages. Lecture 3: Lexical Analysis. Outline. Lexical Analysis. What is a Token? Tokens

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Introduction to Lexical Analysis

CS 415 Midterm Exam Spring SOLUTION

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

A Simple Syntax-Directed Translator

CSC 467 Lecture 3: Regular Expressions

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

CS 6353 Compiler Construction Project Assignments

Parsing and Pattern Recognition

Compiler Construction Assignment 3 Spring 2018

TDDD55- Compilers and Interpreters Lesson 3

CS 4201 Compilers 2014/2015 Handout: Lab 1

Contents. Jairo Pava COMS W4115 June 28, 2013 LEARN: Language Reference Manual

Compiler Design. Subject Code: 6CS63/06IS662. Part A UNIT 1. Chapter Introduction. 1.1 Language Processors

CPS 506 Comparative Programming Languages. Syntax Specification

Introduction to Lex & Yacc. (flex & bison)

CST-402(T): Language Processors

CSCI Compiler Design

PL Revision overview

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

UNIT-2 Introduction to C++

Project 1: Scheme Pretty-Printer

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

4. Structure of a C++ program

Semantic Analysis. Compiler Architecture

An Introduction to LEX and YACC. SYSC Programming Languages

Time : 1 Hour Max Marks : 30

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

Gechstudentszone.wordpress.com

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Part 5 Program Analysis Principles and Techniques

L L G E N. Generator of syntax analyzier (parser)

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Parser Tools: lex and yacc-style Parsing

2068 (I) Attempt all questions.

Group A Assignment 3(2)

CS 403: Scanning and Parsing

Introduction to Compiler Design

C++ Basics. Lecture 2 COP 3014 Spring January 8, 2018

Syntax Analysis. Chapter 4

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Introduction to Compiler Construction

I. OVERVIEW 1 II. INTRODUCTION 3 III. OPERATING PROCEDURE 5 IV. PCLEX 10 V. PCYACC 21. Table of Contents

Working of the Compilers

Do not write in this area EC TOTAL. Maximum possible points:

C++ Programming: From Problem Analysis to Program Design, Fourth Edition. Chapter 5: Control Structures II (Repetition)

Lexical Analysis. Introduction

Why Is Repetition Needed?

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Ray Pereda Unicon Technical Report UTR-03. February 25, Abstract

Programming Project 1: Lexical Analyzer (Scanner)

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Parser Tools: lex and yacc-style Parsing

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Compiler construction 2002 week 5

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis

Compilers and Interpreters

1 The Var Shell (vsh)

Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

More Assigned Reading and Exercises on Syntax (for Exam 2)

Writing a Lexer. CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Monday, February 6, Glenn G.

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

Transcription:

COP4020 Programming Assignment 1 CALC Interpreter/Translator Due March 4, 2015 Purpose This project is intended to give you experience in using a scanner generator (Lex), a parser generator (YACC), writing a syntax specification (grammar) for a language, performing parsing and semantic analysis (attribute grammar), and practicing error handling in a compiler. Summary Your task is to write an interpreter/translator for a simple calculator whose programming language, CALC, contains basic language constructs such as variables and assignment statements. The interpreter/translator will be written using a compiler generator (YACC). Your program should perform the functions of both the interpreter and translator. The program should call the lexical analyzer (Lex) for the next token, parse the token stream, report grammatical errors, perform static and dynamic semantic checks, interpret the program statement-by-statement (including print statements which produce outputs), and translate the CALC program into an equivalent working C++ program called mya.cpp. Lexical Specification The table below defines the tokens that must be recognized, with their associated symbolic names. Comments are enclosed in (*... *), and cannot be nested. An identifier is a sequence of (at least one) upper or lower case letters followed by zero, one or more digits. CALC is case insensitive. There is no limit on the length of identifiers. However, you may impose limits on the total number of distinct identifiers and string lexemes and on the total number of characters in all distinct identifiers and strings taken together. There should be no other limitation on the number of lexemes that the lexical analyzer will process. An integer constant is an unsigned sequence of digits representing a 10-based number. 1

Token Symbolic Name ; SEMInumber ( LPARENnumber integer constant ICONSTnumber begin BEGINnumber program PROGRAMnumber - MINUSnumber TIMESnumber var VARnumber integer INTnumber end of file EOFnumber, COMMAnumber ) RPARENnumber identifier IDnumber end ENDnumber is ISnumber + PLUSnumber div DIVnumber print PRINTnumber = EQnumber Token attributes A unique identification of each token must be returned by the lexical analyzer. In addition, the lexical analyzer must pass extra information for identifier and integer constant tokens to the parser. For identifier and integer constant tokens, the extra information is passed to the parser as a single value of the integer type, through a global variable as described below (yylval.semantic value). For integer constants, the numeric value of the constant is passed. In order to allow other phases of the compiler to access the original identifier lexeme, the lexical analyzer passes an integer uniquely identifying the identifier. The unique identifying number for identifiers should be an index into a string table created by the lexical analyzer to record the lexemes. Implementation The central routine of the scanner is yylex, an integer function that returns a token number, indicating the type (identifier, integer constant, semicolon, etc...), of the next token in the input stream. In addition to the token type, yylex must set the global variables yyline and yycolumn to the line and column number at which that token appears (hint: modify yyline when you see a newline character and modify yycolumn when you see anything else), and in the case of identifiers and constants, put the extra information needed, as described above, into a global integer variable yylval.semantic value. Lex will write yylex for you, using the patterns and rules defined in your lex input file (which should be called lexer.l). Your rules must include the code to maintain yyline, yycolumn and yylval. You are to write a routine Lex error that takes a message and line and column numbers and reports an error, printing the message and indicating the position of the error. You need only print the line and column 2

number to indicate the position. The #define mechanism should be used to allow the lexical analyzer to return token numbers symbolically. In order to avoid using token names that are reserved or significant in C/C++ or in the parser, the token names have been specified for you in the subsequent table. The parser and the lexical analyzer must agree on the token numbers to ensure correct communication between them. The token number can be chosen by you, as the compiler writer, or, by default, by Yacc. Regardless of how token numbers are chosen, the end-marker must have token number 0 or negative, and thus, your lexical analyzer must return a 0 (or a negative) as token number upon reaching the end of input. Your lexical analyzer should recover from all malformed lexemes, as well as such things as string constants that extended across a line boundary or comments that are never terminated. Syntax Specification The syntax of the CALC language is described by a set of syntax diagrams in syntax.pdf. A syntax diagram is a directed graph with one entry and one exit. Each path through the graph defines an allowable sequence of symbols. For example, the structure of a valid CALC program is defined by the first syntax diagram. The occurrence of the name of another diagram such as declaration and compound statement indicates that any sequence of symbols defined by the other diagram may occur at that point.the following is an example valid CALC program. program xyz is begin var a, b; a = 2; b = 3; var c; c = a + b; print c end Your first task in this assignment is to develop a context free grammar for the CALC language from the syntax diagrams. YACC Specification Your second task in this assignment is to express your grammar as a YACC specification. You will want to run your specification through YACC to ensure that the grammar produces no parsing conflicts (compiled with yacc -v). If conflicts are indicated by YACC, you should alter your grammar to eliminate them without changing the language accepted by your grammar, or ensure that YACCs handling of the conflict agrees with the CALC language specification. It is suggested that you first develop a parser that merely prints out ACCEPT 3

for syntactically correct CALC programs and REJECT with error messages for incorrectly structured CALC programs, calling your lexical analyzer for the tokens. After you have guaranteed that your YACC specification is syntactically sound, you will extend your YACC grammar by adding attributes and semantic rules with actions to develop an interpreter/translator for CALC that does various static and dynamic semantics checks, performs the calculation specified in the CALC program, and translates the CALC program into a C++ program. Besides reporting grammatical errors, the interpreter/translator also performs the following semantics checks: Duplicated declaration when a variable is declared multiple times. Undeclared variable when a variable is used in the program, but not declared or before it is declared. Uninitialized variable when a variable is referenced before initialized. Divided by 0 error when the denominator in a division is 0. The program can exit after reporting one semantic/syntax error. If there is no error, the program should (1) execute each statement and output the result of the expression in each print statement (interpreter function), and (2) produce a semantically equivalent C++ program called mya.cpp with equivalent expressions/assignments (translator function). To facilitate semantic checks and program interpretation and translation, a symbol table must be created to store variables and the related information. The symbol table will need to have at least three fields: the name of the variable, the value of the variable (integer type only in CALC), and a flag indicating whether the variable has been initialized. When the parser encounters an identifier in the declaration,the identifier must be inserted into the symbol table. When the parser encounters an identifier in an expression, it must look up the symbol table to check if the variable has been initialized and obtain its value (if initialized). After the parser processes an assignment statement, the value of the variable in the left hand side of the assignment statement must be updated in the symbol table. The generated mya.cpp file should have the same number of statements as the source program. Each statement in mya.cpp should have the same expression (may differ only in notation) as the original program. For the example CALC program described, the corresponding C++ program would be similar to the following: #include <iostream> using namespace std; main() { int a, b; a = 2; b = 3; int c; c = a + b; cout << c << "\n"; } 4

Interpreting and translating CALC with YACC is relatively simple. Basically, for every assignment statement, the program first evaluates the value of the expression in the right hand side of the statement. This is similar to the cal trans example we gave in class. After that, the interpreter updates the value of the variable in the left hand side of the statement. When processing a print statement, the interpreter evaluates the value of the expression and prints the result to the standard output. To facilitate translation, each expression can have an string attribute that stores the string representation of the C++ expression. Error Recovery Your parser should print appropriate error messages. You do not have to implement any error recovery. Your program should have similar behavior as the sample executable provided. Assignment Submission Submissions are due on March 4, 11:59pm, when you should submit all related files to the Blackboard (including a makefile to produce the executable for the project). Recognize correct programs and detect incorrect programs (50). Semantic checks and error reporting (20). Correct calculation (10). Correct translation (15). Correct translation with minimal parentheses in expressions (5). Extra 10 points for generating correct dynamic divide-by-zero checking code in the target C++ program. Extra 3 points for the person who first reports a new error in the sample executable. Your program will be tested with a series of programs. Some of the testing programs are provided in the project package. If the code cannot handle any input file, 0 points will be given (programs that cannot produce the executable with a make command due to whatever error such as wrong makefile, compiling errors, and missing files automatically get 0 points). If your code can process some programs in the test suite, the grade will be assigned based on the number of programs that your code processes correctly. 5