CS 6353 Compiler Construction Project Assignments

In this project, you need to implement a compiler for a language defined in this handout. The programming language you need to use is C or C++ (and the languages defined by the corresponding tools). The project includes three phases: lexical analysis, syntax analysis, and code generation. In the following, we first define the language syntax and tokens. The definitions are given in BNF form. After the language definition, the three phases of the project are specified.

1 Language Definitions

1.1 Syntax Definitions

    program              ::= { Program program-name function-definitions { Main statements } }
    program-name         ::= identifier
    function-definitions ::= (function-definition)*
    function-definition  ::= { Function function-name arguments statements return return-arg }
    function-name        ::= identifier
    arguments            ::= (argument)*
    argument             ::= identifier
    return-arg           ::= identifier
    statements           ::= (statement)+
    statement            ::= assignment-stmt | function-call | if-stmt | while-stmt
    assignment-stmt      ::= { = identifier parameter }
    function-call        ::= { function-name parameters } | { predefined-function parameters }
    predefined-function  ::= + | - | * | / | % | read | print
    parameters           ::= (parameter)*
    parameter            ::= function-call | identifier | number | character-string | Boolean
    number               ::= integer | float
    if-stmt              ::= { if expression then statements else statements }
    while-stmt           ::= { while expression do statements }
    expression           ::= { comparison-operator parameter parameter } | { Boolean-operator expression expression } | Boolean
    comparison-operator  ::= == | > | >=
    Boolean-operator     ::= or | and

To make later references easier, let's call this language the FP language (it is similar to functional-language syntax, but has procedural-language characteristics).
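
For illustration, here is a small program that follows the grammar above (this example is ours, not part of the official specification; the identifier names are invented, and the grammar places no constraints on layout or indentation):

    { Program demo
      { Function incr x
        { = y { + x 1 } }
        return y }
      { Main
        { = a 5 }
        { = b { incr a } }
        { print b } } }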

1.2 Definitions of Some Tokens

An identifier has the form [a-z, A-Z][a-z, A-Z, 0-9]*, but its length has to be between 1 and 6 characters. Confining the number of characters in an identifier is not a good language-design feature, but it is here for you to learn a new feature in lex.

An integer can have a negative sign (-) but no positive sign. There may or may not be space(s) between the sign and the digits. If the value is 0, only a single digit 0 is allowed. Otherwise, it is the same as common integer definitions (no leading 0's).

A float has to have the decimal point and at least one digit on each side. The left side of the decimal point follows the rule for integers. The right side of the decimal point can be any number of digits but should have at least one digit.

A character string is enclosed within (). It can be [a-z, A-Z, 0-9,, \]+. For example, (I like lex and yacc) is a character string. We use \ to represent a new line, i.e., \n in C. The character strings are mainly used in the read and print functions. The space is simply ( ). Some functions may use character strings as parameters.

A Boolean can only be T or F, where T represents true and F represents false.

1.3 Definitions of the Predefined Functions

Each of the predefined comparison operators and Boolean operators takes two parameters, and the corresponding functionality follows the conventional definition.

The predefined functions +, *, -, /, and % also have the conventional meanings. +, *, -, and / can take two or more parameters. For example, {* A B C} implies A * B * C. Similarly, - and / can also take two or more parameters. For example, {/ A B C} implies A / B / C and {- A B C} implies A - B - C. The % function can only take two parameters, and {% A B} means A % B (A mod B).

The predefined function print takes one or more parameters. It simply prints the value of each parameter to the standard output. Whenever a character string (\) is encountered, a new line should be printed. The space character ( ) should not be skipped and should be specified properly in the lex token definitions.

The predefined function read also takes one or more parameters. It reads from the standard input the values for the parameters. Since there is no type definition in advance, the input can be integer, float, string, or Boolean values. Each value for the parameters to be read is separated by white space.

Although there are functions that take a fixed number of parameters, it is not your job to check the number of parameters (not in the lexical and syntax analysis phases). Also, it is not necessary to check whether the number of arguments and the number of parameters match (not in the lexical and syntax analysis phases). You only need to follow the general rules defined in the syntax of the language.
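
As a rough sketch (not part of the specification), the identifier, integer, and float tokens defined in Section 1.2 might begin to take the following shape in a lex/flex definition file. The exact handling of the sign/space combination, the 6-character limit, keywords (which must be matched by their own rules placed before the identifier rule), character strings, and the special requirements of Section 2 are all design decisions left to you:

    DIGIT    [0-9]
    ID       [a-zA-Z][a-zA-Z0-9]{0,5}
    INT      (0|[1-9]{DIGIT}*|-" "*[1-9]{DIGIT}*)
    FLOAT    ({INT}\.{DIGIT}+)
    %%
    {FLOAT}   { printf("float %s\n", yytext); }
    {INT}     { printf("integer %s\n", yytext); }
    {ID}      { printf("identifier %s\n", yytext); }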

1.4 White Space

To ensure correctness in processing the input FP program, you need to consider white space as well. White-space characters include space, tab, and new-line. Treat white space the same way as in other high-level languages such as C++ and Java (skip white space). Note that a space inside a character string should not be skipped.

1.5 Some Remarks

Note that we do not consider language typing rules. Thus, there is no need to predefine variables, and your FP program does not need to handle mismatched types. Also note that the FP language is case sensitive.

2 First Project -- Lexical Analysis Using lex

Define the tokens in the FP language in lex definitions and feed them to lex to generate a scanner (lexer). Then, use the sample program as well as other testing programs written in FP to test your scanner. Note that it is your responsibility to convert the BNF (or verbal) definitions to lex definitions. It is also your responsibility to insert appropriate statements (actions) in your lex definition file so that the lexer (created by lex) will generate a symbol table and print the output token names appropriately. You should design the symbol table such that it contains the necessary fields for this as well as the future projects (you can make the table easy to expand for future fields).

One additional task in this project is not an ordinary compiler-compiler task, but is designed to let you explore more features in lex. For five of the predefined functions, +, -, *, /, and %, if their parameters are all numbers, then you need to evaluate the function, let the returned token be one number (integer or float), and let the numerical result be the corresponding token value. You can consider this a simple optimization. For example, if you are processing a substring {+ 3 {* 5 2 7}}, then you need to return the tokens {, +, integer, integer, and } (the second integer has the value 70), not {, +, integer, {, *, integer, integer, integer, }, and }, and not just a single integer either. Note that you are required to define regular expressions to achieve this (with special lex features).

Finally, you also need to handle errors. That means you need to capture wrong tokens instead of letting lex capture them. Your program should continue even if there are wrong tokens. You may not know where to break for an incorrect token, so you can proceed to white space after discovering an incorrect token and resume token recognition after the white space. You need to design your token definitions to fulfill this capability. The token symbol for an incorrect token is error, and the token value is the text you captured for the incorrect token.

The following defines the output from your program. We cannot do any processing at this stage, so you need to output the token names. Print one token on each line, in the correct order. The names of the tokens should conform to those given below.

    Keyword: Program, Function, return, if, then, else, while, and do.
    Predefined function: =, +, -, *, /, %, read, and print.
    Special symbol: { and }.
    Comparison operator: ==, >, and >=.
    Boolean operator: or and and.
    The remaining tokens: identifier, integer, float, character-string, Boolean.
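
As one (purely illustrative) way an identifier rule could both install the lexeme in your symbol table and print the token line, assuming you write a helper such as symtab_insert yourself (it is not provided by lex):

    [a-zA-Z][a-zA-Z0-9]{0,5}   { symtab_insert(yytext);               /* hypothetical helper: add the lexeme to the symbol table */
                                 printf("identifier %s\n", yytext); } /* token name and value on one line */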

Note that for each token, you should output the token name as well as the value you obtained (on the same line). If you do not keep track of the original values, such as an identifier's name, an integer's value, the actual character string, etc., then the information will be lost. The symbol table should be printed as well. You need to print out the content of the symbol table, one entry per line. (In this phase, you only have the identifier names.)

The standard platform for this project is the university Solaris systems, including the servers and workstations such as apache. You should make sure that your program runs on these platforms. You need to electronically submit your program before midnight of the due date (check the calendar for due dates). Follow the instructions below for submission.

Login to one of the university Solaris platforms. Go to the directory that only contains:

    The lex definition file for the FP language. The file name should be FP.l.
    A file containing a sample program written in the FP language that you have tested with the scanner generated from your lex definition file. The file name should be sample.fp. If you have multiple sample files, you can name them sample1.fp, sample2.fp, etc.
    A readme file that explains how to interpret your output from lex (the list of tokens). Also, if you have some information that you would like the TA to know, you can put it in this file as well.
    An optional Design.doc file that contains the description of any special features of your project that are not specified in the project specification, including all problems your program may have and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above. After submitting your project, do not delete or modify your code. Just in case something is wrong, you will have a chance to resubmit your program files with the original dates on them.

You are responsible for generating your own testing programs and testing your project thoroughly. We will use a different set of FP programs for testing. You need to consider the incorrect cases carefully as well. If your program does not fully function, please try to have partial results printed clearly to obtain partial credit. If your program does not work or does not provide any proper output, then no partial credit will be given. If your program does work fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the TA and come to the department to demonstrate your program. Depending on the problems, appropriate point deductions will apply in these situations. If you do not follow the naming instructions, you are creating trouble for testing, and there will be some penalty for that as well.
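
As a concrete illustration of the Project 1 output format described in this section (the exact spelling and layout are your own design decisions, as long as the token names conform to the list given above), the input fragment { = abc 25 } might produce:

    {
    =
    identifier abc
    integer 25
    }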

3 Second Phase -- Syntax Analysis and Type Checking Using yacc

For the second project, you need to define the FP language in yacc definitions and feed them to yacc to generate a parser. Your definitions should follow the BNF given earlier, but exclude the token definitions, which have already been processed by lex; use the token names instead. The goal of the program is to generate a parse tree for the input FP program. In addition, you need to assign data types to identifiers and perform type checking. Note that the FP language is typeless. So, in this project, you also learn how to convert a typeless language to a typed one (though our exercise is a simple one).

A change is made to the FP language definition. Specifically, the predefined function read is split into two functions, read-int and read-float, which read integers and floating-point numbers, respectively. They still take one or more parameters. There will be no read functions that read character strings or Boolean values.

    predefined-function ::= + | - | * | / | % | read-int | read-float | print

You need to modify your lex definition file to handle the two new read functions. Also, please note that in the lexical analysis in Project 1, you may have processed some tokens too early, which may cause incorrect processing of the input program. You may have to revise the lex definition file and push some of the processing down to the yacc level. It is your responsibility to find the problems and make the changes.

3.1 During Parsing

In this section, we discuss what you need to do during parsing. First of all, you need to generate a parse tree during parsing. Each node in your parse tree should be labeled with the symbol of the node as defined in the BNF of the FP language (e.g., +, Program, argument, return-arg, statement, statements, if-stmt, etc.) and should carry a value, which is the text string grabbed from lex. For each identifier, instead of giving the actual identifier name, the value should be the pointer into the symbol table (the index of the symbol-table entry).

You created a symbol table in the first project. Now you need to add more information to it. During parsing, you need to assign identifier usage types to each identifier. The usage type of an identifier describes how the identifier has been used in the FP program. The identifier usage types include:

    program-name
    function-name
    argument
    return-arg
    assignment-id: the identifier in an assignment statement that is to be assigned a value
    expression-id: appears after comparison operators or Boolean operators
    parameter: all other parameters besides those listed above

An identifier may be used in multiple ways in different parts of the program. You need to list all of its usage types. The usage type list will be as long as the number of occurrences of the identifier in the program.
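
A sketch of what a yacc rule with an embedded action might look like for recording a usage type while building a parse-tree node. The token name IDENTIFIER, the helpers mknode and add_usage, and the constant USAGE_ASSIGN_ID are all invented for this example (you would define your own), and it assumes the lexer returns the symbol-table index as the semantic value of IDENTIFIER:

    assignment_stmt
        : '{' '=' IDENTIFIER parameter '}'
            {
              add_usage($3, USAGE_ASSIGN_ID);          /* record this occurrence's usage type in the symbol table */
              $$ = mknode("assignment-stmt", $3, $4);  /* parse-tree node; $3 is the symbol-table index from lex */
            }
        ;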

In your symbol table, if an identifier is labeled as a function-name (usage type), then you need to maintain a list of callers and a list of callees for it. When processing a function-call statement (not for the predefined functions), you need to find out who the caller is. You can either check the parsing stack provided by yacc (with a special label for the function name) or push additional pointers onto the stack to link to the caller. You are NOT allowed to use a global variable to keep track of the current function under processing, even though that is actually the most efficient method (the goal is for you to understand how to manipulate the yacc stack). Once the caller and callee are identified, you add them to the caller and callee lists in the corresponding symbol-table entries. Note that both the caller and callee lists should contain the indices (or pointers) into the symbol table, not the function names. You will need these pointers later. Also note that the identifier with type program-name should have a callee list. Again, the caller and callee lists will not contain the predefined functions.

The identifier usage-type list and the caller and callee list assignments have to be done during parsing (coded in the action part of the production rules). Thus, the order of these lists should follow the order of parsing. To show your processing steps during parsing, each time you change the symbol table, you need to print the entries that are being changed. Please refer to the output section regarding how to print a symbol-table entry.

3.2 After Parsing: Data Type Assignment and Type Checking

Once you obtain the parse tree, you need to perform type checking and assign a data type to each identifier (not for the program-name and the function-names) by traversing the parse tree. Since there is no data type definition in the FP program, you need to do extra work to identify the data types of the identifiers. When you process a read function call, a data type can be assigned to the parameters according to the type of the read function (the parameters of read-int and read-float are integers and floats, respectively). When you encounter an integer, a float, a character-string, or a Boolean token, the token name automatically becomes its data type (this means you need to store temporary data types for some nodes that are not identifiers). In an assignment statement, the assignment-id (identifier) has the type of the corresponding parameter in the statement. The result of the predefined function % is an integer. The predefined functions +, -, *, and / can only be applied to integers and floats. If the parameters of the function +, -, *, or / include a float, then the result of the function is a float. If all the parameters of the function +, -, *, or / are integers, then the result of the function is an integer. When you encounter a comparison-operator or a Boolean-operator, the result is Boolean. When you encounter a function-call statement (not for the predefined functions), you should have already obtained the types of the parameters of the function. You should then go to the callee function, identify its corresponding arguments, and assign the data types of the identifiers for the arguments to be the types of the actual parameters.

You also need to perform type checking on the input FP program. +, -, *, and / can only be applied to integers and floats. % can only be applied to integers. The comparison operators > and >= can be applied to integers, floats, and character strings. The comparison operator == can be applied to any data type (integer, float, character string, and Boolean). Boolean operators can only be applied to Booleans. An expression has to be of type Boolean.

Generally, in semantic analysis, both type checking and semantic rule checking are performed. In this project, you do not need to check the semantics of the input FP program.
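
For example, the result-type rule for +, -, *, and / above (integer is a subtype of float, so any float operand makes the result a float) could be captured by a small helper like the following; the enum and function names are illustrative only, and for more than two parameters you would fold this over the parameter list:

    typedef enum { TY_UNKNOWN, TY_INTEGER, TY_FLOAT, TY_STRING, TY_BOOLEAN } DataType;

    /* Result type of the predefined functions +, -, *, / applied to two operand
     * types; returns TY_UNKNOWN (a type error) if an operand is not an integer
     * or a float. */
    DataType arith_result(DataType a, DataType b) {
        if ((a != TY_INTEGER && a != TY_FLOAT) || (b != TY_INTEGER && b != TY_FLOAT))
            return TY_UNKNOWN;                       /* report the type error elsewhere */
        return (a == TY_FLOAT || b == TY_FLOAT) ? TY_FLOAT : TY_INTEGER;
    }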

You can consider the following semantic rules and assume that they are always satisfied in the input FP program:

    A function can only be defined once.
    For each function call, the corresponding function has to be defined in the program.
    The number of actual parameters should be the same as the number of formal parameters.
    Each variable has to be defined before it is used.

Note that there is no scoping in our FP programs. Every identifier is global in the program. Thus, each identifier can only have one data type. When an identifier is assigned two or more inconsistent data types, an error should be reported. Note that a subtype and a supertype are considered consistent. In our case, integer is a subtype of float, and it is the only subtype/supertype relation we have in the language. When you find data type errors, you need to report them by printing an error message together with the statement that is wrong, the identifiers that do not have the correct data types, and their actual (incorrect) types.

It is very important that you perform data type assignments in a proper order. Otherwise, you may have to do the assignments in many rounds. When you process a function definition, you need to make sure that the data types of all its arguments have already been assigned. If a function F was processed (had its data type assignment done) using subtypes for some of its arguments and later supertypes are assigned to those arguments, then F needs to be processed again to correct its data type definitions. This correction may further impact the callers of F because its returned data type may change. Also, during processing, a caller of F may not know the data type of the returned argument until F has been processed at least once. You should have your own design to minimize the number of rounds of traversing the tree. For example, you may want to start from the main body of the program. Also, you may want to use the caller and callee lists to help determine the processing order of the function definitions. You need to wait until all data type assignments for the arguments of a function are fully determined (with no way to change further) before proceeding to process the function (but watch out for recursive calls). This design is very important and should be discussed in your design document.

In our FP programs, it is possible that a function call is a parameter of another function call. For example, assume that in the function definition for f1 we have: { Function f1 x y { } { f2 { f3 a b } c } { } return }. In functional languages, functions are considered first-class objects and can be used just like any other objects. In procedural languages, this can be considered as passing a value that is evaluated in advance, before the invocation. Here, we consider the semantics of this case from the pure procedural-language view. In other words, { f2 { f3 a b } c } can be viewed as: { = t1 { f3 a b } } { f2 t1 c }, where t1 is a temporary variable that does not actually exist. Thus, f2 and f3 do not have a caller-callee relationship. They are callees of f1, and f1 is a caller of both. During type assignment, you have to conceptually convert such a function parameter to a separate assignment statement and process them according to your design.

When you perform data type assignment for a function definition, print out a message stating which function you are starting to work on at the beginning of the process, and print out another message stating which function you have finished the data type assignments for once the function definition is done. If a function definition is processed more than once, then print the messages each time it is processed.
If, for an identifier, a new data type is assigned or an existing data type assignment is changed, then print out the symbol-table entry after the assignment. Please refer to the output section regarding how to print a symbol-table entry.
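
Pulling together the symbol-table requirements from Projects 1 and 2, one possible (purely illustrative) entry layout in C is sketched below, reusing the DataType enum from the sketch in Section 3.2; the field names, list representations, and size limits are your own design decisions:

    typedef struct SymEntry {
        char      name[16];       /* identifier text (FP identifiers have at most 6 characters) */
        DataType  type;           /* assigned data type, if applicable */
        int      *usage_types;    /* usage-type list, in parsing order */
        int       n_usages;
        int      *callers;        /* symbol-table indices of callers (function-name entries only) */
        int       n_callers;
        int      *callees;        /* symbol-table indices of callees (function-name and program-name entries) */
        int       n_callees;
    } SymEntry;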

If a function gets called several times in the program and the corresponding argument types are inconsistent, then you need to report errors (note that a subtype and a supertype are consistent). If there exist functions in the program that are never called, then you are required to prune the parse tree by removing the function-definition subtrees of the unused functions from the parse tree of the program. After removing the subtree of a function definition, you need to print a message to the standard output stating which function definitions have been removed.

3.3 Output

At the end (best done in the main program), after all processing, you need to print out two things: the symbol table and the parse tree.

The parse tree should be printed to a file named parsetree.out. The nodes of the parse tree should be printed in in-fix order with indentation. Each node of the tree should be printed on one line. For each node, you need to print the node label as well as its value (for an identifier, you should only print the index into the symbol table). After you print a tree node, its child nodes should be printed following it with indentation, and the indentation should be 2 blank spaces from the parent's starting column. Note that your final parse tree should not include the function definitions that were removed (because the functions are unused).

Your symbol table should be printed to a file named symboltable.out. For each entry, first print the index of the entry, the identifier name, and the data type if applicable (not for function-name or program-name), on one line. If the identifier is a function-name, print its callee and caller lists on the next two lines, with indentation. If the identifier is a program-name, print its callee list on the next line, with indentation. Otherwise, print the list of identifier usage types on the next line, with indentation. When you print an entry of the symbol table during parsing or during processing, print to the standard output, not to the file symboltable.out. The printing format should be the same as stated above.

When you print error messages, please separate them with two marking lines: ========== ERROR!!! (you can use 10-30 equal signs) and print the error message in between the two marking lines. This is to let the tester (most likely our TA) and yourself see the error messages distinctly.

3.4 Submission

The standard platform for this project is the university Sun systems, including the servers and workstations such as apache. You should make sure that your program runs on these platforms. You can use the sample program provided by the TA to test your parser, but you definitely need to create other programs to test your parser. Note that it is your responsibility to thoroughly test your program. You need to electronically submit your program before midnight of the due date (check the web page for due dates). Follow the instructions below for submission.

Login to one of the university Sun platforms. Go to the directory that only contains:

    The lex definition file for the FP language. The file name should be FP.l.
    The yacc definition file for the FP language. The file name should be FP.y.
    Any other files that contain code used in FP.y.
    A makefile, if your code contains more than FP.l and FP.y.
    The Design.doc file that contains the description of any special features of your project that are not already specified in the project specification, including all problems your program may have, how you handled problems listed in the handout, and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above. After submitting your project, do not delete or modify your code. Just in case something is wrong, you will have a chance to resubmit your program files with the original dates on them.

You are responsible for generating your own testing programs and testing your project thoroughly. We will use a different set of FP programs for testing. You need to consider the incorrect cases carefully as well. If your program does not fully function, please try to have partial results printed clearly to obtain partial credit. If your program does not work or does not provide any output, then no partial credit will be given. If your program does work fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the TA and come to the TA office to demonstrate your program. Depending on the problems, appropriate point deductions will apply in these situations.

4 Third Phase -- Code Generation

For the third project, you will generate the pseudo machine code (PMC) for the input FP program. Before code generation, some simple code optimization will be performed. The code generation and optimization procedures are performed on the parse tree obtained in Project 2.

4.1 PMC Definition

The definition of the pseudo machine code PMC is as follows.

    <program>          ::= <declarations> <statements>
    <declarations>     ::= <dclr> <declarations> | <dclr>
    <dclr>             ::= <type> <identifier> ;
    <type>             ::= integer | float | string | Boolean | label
    <statements>       ::= <stmt> <statements> | <stmt>
    <stmt>             ::= <label> <instruction> ; | <instruction> ;
    <instruction>      ::= <add-instruction> | <sub-instruction> | ... (as listed in the following)
    <add-instruction>  ::= add <op-dest> <math-param> <math-param>
    <sub-instruction>  ::= sub <op-dest> <math-param> <math-param>
    <mul-instruction>  ::= mul <op-dest> <math-param> <math-param>
    <div-instruction>  ::= div <op-dest> <math-param> <math-param>
    <mod-instruction>  ::= mod <op-dest> <math-param> <math-param>
    <init-instruction> ::= init <identifier> <all-param>
    <and-instruction>  ::= and <op-dest> <bool-param> <bool-param>
    <or-instruction>   ::= or <op-dest> <bool-param> <bool-param>

    <test-instruction>  ::= <comparison-operator> <op-dest> <math-param> <math-param>
    <if-instruction>    ::= if <register> <label>
    <goto-instruction>  ::= goto <label>
    <fgoto-instruction> ::= fgoto <identifier>
    <read-instruction>  ::= read <identifier>+
    <print-instruction> ::= print <all-param>+
    <op-dest>           ::= <identifier> | <register>
    <constant>          ::= <integer> | <float> | <string> | <Boolean>
    <math-param>        ::= <identifier> | <register> | <integer> | <float>
    <bool-param>        ::= <identifier> | <register> | <Boolean>
    <all-param>         ::= <identifier> | <constant>

Here are the semantic definitions of the instructions.

    <add-instruction> computes the first <math-param> + the second <math-param> and stores the result in the <identifier> or <register>.
    <sub-instruction>, <mul-instruction>, <div-instruction>, and <mod-instruction> are the same as <add-instruction>, but for the -, *, /, and % operators, respectively.
    <init-instruction> initializes the identifier to the constant or to another identifier.
    <and-instruction> and <or-instruction> compute the first <bool-param> and/or the second <bool-param> and store the result in the <identifier> or <register>.
    <test-instruction> tests whether the first <math-param> is ==, >, or >= the second <math-param>. If true, the result is 1; otherwise, the result is 0. The result is stored in an <identifier> or a <register>.
    <if-instruction> tests whether the <register> content is positive (true). If so, the program flow goes to the instruction labeled by <label>. If not, it continues to the next instruction.
    <goto-instruction> simply causes the program flow to go to the instruction labeled by <label>.
    <fgoto-instruction> is the flexible goto instruction. It causes the program flow to go to the instruction labeled by the content of the <identifier>. The content of the <identifier> has to be a positive integer within the range of the labels in the program.
    <read-instruction> reads from the standard input into one or more identifiers. It can read multiple inputs for multiple variables.
    <print-instruction> prints the parameters following it.

The syntax and semantics of some of the tokens in PMC are defined as follows.

    <label> = L[1-9][0-9]*. A few example labels are L1, L30, etc. The number parts of the labels in a PMC program have to be consecutive. For example, if the program has labels L5 and L7, then L6 also has to be in the program.
    <register> = R[0-9].
    <type> can be integer, float, Boolean, and string.
    <identifier>, <integer>, <float>, <string>, and <Boolean> are the same as those in FP, but here we have to restrict FP's <identifier> so that they do not start with R or L. Temporary identifiers generated by the code generator have to be declared as well. You will need to add temporary variables to the symbol table because they may be referenced later.
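
As an illustration of our own (not part of the PMC specification), an FP if-statement such as { if { > a b } then { print a } else { print b } }, where a and b are already-declared integers, could map onto these instructions roughly as follows; the text to the right of each instruction is commentary, not PMC code, and the label numbers are arbitrary for this excerpt:

    > R1 a b ;          compare a and b; R1 becomes 1 or 0
    if R1 L1 ;          if true, jump to the then-branch
    print b ;           else-branch
    goto L2 ;
    L1 print a ;        then-branch
    L2 ... ;            whatever instruction follows the if-statement receives label L2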

4.2 Optimization on the Parse Tree

Before code generation, you are to perform code optimizations based on the parse tree. We consider two optimization goals: constant propagation and dead code elimination. You only need to perform optimization for the main body of the program, not for the function definitions. Since you are working on the parse tree, you essentially already have a control flow graph.

For constant propagation, you need to identify the constant definition statements. A statement that assigns a constant to an identifier is a constant definition statement. If the parameter of an assignment statement includes a predefined function (+, -, *, /, %) with constant parameters, then you should evaluate the predefined function and replace the function parameter by the constant parameter (the evaluation result). This can be done recursively until a constant is derived as the parameter of the assignment statement or until it is not possible to evaluate further. When you encounter a function call, treat it the same as a variable. Once a variable is defined as a constant, you need to propagate the definition until the variable is no longer a constant or until the end of the program. The goal of constant propagation is to replace a constant variable by the constant. You may want to do so during the analysis, directly on the parse tree, instead of waiting until the end of the analysis and coming back to find out again where such replacements should take place.

For dead code elimination, you are supposed to start from the end of the program and derive the liveness sets backwards. After the analysis, from the liveness set after an assignment statement you can decide whether the statement can be removed. If so, remove the statement from the parse tree. In fact, you can identify the dead code and remove it during the analysis, instead of waiting until the end.

You need to clearly show the steps you take in optimization. When you construct the set of constant definition statements, you should print the set after each step. You need to do the same for the liveness sets. When you perform an optimization action, including when a variable is changed to a constant and/or dead code is eliminated, you need to print out what optimization you are doing.
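
A tiny worked example of our own (not from the handout): suppose the main body contains

    { = a 5 }
    { = b { + a 2 } }
    { print b }

After { = a 5 }, a is a constant (5), so the occurrence of a in { + a 2 } is replaced by 5 and the predefined function is evaluated, turning the second statement into { = b 7 }. If a is not used anywhere after this point, the liveness analysis shows that a is not live after { = a 5 }, so that assignment is dead code and its subtree can be removed from the parse tree.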
4.3 Code Generation

After optimization, you need to generate PMC code for the input FP program. First, you need to go through the symbol table and generate the declaration statements for all the identifiers (not for the function names and the program name). Then you need to generate the PMC statements based on the parse tree. During code generation, you may create new temporary variables and labels. You need to put them into the symbol table as well, for later reference. You should assign types to the temporary variables, and you need to generate declaration statements for them.

When generating code for a function call in FP, you need to generate the code to copy the values of the actual parameters to the formal parameters, record the return label index, and then jump to the function for execution. After execution of the function, you need to generate code to copy the return value to the appropriate variable in the caller and jump back to the recorded address. Since the return address is not a fixed label, you need to use fgoto and provide to the fgoto instruction the identifier that stores the return label index. We assume that the parameter-passing mechanism used in FP is pass-by-value. Also, to simplify function handling, we assume that there will be no recursive calls in user-defined FP functions (predefined functions can be recursive).
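
To make the calling sequence concrete, here is one possible (illustrative, not prescribed) shape of the code generated for a call { = r { f2 a } }, where f2 has one formal parameter x, its body starts at label L7, f2val is a generated variable holding f2's return value, and f2ret is a generated variable holding the return label index. All names and label numbers are invented for this example, the generated temporaries would need declarations and symbol-table entries as described above, and the text to the right of each instruction is commentary, not PMC code:

    init x a ;           copy the actual parameter a into f2's formal parameter x
    init f2ret 3 ;       record the return label index (label L3)
    goto L7 ;            jump to the first instruction of f2
    L3 init r f2val ;    on return, copy f2's return value into r and continue

Inside the code generated for f2 itself, the final instruction would then be:

    fgoto f2ret ;        jump back to the label whose index is stored in f2ret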

The input FP program should be read from a file, and the output PMC program should be stored in a file. Do not use hard-coded file names; pass the input and output file names as program arguments. A PMC executor will be provided so that you can execute the PMC code you generate from your FP program. The TA's web page will post the executable of the PMC executor and instructions on how to integrate it with your code.

4.4 Submission

You need to electronically submit your program before midnight of the due date (check the web page for due dates). Follow the instructions below for submission.

Login to one of the university Sun platforms. Go to the directory that only contains:

    The lex definition file for the FP language. The file name should be FP.lex.
    The yacc definition file for the FP language. The file name should be FP.yacc.
    Other files that implement the code generation system.
    A makefile for invoking lex, then invoking yacc, then compiling your program together with the code generated by lex and yacc.
    The sample program(s) written in the FP language.
    A readme file that explains how to compile and run your program and how to interpret your output. Also, if you have some information that you would like the TA to know, you can put it in this file as well.
    The Design.doc file that contains the description of any special features of your project that are not specified in the project specification, including all problems your program may have and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above. After submitting your project, do not delete or modify your code. Just in case something is wrong, you will have a chance to resubmit your program files with the original dates on them.

You are responsible for generating your own testing programs and testing your project thoroughly. We will use a different set of FP programs for testing. You need to consider the incorrect cases carefully as well. If your program does not fully function, please try to have partial results printed clearly to obtain partial credit. If your program does not work or does not provide any output, then no partial credit will be given. If your program does work fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the TA and come to the TA office to demonstrate your program. Depending on the problems, appropriate point deductions will apply in these situations.