CS 6353 Compiler Construction Project Assignments

In this project, you need to implement a compiler for a language defined in this handout. The programming language you need to use is C or C++ (and the language defined by the corresponding tools). The project includes three phases: lexical analysis, syntax analysis, and code generation. In the following, we first define the language syntax and tokens. The definitions are given in BNF form. After the language definition, the three phases of the project are specified.

1 Language Definitions

1.1 Syntax Definitions

    program ::= { Program program-name function-definitions { Main statements } }
    program-name ::= identifier
    function-definitions ::= (function-definition)*
    function-definition ::= { Function function-name arguments statements return return-arg }
    function-name ::= identifier
    arguments ::= (argument)*
    argument ::= identifier
    return-arg ::= identifier
    statements ::= (statement)+
    statement ::= assignment-stmt | function-call | if-stmt | while-stmt
    assignment-stmt ::= { = identifier parameter }
    function-call ::= { function-name parameters } | { predefined-function parameters }
    predefined-function ::= + | - | * | / | % | read | print
    parameters ::= (parameter)*
    parameter ::= function-call | identifier | number | character-string | Boolean
    number ::= integer | float
    if-stmt ::= { if expression then statements else statements }
    while-stmt ::= { while expression do statements }
    expression ::= { comparison-operator parameter parameter } | { Boolean-operator expression expression } | Boolean
    comparison-operator ::= == | > | >=
    Boolean-operator ::= or | and

To make later references easier, let's call this language the FP language (its syntax is similar to that of a functional language, but it has the characteristics of a procedural language).

1.2 Definitions of Some Tokens

An identifier has the form [a-z, A-Z][a-z, A-Z, 0-9]*, but its length must be between 1 and 6 characters. Limiting the number of characters in an identifier is not a good language design feature, but it is here for you to learn a new feature in lex.

An integer can have a negative sign (-) but no positive sign. There may or may not be space(s) between the sign and the digits. If the value is 0, only a single digit 0 is allowed. Otherwise, it is the same as common integer definitions (no leading 0's).

A float must have a decimal point and at least one digit on each side. The left side of the decimal point follows the rule for integers. The right side of the decimal point can be any number of digits but must have at least one digit.

A character string is enclosed within (). Its content can be [a-z, A-Z, 0-9, ' ', \]+. For example, (I like lex and yacc) is a character string. We use \ to represent a new line, i.e., \n in C. A space is written simply as ( ). Character strings are mainly used in the read and print functions. Some functions may use character strings as parameters.

A Boolean can only be T or F, where T represents true and F represents false.

1.3 Definitions of the Predefined Functions

Each of the predefined comparison operators and Boolean operators takes two parameters, and the corresponding functionality follows the conventional definition.

The predefined functions +, *, -, /, and % also have the conventional meanings. +, *, -, and / can take two or more parameters. For example, {* A B C} implies A * B * C. Similarly, - and / can also take two or more parameters. For example, {/ A B C} implies A / B / C and {- A B C} implies A - B - C. The % function can only take two parameters, and {% A B} means A % B (A mod B).

The predefined function print takes one or more parameters. It simply prints the value of each parameter to the standard output. Whenever a character string (\) is encountered, a new line should be printed. The space character ( ) should not be skipped and should be specified properly in the lex token definitions.

The predefined function read also takes one or more parameters. It reads the values for the parameters from the standard input. Since there is no type definition in advance, the input can be integer, float, string, or Boolean values. The values to be read are separated by white space.

Although there are functions that take a fixed number of parameters, it is not your job to check the number of parameters (not in the lexical and syntax analysis phases). Also, it is not necessary to check whether the number of arguments and the number of parameters match (not in the lexical and syntax analysis phases). You only need to follow the general rules defined in the syntax of the language.
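The left-to-right meaning of the multi-parameter arithmetic functions can be illustrated with a short C sketch. It is only an illustration under stated assumptions: the name fold_call is hypothetical, operands are restricted to integers, and / truncates as in C, which the handout does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate one predefined arithmetic call whose parameters are all integers.
 * Operands are combined left to right, so {/ A B C} becomes (A / B) / C.
 * Assumption: integer operands only; '/' truncates as in C. */
long fold_call(char op, const long *args, size_t n) {
    assert(args != NULL && n >= 2);
    assert(op != '%' || n == 2);   /* % takes exactly two parameters */
    long acc = args[0];
    for (size_t i = 1; i < n; i++) {
        switch (op) {
        case '+': acc += args[i]; break;
        case '-': acc -= args[i]; break;
        case '*': acc *= args[i]; break;
        case '/': assert(args[i] != 0); acc /= args[i]; break;
        case '%': assert(args[i] != 0); acc %= args[i]; break;
        default:  assert(!"unknown operator");
        }
    }
    return acc;
}
```

With the operands 5, 2, and 7 and the operator *, the function returns 70. Float operands would need a second variant or a tagged value type.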

1.4 White Space

To ensure correct processing of the input FP program, you need to consider white space as well. White space characters include space, tab, and new-line. Treat white space the same way as other high-level languages such as C++ and Java do (skip it). Note that a space in a character string should not be skipped.

1.5 Some Remarks

Note that we do not consider language typing rules. Thus, there is no need to predefine variables, and your FP program does not need to handle mismatched types. Also note that the FP language is case sensitive.

2 First Project -- Lexical Analysis Using lex

Define the tokens of the FP language in lex definitions and feed them to lex to generate a scanner (lexer). Then use the sample program, as well as other test programs written in FP, to test your scanner.

Note that it is your responsibility to convert the BNF (or verbal) definitions to lex definitions. It is also your responsibility to insert appropriate statements (actions) in your lex definition file so that the lexer (created by lex) generates a symbol table and prints the output token names appropriately. You should design the symbol table so that it contains the fields necessary for this project as well as the future ones (you can make the table easy to extend with further fields).

One additional task in this project is not an ordinary compiler compiler task, but is designed to let you explore more features of lex. For five of the predefined functions, +, -, *, /, and %, if all their parameters are numbers, then you need to evaluate the function, return a single number token (integer or float), and use the numerical result as the corresponding token value. You can consider this a simple optimization. For example, if you are processing the substring {+ 3 {* 5 2 7}}, then you need to return the tokens {, +, integer, integer, and } (the second integer has the value 70), not {, +, integer, {, *, integer, integer, integer, }, and }, and not just a single integer either. Note that you are required to define regular expressions to achieve this (with special lex features).

Finally, you also need to handle errors. That means you need to capture wrong tokens yourself instead of letting lex capture them. Your program should continue even if there are wrong tokens. You may not know where an incorrect token ends, so you can skip ahead to the next white space after discovering an incorrect token and resume token recognition after the white space. You need to design your token definitions to provide this capability. The token symbol for an incorrect token is error, and the token value is the text you captured for the incorrect token.

The output of your program is defined as follows. We cannot do any processing at this stage, so you need to output the token names. Print one token on each line, in the correct order. The names of the tokens should conform to those given below.

    Keyword: Program, Function, return, if, then, else, while, and do.
    Predefined function: =, +, -, *, /, %, read, and print.
    Special symbol: { and }.

    Comparison operator: ==, >, and >=.
    Boolean operator: or and and.
    The remaining tokens: identifier, integer, float, character-string, Boolean.

Note that for each token, you should output the token name as well as the value you obtained (on the same line). If you do not keep track of the original values, such as the identifier's name, the integer's value, the actual character string, etc., then that information will be lost.

The symbol table should be printed as well. Print the content of the symbol table, one entry per line. (In this phase, you only have the identifier names.)

The standard platform for this project is the university Solaris systems, including the servers and workstations such as apache. You should make sure that your program runs on these platforms.

You need to electronically submit your program before midnight of the due date (check the calendar for due dates). Follow the instructions below for submission.

Log in to one of the university Solaris platforms. Go to the directory that contains only:

    The lex definition file for the FP language. The file name should be FP.l.
    A file containing a sample program written in the FP language that you have tested with the scanner generated from your lex definition file. The file name should be sample.fp. If you have multiple sample files, you can name them sample1.fp, sample2.fp, etc.
    A readme file that explains how to interpret your output from lex (the list of tokens). If you have any information that you would like the TA to know, you can also put it in this file.
    The optional Design.doc file that describes any special features of your project that are not specified in the project specification, including all problems your program may have and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above.

After submitting your project, do not delete or modify your code. In case something is wrong, you will have a chance to resubmit your program files with the original dates on them.

You are responsible for writing your own test programs and testing your project thoroughly. We will use a different set of FP programs for testing. You need to consider the incorrect cases carefully as well.

If your program does not fully function, please try to print partial results clearly in order to obtain partial credit. If your program does not work or does not provide any proper output, then no partial credit will be given. If your program works fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the TA and come to the department to demonstrate your program. Depending on the problems, appropriate point deductions will apply in these situations. If you do not follow the naming instructions, you create trouble for testing, and there will be some penalty for that as well.
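To make the token names and the error rule of this phase concrete, the following plain C sketch classifies one lexeme that has already been delimited by white space. It is not the required lex solution, only a sketch of the rules: the function names are hypothetical, and operators, braces, character strings, and the optional space between a minus sign and its digits are left out to keep it short. The handout does not say whether -0 is a legal integer; this sketch accepts it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_keyword(const char *s) {
    static const char *const kw[] = {"Program", "Function", "return", "if",
                                     "then", "else", "while", "do"};
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(s, kw[i]) == 0) return true;
    return false;
}

/* Identifier: a letter followed by letters or digits, 1 to 6 characters. */
static bool is_identifier(const char *s) {
    size_t n = strlen(s);
    if (n < 1 || n > 6 || !isalpha((unsigned char)s[0])) return false;
    for (size_t i = 1; i < n; i++)
        if (!isalnum((unsigned char)s[i])) return false;
    return true;
}

/* Scan an integer part: optional '-', then "0" or digits with no leading 0.
 * Returns a pointer just past the part that was consumed, or NULL. */
static const char *scan_integer(const char *s) {
    if (*s == '-') s++;
    if (*s == '0') return s + 1;
    if (*s < '1' || *s > '9') return NULL;
    while (isdigit((unsigned char)*s)) s++;
    return s;
}

/* Return the token name for one lexeme, or "error" if no rule matches. */
const char *classify_lexeme(const char *s) {
    assert(s != NULL);
    /* T, F, and the keywords also match the identifier pattern,
     * so they must be tested first. */
    if (strcmp(s, "T") == 0 || strcmp(s, "F") == 0) return "Boolean";
    if (is_keyword(s)) return "keyword";
    if (is_identifier(s)) return "identifier";
    const char *end = scan_integer(s);
    if (end != NULL) {
        if (*end == '\0') return "integer";
        if (*end == '.' && isdigit((unsigned char)end[1])) {
            end++;
            while (isdigit((unsigned char)*end)) end++;
            if (*end == '\0') return "float";
        }
    }
    return "error";
}
```

In a lex definition file the same ordering concern shows up as rule order: the patterns for T, F, and the keywords have to come before the identifier pattern, and a catch-all pattern that runs up to the next white space can produce the error token.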

3 Second Phase -- Syntax Analysis Using yacc

For the second project, you need to define the FP language in yacc definitions and feed them to yacc to generate a parser. Your definitions should follow the BNF given earlier, but exclude the token definitions, which have already been processed by lex; instead, use the token names you defined for the first project.

You also need to generate a parse tree for the input FP program. For parse tree generation, you need to write code in the definition file for tree node generation. You also need to use a stack to keep track of the tree nodes so that you can link them properly into the correct parse tree. You can use either the yacc stack or your own stack for this purpose. Each node in your parse tree should include the actual symbol of the node, including all those defined in the BNF for the FP language (e.g., +, Program, argument, return-arg, statement, statements, if-stmt, etc.). For each identifier, instead of giving the actual identifier name, the node should simply hold a pointer into the symbol table (the index of the symbol table entry).

You have created a symbol table in the first project. Now you need to make sure you can access it and add more information to it. In this phase, you need to assign an identifier type to each identifier. Identifier types include:

    program-name
    function-name
    argument
    return-arg
    assignment-id: the identifier in an assignment statement that is to be assigned a value
    expression-id: an identifier that appears after comparison operators or Boolean operators
    parameter: all other parameters besides those listed above

At the end (preferably in the main program), you need to print out two things: the symbol table and the parse tree.

For the parse tree, print it in prefix order with indentation. Each node of the tree should be printed on one line. After you print a tree node, its child nodes should be printed following it, indented by 2 blank spaces from the parent's starting column.

For your symbol table, please try to print each identifier in the table on one line. On each line, first print the identifier name, followed by the identifier type(s), followed by any other information you have put in the table. If an identifier appears multiple times and has multiple identifier types, then print all the different types (the order is not important).

Please note that in the lexical analysis in Project 1, you may have processed some tokens too early, which may cause incorrect processing of the input program. You may have to revise the lex definition file and move some of the processing to the yacc level. It is your responsibility to find the problems and make the changes.

You can use the sample program as well as other test programs written in FP to test your parser. Note that it is your responsibility to thoroughly test your program.
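The prefix-order printing rule for the parse tree can be sketched in C as follows. This is one possible layout, not a prescribed one: the Node type, the fixed child limit, the function names, and the "symbol [index]" format for identifier nodes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHILDREN 8   /* hypothetical limit, chosen for the sketch */

typedef struct Node {
    const char *symbol;            /* grammar symbol, e.g. "if-stmt" */
    int sym_index;                 /* symbol table index, -1 if none */
    struct Node *child[MAX_CHILDREN];
    int nchild;
} Node;

Node *new_node(const char *symbol, int sym_index) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->symbol = symbol;
    n->sym_index = sym_index;
    return n;
}

void add_child(Node *parent, Node *child) {
    assert(parent->nchild < MAX_CHILDREN);
    parent->child[parent->nchild++] = child;
}

/* Append the subtree rooted at n to buf in prefix order: the node first,
 * then each child indented 2 more spaces. len is the number of characters
 * already in buf; the new length is returned. */
size_t render_tree(char *buf, size_t cap, size_t len, const Node *n, int depth) {
    int w;
    if (n->sym_index >= 0)
        w = snprintf(buf + len, cap - len, "%*s%s [%d]\n",
                     depth * 2, "", n->symbol, n->sym_index);
    else
        w = snprintf(buf + len, cap - len, "%*s%s\n",
                     depth * 2, "", n->symbol);
    assert(w >= 0 && (size_t)w < cap - len);
    len += (size_t)w;
    for (int i = 0; i < n->nchild; i++)
        len = render_tree(buf, cap, len, n->child[i], depth + 1);
    return len;
}
```

Rendering into a buffer keeps the sketch easy to check; the result can be written out with fputs(buf, stdout). In a yacc action, new_node and add_child would typically be called with the semantic values of the right-hand-side symbols.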

The standard platform for this project is the university Sun systems, including the servers and workstations such as apache. You should make sure that your program runs on these platforms.

You need to electronically submit your program before midnight of the due date (check the web page for due dates). Follow the instructions below for submission.

Log in to one of the university Sun platforms. Go to the directory that contains only:

    The lex definition file for the FP language. The file name should be FP.l.
    The yacc definition file for the FP language. The file name should be FP.y.
    A file containing a sample program written in the FP language that you have tested with the scanner generated from your lex definition file. The file name should be sample.fp. If you have multiple sample files, you can name them sample1.fp, sample2.fp, etc.
    A readme file that explains how to interpret your output from lex (the list of tokens). If you have any information that you would like the TA to know, you can also put it in this file.
    The optional Design.doc file that describes any special features of your project that are not specified in the project specification, including all problems your program may have and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above.

After submitting your project, do not delete or modify your code. In case something is wrong, you will have a chance to resubmit your program files with the original dates on them.

If your program does not fully function, please try to print partial results clearly in order to obtain partial credit. If your program does not work or does not provide any output, then no partial credit will be given. If your program works fully or partially but we cannot make it run properly, then it is your responsibility to make an appointment with the TA and come to the department to demonstrate your program. Depending on the problems, appropriate point deductions will apply in these situations.

4 Third Phase -- Code Generation

For the third project, you will generate pseudo machine code (PMC) for FP programs. The PMC is defined as follows.

    <program> ::= <stmt> | <program> <stmt>
    <stmt> ::= <label> <instruction> ; | <instruction> ;
    <instruction> ::= <load-instruction> | <store-instruction> | ... (as shown in the following)
    <load-instruction> ::= load <register> <M-addr>
    <store-instruction> ::= store <register> <M-addr>
    <add-instruction> ::= add <parameter> <parameter> <parameter>
    <sub-instruction> ::= sub <parameter> <parameter> <parameter>
    <mul-instruction> ::= mul <parameter> <parameter> <parameter>
    <div-instruction> ::= div <parameter> <parameter> <parameter>
    <mod-instruction> ::= mod <parameter> <parameter> <parameter>
    <test-instruction> ::= <comparison-operator> <parameter> <parameter> <register>

    <if-instruction> ::= if <register> <label>
    <goto-instruction> ::= goto <label>
    <read-instruction> ::= get <parameter>+
    <print-instruction> ::= print <parameter>+
    <read-instruction> ::= read <parameter>+
    <parameter> ::= <register> | <M-addr> | <number>

Here are the semantic definitions of the instructions.

    <load-instruction> loads the memory content at <M-addr> into the register <register>.
    <store-instruction> stores the content of the register <register> into the memory slot <M-addr>.
    <add-instruction> computes the first <parameter> + the second <parameter> and stores the result in the third <parameter>.
    <sub-instruction>, <mul-instruction>, <div-instruction>, and <mod-instruction> are the same as the <add-instruction> but are for the -, *, /, and % operators, respectively.
    <test-instruction> tests whether the first <parameter> is ==, >, or >= the second <parameter>. If true, then <register> is set to 1; otherwise, <register> is set to 0.
    <if-instruction> tests whether the content of <register> is positive (true). If so, then the program flow goes to the instruction labeled by <label>. If not, then it continues with the next instruction.
    <goto-instruction> simply causes the program flow to go to the instruction labeled by <label>.
    <read-instruction> reads from the standard input into one or more registers. It can read multiple inputs for multiple parameters. When there are multiple inputs, they should be separated by spaces.
    <print-instruction> prints the parameters following it.

The tokens in PMC are defined as follows.

    The regular expression for <label> is [a-z]+:
    The regular expression for <register> is R[0-9]
    The regular expression for <M-addr> is [a-z]+, excluding the keywords
    The regular expression for <number> is (-?[1-9][0-9]*)|0
    <comparison-operator> is the same as that in FP

You need to write code in the yacc definition file to generate a PMC program from the input FP program. Most of the code generation task is relatively simple: generate the corresponding PMC instruction(s) for each statement. For function calls, however, the code generation process is more complicated. We make several assumptions about function calls in FP to ease the code generation task.

    (1) Assume that there is no recursive function call.
    (2) Assume that a function will be called only once in the entire program.
    (3) Assume that the scope of all variables is global, including the variables in the functions.

Thus, when generating code for a function call in FP, you simply copy the actual parameters to the formal parameters and jump to the function for execution. Note that the return address can be uniquely determined due to assumption (2). So after execution of a function, you can simply jump back to a predetermined label. You do need to copy back the return value properly. We assume that the parameter passing mechanism used in FP is pass-by-value.

To place the PMC code for functions properly, you may not want to print their code segments directly to the output. You can keep the code segment for a function in a temporary file first. After you finish processing the entire program, you can read back these files and print them at the end of the output.
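As an illustration of one of the simpler cases, a multi-parameter arithmetic call such as {/ a b c} can be lowered to a chain of three-operand PMC instructions that accumulate in a register. The C sketch below is only one possible approach: the name emit_nary, the use of a single scratch register, and the one-instruction-per-line layout are assumptions, and the operand names are expected to be valid PMC parameters already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Emit PMC text for an n-ary arithmetic call, evaluated left to right.
 * opcode is "add", "sub", "mul", or "div"; operands holds n >= 2 PMC
 * parameters; reg is a scratch register that receives the result.
 * Returns the number of characters written to buf. */
size_t emit_nary(char *buf, size_t cap, const char *opcode,
                 const char *const *operands, size_t n, const char *reg) {
    assert(buf != NULL && operands != NULL && n >= 2);
    /* first instruction: operand0 op operand1 -> reg */
    int w = snprintf(buf, cap, "%s %s %s %s;\n",
                     opcode, operands[0], operands[1], reg);
    assert(w >= 0 && (size_t)w < cap);
    size_t len = (size_t)w;
    /* remaining operands: reg op operand_i -> reg */
    for (size_t i = 2; i < n; i++) {
        w = snprintf(buf + len, cap - len, "%s %s %s %s;\n",
                     opcode, reg, operands[i], reg);
        assert(w >= 0 && (size_t)w < cap - len);
        len += (size_t)w;
    }
    return len;
}
```

For the operands a, b, and c with opcode div and register R0, the sketch produces "div a b R0;" followed by "div R0 c R0;". Nested calls would need more than one scratch register or temporary memory slots, which the sketch does not address.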

The input FP program should be read from the standard input, and the output PMC program should be printed to the standard output. Do not use hard-coded file names. A PMC executor will be provided so that you can execute the PMC code you generated from your FP program.

You need to electronically submit your program before midnight of the due date (check the web page for due dates). Follow the instructions below for submission.

Log in to one of the university Sun platforms. Go to the directory that contains only:

    The lex definition file for the FP language. The file name should be FP.lex.
    The yacc definition file for the FP language. The file name should be FP.yacc.
    A makefile for invoking lex, then invoking yacc, then compiling the program generated by lex and yacc.
    The sample program(s) written in the FP language.
    A readme file that explains how to compile and run your program and how to interpret your output. If you have any information that you would like the TA to know, you can also put it in this file.
    The optional Design.doc file that describes any special features of your project that are not specified in the project specification, including all problems your program may have and/or all additional features you implemented.

Issue the command ~ilyen/handin/handin.compiler. This command has to be issued in the directory that contains the files listed above.

After submitting your project, do not delete or modify your code. In case something is wrong, you will have a chance to resubmit your program files with the original dates on them.