KU Compilerbau - Programming Assignment


716.077 KU Compilerbau - Programming Assignment
Univ.-Prof. Dr. Franz Wotawa, Birgit Hofer
Institute for Software Technology, Graz University of Technology
April 20, 2011

Introduction

During this semester you will build a simple C compiler in Java using ANTLR 3.3. You can do this together with up to 3 of your colleagues (at most 4 students per group). Your compiler should be able to translate a program written in the simplified C language described below to Java bytecode. Figure 1(b) illustrates the different phases of a compiler. The phases that are part of your exercise are highlighted in grey.

Figure 1: Compiler. (a) Language processing system, (b) Compiler phases

The following table gives an overview of your tasks and their delivery deadlines.

Task                     Reachable Points   Deadline
0 Group Registration     -                  31.03.
1 Lexical Analysis       5                  04.04.
2 Syntax Analysis        15                 11.04.
3 Type Checking          15                 16.05.
4 Intermediate Code      15                 30.05.
5 Code Generation        15                 20.06.

General rules

- You have to deliver tasks 1-5 via SVN.
- You must always hand in something, since each task builds on the previous ones.
- If you are not able to solve one of the tasks, you will get the source code of another group on request so that you can finish the remaining tasks.
- You must document the percentage of participation of each team member for each task in the README file.
- You will get the points reached for each task via email.
- There are mandatory interviews at the end of task 5.

File/folder hierarchy

Your SVN repository must have a branch for each task. The branches must have the name task {number}, where {number} is replaced by the actual task number. Each branch must have the following structure. Please take care that the required documents are in the correct folders.

task {number}/  build.xml  doc/  lib/  src/  test/  readme.txt  readme.txt  **/*.java  NEW: SimpleC.g

The file readme.txt must contain
- the table with the percentage of participation
- all changes made (e.g. bug corrections) with respect to the previous tasks, in clear and short sentences
- known limitations / bugs
- implemented additional tasks

lib

The directory lib should contain the ANTLR and JUnit .jar files.

build.xml

The file build.xml must define
- an ant task compile, which compiles the grammar file to Java files and compiles all Java files from src.
- an ant task run-junit, which depends on compile, compiles all Java files from test, runs all JUnit tests and creates an xml file in the output folder.
You can use the BUILD file from the framework.

Framework

You can download a simple framework from the course web site. This framework contains all required libraries (ANTLR, JUnit), skeletons of the required classes, JUnit test files, example input and output files, an example README file and an example BUILD file. Upload these files to your repository, then modify and extend them.

The Grammar

Syntax

program -> declarations function_declarations
declarations -> declarations type identifier_list ;
              | declarations function_head ;
              | ε
identifier_list -> identifier
                 | identifier_list , identifier
identifier -> id
            | * identifier
type -> int | char | void
function_declarations -> function_declaration function_declarations
                       | ε
function_declaration -> function_head function_body
function_head -> type identifier arguments
arguments -> ( parameter_list )
           | ( )
parameter_list -> type identifier
                | parameter_list , type identifier
function_body -> { declarations optional_statements }
compound_statement -> { optional_statements }
optional_statements -> statement_list
                     | ε
statement_list -> statement
                | statement_list statement
statement -> compound_statement
           | if ( assignment_expr ) statement else statement
           | for ( expr_stmt expr_stmt assignment_expr ) statement
           | expr_stmt
           | return assignment_expr ;
expr_stmt -> ;
           | assignment_expr ;
assignment_expr -> identifier assignop expression
                 | identifier assignop & identifier
                 | expression
expression -> simple_expression
            | simple_expression relop_assign simple_expression
simple_expression -> term
                   | sign term
                   | simple_expression sign term
                   | simple_expression or term
term -> factor
      | term mulop factor
factor -> identifier
        | function_call
        | NEW: int
        | ( assignment_expr )
        | not factor
        | literal_string
function_call -> identifier ( extend_assignment_expr_list )
               | identifier ( )
extend_assignment_expr_list -> assignment_expr
                             | & identifier
                             | extend_assignment_expr_list , assignment_expr
                             | extend_assignment_expr_list , & identifier
relop_assign -> relop
              | assignop

Operators

relop -> < | <= | > | >= | ==
sign -> + | -
mulop -> * | / | % | &&
assignop -> =
or -> ||
not -> !

Identifiers

id -> letter ( letter | digit )*
letter -> [a-zA-Z]
digit -> [0-9]

String literals

literal_string -> " ( [ 0-9a-zA-Z ] | ! | % | escape_sequence | relop | sign | mulop | assignop )* "

Numerical literals

int -> decint | octint | hexint
decint -> 0 | digit digit0*
hexint -> 0 (x | X) hexdigit+
octint -> 0 octdigit+
digit -> [1-9]
digit0 -> [0-9]
hexdigit -> [0-9a-fA-F]
octdigit -> [0-7]

Escape sequences

\n   newline
\t   horizontal tab
\b   backspace
\r   carriage return
\f   form feed
\'   single quote
\"   double quote
\\   backslash

Comments

Comments may appear after any token and are surrounded by /* and */.

White spaces

White spaces between tokens are optional, with one exception: keywords must be surrounded by white spaces, newlines or the beginning of the program.
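The numerical literal rules above can be checked and evaluated with three regular expressions. The following is only a sketch for experimenting with the rules outside ANTLR; the class name IntLiterals and the method valueOf are made up for this illustration and are not part of the framework.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the course framework.
final class IntLiterals {
    // decint -> 0 | digit digit0*   (digit = [1-9], digit0 = [0-9])
    private static final Pattern DEC = Pattern.compile("0|[1-9][0-9]*");
    // octint -> 0 octdigit+
    private static final Pattern OCT = Pattern.compile("0[0-7]+");
    // hexint -> 0 (x | X) hexdigit+
    private static final Pattern HEX = Pattern.compile("0[xX][0-9a-fA-F]+");

    /**
     * Returns the value of an int literal. Throws IllegalArgumentException
     * if the lexeme matches none of the three forms or does not fit in an int.
     */
    static int valueOf(String lexeme) {
        if (DEC.matcher(lexeme).matches()) {
            return Integer.parseInt(lexeme, 10);
        }
        if (OCT.matcher(lexeme).matches()) {
            return Integer.parseInt(lexeme.substring(1), 8);  // skip leading 0
        }
        if (HEX.matcher(lexeme).matches()) {
            return Integer.parseInt(lexeme.substring(2), 16); // skip 0x or 0X
        }
        throw new IllegalArgumentException("not an int literal: " + lexeme);
    }
}
```

Note that a lexeme such as 08 matches none of the rules: it is not a decint (leading 0) and not an octint (8 is not an octdigit), so a lexer following the grammar has to report it or split it into two tokens.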

0 Group Registration

Groups of up to 4 students are allowed. In order to get access to an SVN repository you have to register your group with the help of the Web-Interface. The link to the Web-Interface will be posted to the newsgroup on Wednesday, March 16th.

1 Lexical Analysis

Write a lexical analyzer for the subset of the C language described above in Java with ANTLR version 3.3. Name your grammar file NEW: SimpleC.g. Create the class LexicalAnalyzer.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int lexer(String file_path). This method returns 0 if the lexical analysis was successful; otherwise it returns the number of errors found. In addition, write the following information to the standard output:

- Input program abstracted to lexemes and keywords
  - with the same line separations as the original program,
  - line numbers added,
  - all comments and unnecessary white spaces removed
- Summary of errors
  - Number of errors found in the source file

You can find an example input and output in the framework. Create your own example programs (at least one error-free program and one program which leads to lexical errors) and add them to the test folder. Extend the JUnit test to test those programs. Create the README file as described above and extend the BUILD file if necessary.

Task Summary

1. Define an ANTLR grammar file for the shown grammar
2. Implement the requested method for the class LexicalAnalyzer
3. Create example programs and extend the JUnit tests
4. Create a README file
5. Adapt the BUILD file if necessary
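The requested listing (one output line per source line, line numbers added, white space normalized) can be produced from a token sequence roughly as follows. This is a sketch under assumptions: Tok is a hypothetical stand-in for whatever token objects your lexer delivers, the tokens are assumed to arrive in source order with comments already dropped, and the "line: tokens" layout is invented here. The authoritative format is the example output in the framework.

```java
import java.util.List;

// Hypothetical helper, not part of the course framework.
final class LexemeListing {
    /** Stand-in for a lexer token: source line number and lexeme text. */
    static final class Tok {
        final int line;
        final String text;

        Tok(int line, String text) {
            this.line = line;
            this.text = text;
        }
    }

    /**
     * Joins the tokens of each source line with single spaces and
     * prefixes every output line with its source line number.
     * Source lines without tokens produce no output line.
     */
    static String format(List<Tok> tokens) {
        StringBuilder out = new StringBuilder();
        int current = -1; // line number of the output line being built
        for (Tok t : tokens) {
            if (t.line != current) {
                if (current != -1) {
                    out.append('\n');
                }
                out.append(t.line).append(": ");
                current = t.line;
            } else {
                out.append(' ');
            }
            out.append(t.text);
        }
        return out.toString();
    }
}
```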

2 Syntax Analysis

Write a syntactical analyzer for the grammar described above in Java with ANTLR version 3.3. Extend the file NEW: SimpleC.g. Please note: the grammar above must be transformed into an LL grammar. Create the class SyntaxAnalyzer.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int checksyntax(String file_path). This method returns 0 if the syntax analysis was successful; otherwise it returns the number of errors found. If there are lexer errors, no syntactical analysis has to be performed. Instead, the number of lexical errors should be returned. In addition, write the following information to the standard output:

- Input program abstracted to function definitions
  - a line for every definition of a function/procedure (name + signature)
  - an output for the end of every function/procedure
  - line numbers added
- Summary of errors
  - Number of errors found in the source file

You can find an example input and output in the framework. Create your own example programs (at least one error-free program and one program which leads to syntactical errors) and add them to the test folder. Extend the JUnit test to test those programs.

It makes sense to build an abstract syntax tree, since you need it for the next task. Be careful with left and right associativity and precedence. An operator like + is left-associative: a + b + c is equal to (a + b) + c. Assignment, however, is right-associative: a = b = c means a = (b = c).

Update your README file. Don't forget to document any changes you made to the lexer in this task.

Bonus Task

You are allowed to extend the given grammar, e.g., you can add in-line comments. If you extend the language, please make sure that your grammar stays backward compatible, and document your changes in the README file. You can get up to 3 points for extensions.

Task Summary

1. Extend your ANTLR grammar file
2. Implement the requested method for the class SyntaxAnalyzer
3. Create example programs and extend the JUnit tests
4. Adapt the README file
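The remark on associativity and the requirement to use an LL grammar are connected: in an LL-style parser, a left-associative operator is typically handled by a loop, and a right-associative one by recursion on the right-hand side. The hand-written sketch below shows this on a deliberately tiny toy language (single-letter identifiers, + and = only). It is not the assignment grammar and not ANTLR code; the class AssocDemo is made up for this illustration.

```java
// Hypothetical toy parser, not part of the course framework.
final class AssocDemo {
    private final String src;
    private int pos = 0;

    private AssocDemo(String src) {
        this.src = src.replace(" ", "");
    }

    /** Parses the input and returns a fully parenthesized form of the tree. */
    static String parse(String src) {
        AssocDemo p = new AssocDemo(src);
        String tree = p.assignment();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return tree;
    }

    // assignment -> id '=' assignment | sum
    // Right-associative: the rule recurses on the right of '='.
    private String assignment() {
        if (pos + 1 < src.length()
                && Character.isLetter(src.charAt(pos))
                && src.charAt(pos + 1) == '=') {
            char id = src.charAt(pos);
            pos += 2;
            return "(" + id + " = " + assignment() + ")";
        }
        return sum();
    }

    // sum -> operand ('+' operand)*
    // Left-associative: the loop folds each new operand into the left result.
    private String sum() {
        String left = operand();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            left = "(" + left + " + " + operand() + ")";
        }
        return left;
    }

    private String operand() {
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            return String.valueOf(src.charAt(pos++));
        }
        throw new IllegalArgumentException("identifier expected at " + pos);
    }
}
```

The loop in sum() is the usual replacement for a left-recursive rule such as simple_expression -> simple_expression sign term, which an LL parser cannot handle directly.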

3 Type Checking

Create data structures to hold the type information for the identifiers. You will have (at least) int, void, pointers to (pointers to pointers to ...) int or void, and functions with argument and return types. Implement the code to fill the data structures and to perform type checking. Create the class TypeChecker.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int checktypes(String file_path). This method returns 0 if the type checking was successful; otherwise it returns the number of errors found. If there are lexical or syntactical errors, no type checking has to be performed. Instead, the number of lexical or syntactical errors should be returned. In addition, write the following information to the standard output:

- Type errors
  - Use of undeclared identifiers
  - Use of incorrect type
  - Double declarations (Note: It is allowed to declare a variable in the global scope and in each local scope. But it is an error to declare a variable more than once in any given scope.)
- Type coercions
  - All type coercions. (For example: "cast expression 4 * a from int to real in line 27".)
  - The type of any operator. (For example: "real-real addition 8.0 + 4 * a in line 27".)
- Definitions of variables, including type and scope. (For example: "variable a defined in function f with type integer".)

Keep your output readable, e.g., in the form of a table. You can find an example input and output in the framework. The examples shown here do not prescribe the exact form of your output.

Define printf and scanf now. Note that printf may have one or two arguments.

Create your own example programs (at least one type-correct and one type-incorrect program) and add them to the test folder. Extend the JUnit test to test those programs.

Scopes

Note that there are two scopes: a global one for the program and a local one for the current function. Nested scoping is not required, but you can get bonus points for implementing it.
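The two-scope rule described above (one global scope, one local scope for the current function, double declarations only an error within the same scope) can be sketched with two maps. The class SymbolTable and its method names are invented for this illustration, types are kept as plain strings to keep it short, and Java 8 or newer is assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the course framework.
final class SymbolTable {
    private final Map<String, String> globals = new HashMap<>();
    private Map<String, String> locals = null; // null while outside any function

    /** Opens a fresh local scope for the function being checked. */
    void enterFunction() {
        locals = new HashMap<>();
    }

    /** Discards the local scope at the end of the function. */
    void leaveFunction() {
        locals = null;
    }

    /**
     * Declares name in the current scope.
     * Returns false if name is already declared in that same scope.
     */
    boolean declare(String name, String type) {
        Map<String, String> scope = (locals != null) ? locals : globals;
        return scope.putIfAbsent(name, type) == null;
    }

    /**
     * Looks up name in the local scope first, then in the global scope.
     * Returns null if the identifier is undeclared.
     */
    String lookup(String name) {
        if (locals != null && locals.containsKey(name)) {
            return locals.get(name);
        }
        return globals.get(name);
    }
}
```

A local declaration shadows a global one of the same name while the function is being checked, and the global declaration becomes visible again after leaveFunction().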
NEW: Forward declarations

In C it is not allowed to use a variable before it is defined. For instance, if you define a function a, and then a global variable q, you cannot use q in a. For functions, the situation is more complicated. If you define a function a, and then a function b, and you use b in a, you are creating an implicit declaration, which assumes that b returns a value of a certain type, e.g., an int. This problem is avoided in C by forward declarations. Don't forget to include forward declarations in your type checking.

Type Coercion

A coercion occurs if the type of an operand is automatically converted to the type expected by the operator.

1. A conversion to void is always allowed and results in the result being thrown away. For instance, printf returns an int, but (void) printf("hallo") (not allowed in our grammar), or more succinctly printf("hallo"), are legal C statements.

2. A conversion from void is never possible. A void type may not participate in an operation (such as +). A void variable cannot be defined.
3. A binary operator (such as + or <, but not && or ||) takes either two int, or a char and an int (in either order). In the latter case, the char is converted to int before the operation is performed. (You cannot do arbitrary arithmetic on pointers. In C, adding a pointer and an int is allowed, but you do not need to support it. Comparison of pointers (a<b) is allowed!)
4. A char can be assigned to an int and is converted automatically. An int can also be converted to a char automatically; this may lead to loss of information.
5. int, char and pointers can be used in Boolean expressions. An int or char with the value 0 is false, any other value is true. Pointers are NULL (false) or non-null (true). Boolean operators (&&, ||, <, >, etc.) return an int.
6. Any pointer conversion is allowed and never leads to an error. A warning about an invalid cast is appreciated, and if you implement such warnings consistently, you'll get bonus points. Conversions between pointers and ints are not supported.

Bonus Tasks

- Warnings for invalid pointer casts: 2 points
- Implementation of nested scopes: 2 points
- Extend the type system to include more types. Note: coercion rules in C are not quite trivial. Points depend on your exact plans.

Don't forget to clearly document the implemented bonus tasks in the README file.

Task Summary

1. Implement the requested method for the class TypeChecker
2. Define printf and scanf
3. Create example programs and extend the JUnit tests
4. Adapt the README file
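Coercion rules 2 to 6 can be turned into two small decision functions. The following is a sketch, not a complete type checker: the class Coercion and the flat Type enum are invented for this illustration (a real implementation needs pointer target types and function types). Rule 3 does not mention an operator applied to two chars; this sketch accepts that case and yields int, as C does, which is an assumption.

```java
// Hypothetical helper, not part of the course framework.
final class Coercion {
    enum Type { INT, CHAR, VOID, POINTER }

    private static boolean isNumeric(Type t) {
        return t == Type.INT || t == Type.CHAR;
    }

    /**
     * Result type of a binary arithmetic or relational operator
     * (not && or ||), or null if the operand types are an error.
     */
    static Type binaryResult(Type left, Type right, boolean isComparison) {
        if (left == Type.VOID || right == Type.VOID) {
            return null; // rule 2: void never takes part in an operation
        }
        if (isNumeric(left) && isNumeric(right)) {
            return Type.INT; // rule 3: a char operand is converted to int
                             // (char with char is an assumption, see lead-in)
        }
        if (isComparison && left == Type.POINTER && right == Type.POINTER) {
            return Type.INT; // rule 3: pointer comparison is allowed
        }
        return null; // pointer arithmetic and pointer/int mixes: unsupported
    }

    /** Whether a value of type source may be assigned to a target variable. */
    static boolean assignable(Type target, Type source) {
        if (target == Type.VOID || source == Type.VOID) {
            return false; // rule 2
        }
        if (isNumeric(target) && isNumeric(source)) {
            return true; // rule 4: int <-> char, possibly losing information
        }
        // rule 6: any pointer-to-pointer conversion is allowed;
        // conversions between pointers and ints are not supported
        return target == Type.POINTER && source == Type.POINTER;
    }
}
```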

4 Intermediate Code

Build an Intermediate Code Generator which is able to build the intermediate code for any program written in our grammar. Create the class IntermediateCode.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int createintermediatecode(String file_path). This method returns 0 if the intermediate code generation was successful; otherwise it returns the number of (lexical, syntactical or type) errors found. In addition, it writes the ASCII representation of the intermediate code to the standard output. Use comments to clarify which part of the C code corresponds to the intermediate code.

Three-Address Code

The three-address code should support the following commands:

- Assignments of the form x := y op z, where op is one of
  - int+, int-, int*, int/, int%, which take two integers and return the corresponding result.
  - int&&, int||, which take two integers and return 0 or 1.
  - int<, int<=, int>=, int>, int==, which take two integers and return 0 or 1.
- Assignments of the form x := op y, where
  - op is one of intminus, intnot and y is a variable.
  - op is one of address, dereference and y is a variable.
  - op is one of intconst, pointerconst, and y is an integer constant or a label, respectively.
- The assignment x *= y, stating that y should be assigned to the location that x points to. (Translates the C statement *x = y.)
- A constant declaration string a, where a is a quoted string ("hello!\0").
- Copy statements: x := y.
- Jumps
  - The label label L.
  - The unconditional jump goto L, where L is a label.
  - The conditional jump ifint x goto L, where x is an int and L a label.
- For functions
  - The statement param x, which says that x is a parameter to the next function call. Parameters should be listed from right to left.
  - The statement call f n, where f is the name of a function and n is the number of parameters.
  - The statement function f n, where f is a function, and n is the number of local variables, including temporary ones.
  - The statements return and return x, which return from the function, possibly returning the value in x. If you want your compiler to be gcc compatible, you need to distinguish between int and char returns. This is not part of the assignment.
  - The statement local x i, where x is a local variable, a temporary variable, or an argument, and i is a number to reserve space. For locals and temps, i should be negative and run from -1 down.

    For arguments, i should be positive and run from 0 up (the left argument has number 0).
  - The statement getresult x, which stores the return value of the last function call (if any) in x.
- The statement global x i, where x is a variable name, and i is a number, starting at 0 and running up.
- The statement comment x, which is ignored.

This list might be incomplete. You are allowed to add additional commands. Post your additions to the newsgroup so that others can benefit from them. Don't forget to document your extensions in the README file. You do not have to pay attention to efficiency of time or space. In the intermediate code (but not in C), every function has to end with a return.

Boolean values

Truth values are represented by integers: 0 is false, anything else is true. A truth value generated by a comparison, &&, or || may only be 0 or 1.

Pointers

Pointers are treated like ints.

Implementation Suggestion

Create an inheritance hierarchy with the class Statement on top. Other statements inherit from this class. Your intermediate code program is a vector of Statements, and each Statement can print itself. Later, each Statement will know how to translate itself to, e.g., assembler code or Java bytecode. Use an attributed grammar on the abstract syntax tree to generate the code. A function node, for example, should get the intermediate code from its children (declarations, statements), plus the number of temp variables, so that it can construct the function name number statement. The intermediate code for the function node consists of this statement and the intermediate code for the body.

Bonus Task

You need not implement short-circuit evaluation for Boolean expressions. You will get 2 bonus points for implementing short-circuiting.

Task Summary

1. Implement the requested method for the class IntermediateCode
2. Update your README file
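The implementation suggestion above (a Statement base class, a program as a list of statements that print themselves) might look like the following sketch. Only the class name Statement comes from the text; all other class, method and variable names (BinaryAssign, ThreeAddressDemo, t1, L1, L2 and so on) are invented for this illustration, and only a handful of the listed commands are covered. The example hand-translates the C statement if (a < b) c = a; else c = b; into three-address code.

```java
import java.util.ArrayList;
import java.util.List;

/** Base class: every intermediate code statement can print itself. */
abstract class Statement {
    abstract String print();
}

/** x := y op z */
final class BinaryAssign extends Statement {
    private final String x, y, op, z;
    BinaryAssign(String x, String y, String op, String z) {
        this.x = x; this.y = y; this.op = op; this.z = z;
    }
    @Override String print() { return x + " := " + y + " " + op + " " + z; }
}

/** x := y */
final class Copy extends Statement {
    private final String x, y;
    Copy(String x, String y) { this.x = x; this.y = y; }
    @Override String print() { return x + " := " + y; }
}

/** label L */
final class Label extends Statement {
    private final String name;
    Label(String name) { this.name = name; }
    @Override String print() { return "label " + name; }
}

/** goto L */
final class Goto extends Statement {
    private final String target;
    Goto(String target) { this.target = target; }
    @Override String print() { return "goto " + target; }
}

/** ifint x goto L */
final class IfIntGoto extends Statement {
    private final String x, target;
    IfIntGoto(String x, String target) { this.x = x; this.target = target; }
    @Override String print() { return "ifint " + x + " goto " + target; }
}

final class ThreeAddressDemo {
    /** Hand-written intermediate code for: if (a < b) c = a; else c = b; */
    static List<Statement> ifElseExample() {
        List<Statement> code = new ArrayList<>();
        code.add(new BinaryAssign("t1", "a", "int<", "b")); // t1 is 0 or 1
        code.add(new IfIntGoto("t1", "L1"));                // true: then-branch
        code.add(new Copy("c", "b"));                       // else-branch
        code.add(new Goto("L2"));
        code.add(new Label("L1"));
        code.add(new Copy("c", "a"));                       // then-branch
        code.add(new Label("L2"));
        return code;
    }

    /** Prints the program, one statement per line. */
    static String listing(List<Statement> code) {
        StringBuilder sb = new StringBuilder();
        for (Statement s : code) {
            sb.append(s.print()).append('\n');
        }
        return sb.toString();
    }
}
```

Because ifint jumps when its operand is true, the else-branch is emitted first here; emitting the then-branch first would need a negated condition (intnot) before the jump.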

5 Code Generation

Implement the translation of the intermediate code to Java bytecode. Create the class CodeGeneration.java in the package at.tugraz.ist.compilerbau in the directory src with the method public static int createcode(String file_path). This method creates the output file <file>.class, where <file>.c is the name of the input file. The method returns 0 if the code generation was successful; otherwise it returns the number of (lexical, syntactical or type) errors found. Your program must be executable by the Java Virtual Machine. Printf and scanf must work.

NEW: Option

Instead of directly producing bytecode, you are allowed to produce code in an assembler-like syntax (Jasmin). In this case your output files must end with .j. It must be possible to translate the generated output files to bytecode by using Jasmin 2.4. Don't forget to add jasmin.jar to your lib folder.

Task Summary

1. Implement the requested method for the class CodeGeneration
2. Update your README file
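For the Jasmin option, a three-address statement of the form x := y op z over ints maps naturally onto the JVM's stack instructions: load both operands, apply the operation, store the result. The sketch below emits the Jasmin text for the five arithmetic operators only. The class JasminSketch is invented for this illustration, and the slots map from variable names to JVM local variable indices is an assumption: how you map the local x i numbering of the intermediate code to JVM slots is your own design decision.

```java
import java.util.Map;

// Hypothetical helper, not part of the course framework.
final class JasminSketch {
    /** Looks up the JVM local slot of a variable; fails loudly if unknown. */
    private static int slot(Map<String, Integer> slots, String name) {
        Integer s = slots.get(name);
        if (s == null) {
            throw new IllegalArgumentException("no slot for variable: " + name);
        }
        return s;
    }

    /** Emits Jasmin instructions for x := y op z, where all three are int locals. */
    static String binary(String x, String y, String op, String z,
                         Map<String, Integer> slots) {
        String insn;
        switch (op) {
            case "int+": insn = "iadd"; break;
            case "int-": insn = "isub"; break;
            case "int*": insn = "imul"; break;
            case "int/": insn = "idiv"; break;
            case "int%": insn = "irem"; break;
            default:
                throw new IllegalArgumentException("unsupported op: " + op);
        }
        return "    iload " + slot(slots, y) + "\n"   // push y
             + "    iload " + slot(slots, z) + "\n"   // push z
             + "    " + insn + "\n"                   // pop both, push result
             + "    istore " + slot(slots, x) + "\n"; // pop result into x
    }
}
```

The emitted lines belong inside a .method ... .end method block of a Jasmin file, whose .limit stack and .limit locals directives must be large enough for the generated code. Comparison and Boolean operators need conditional jumps and are not covered here.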