University of Arizona, Department of Computer Science. CSc 453 Assignment 5 Due 23:59, Dec points. Christian Collberg November 19, 2002

Similar documents
University of Arizona, Department of Computer Science. CSc 520 Assignment 4 Due noon, Wed, Apr 6 15% Christian Collberg March 23, 2005

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Christian Collberg March 26, 2008

The PCAT Programming Language Reference Manual

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

C Language Part 1 Digital Computer Concept and Practice Copyright 2012 by Jaejin Lee

Intermediate Code Generation

Please refer to the turn-in procedure document on the website for instructions on the turn-in procedure.

Intermediate Representation Prof. James L. Frankel Harvard University

1 Lexical Considerations

Expressions. Arithmetic expressions. Logical expressions. Assignment expression. n Variables and constants linked with operators

KU Compilerbau - Programming Assignment

Lexical Considerations

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

CS 6353 Compiler Construction Project Assignments

CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV. Department of Computer Science University of Arizona

B.V. Patel Institute of Business Management, Computer & Information Technology, Uka Tarsadia University

ECE220: Computer Systems and Programming Spring 2018 Honors Section due: Saturday 14 April at 11:59:59 p.m. Code Generation for an LC-3 Compiler

The SPL Programming Language Reference Manual

Control Structures. Boolean Expressions. CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV

Intermediate Representations

3. Java - Language Constructs I

Compiler Design: Lab 3 Fall 2009

Introduction to Programming Using Java (98-388)

Concepts Introduced in Chapter 6

COMPILER CONSTRUCTION (Intermediate Code: three address, control flow)

CS 2210 Programming Project (Part IV)

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

CSc 372 Comparative Programming Languages. 4 : Haskell Basics

CSc 372. Comparative Programming Languages. 4 : Haskell Basics. Department of Computer Science University of Arizona

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

CSc 520. Principles of Programming Languages 11: Haskell Basics

Lexical Considerations

Compilers. Compiler Construction Tutorial The Front-end

FORM 2 (Please put your name and form # on the scantron!!!!)

CSE 340 Fall 2014 Project 4

Compilers CS S-05 Semantic Analysis

Control Structures. Lecture 4 COP 3014 Fall September 18, 2017

QUIZ: What value is stored in a after this

Concepts Introduced in Chapter 6

C - Basics, Bitwise Operator. Zhaoguo Wang

Model Viva Questions for Programming in C lab

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.

MIDTERM EXAMINATION - CS130 - Spring 2003

Homework #3: CMPT-379 Distributed on Oct 23; due on Nov 6 Anoop Sarkar

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

C-LANGUAGE CURRICULAM

Writing Program in C Expressions and Control Structures (Selection Statements and Loops)

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion

CS Programming In C

CSc 520 Principles of Programming Languages. 26 : Control Structures Introduction

Three-Address Code IR

IR Generation. May 13, Monday, May 13, 13

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

TDDD55 - Compilers and Interpreters Lesson 3

A Fast Review of C Essentials Part I

CpSc 1011 Lab 5 Conditional Statements, Loops, ASCII code, and Redirecting Input Characters and Hurricanes

Intermediate Representations

Tutorial 1: Introduction to C Computer Architecture and Systems Programming ( )

CS 361 Computer Systems Fall 2017 Homework Assignment 1 Linking - From Source Code to Executable Binary

Final CSE 131B Spring 2004

CSCI 171 Chapter Outlines

The C language. Introductory course #1

Name :. Roll No. :... Invigilator s Signature : INTRODUCTION TO PROGRAMMING. Time Allotted : 3 Hours Full Marks : 70

ENGINEERING 1020 Introduction to Computer Programming M A Y 2 6, R E Z A S H A H I D I

Architectures. Code Generation Issues III. Code Generation Issues II. Machine Architectures I

Part I Part 1 Expressions

Pointer Casts and Data Accesses

Semantic actions for declarations and expressions

Java is an objet-oriented programming language providing features that support

CS LIR Language Specification and microlir Simulator Version 1.2

Physics 2660: Fundamentals of Scientific Computing. Lecture 3 Instructor: Prof. Chris Neu

Welcome Back. CSCI 262 Data Structures. Hello, Let s Review. Hello, Let s Review. How to Review 8/19/ Review. Here s a simple C++ program:

Type Checking Binary Operators

Computer System and programming in C

CSI33 Data Structures

SECTION II: LANGUAGE BASICS


C++ for Java Programmers

CSE-304 Programming Assignment #4

Project Compiler. CS031 TA Help Session November 28, 2011

Features of C. Portable Procedural / Modular Structured Language Statically typed Middle level language

Basic C Programming (2) Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

Compilers Project 3: Semantic Analyzer

Short Notes of CS201

CS415 Compilers. Intermediate Represeation & Code Generation

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

CSc 520 Principles of Programming Languages. Questions. rocedures as Control Abstractions... 30: Procedures Introduction

CS 2505 Fall 2013 Data Lab: Manipulating Bits Assigned: November 20 Due: Friday December 13, 11:59PM Ends: Friday December 13, 11:59PM

Two s Complement Review. Two s Complement Review. Agenda. Agenda 6/21/2011

Semantic actions for declarations and expressions

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Princeton University Computer Science 217: Introduction to Programming Systems The Design of C

& Simulator. Three-Address Instructions. Three-Address Instructions. Three-Address Instructions. Structure of an Assembly Program.

Semantic actions for declarations and expressions. Monday, September 28, 15

ISA 563 : Fundamentals of Systems Programming

CS201 - Introduction to Programming Glossary By

Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 3: Program Analysis Moscow State University, Spring 2014

Transcription:

University of Arizona, Department of Computer Science CSc 453 Assignment 5 Due 23:59, Dec 4 100 points Christian Collberg November 19, 2002 1 Introduction Your task is to write a code generator for the language Luca. You will be given a front-end that produces a control flow graph of a expression trees, b stack code, c quadruples, or d triples from Luca source files. You should write a code generator that produces either a SPARC-V9 assembly code that runs on lectura, or b MIPS assembly code that runs on the SPIM simulator from one of these intermediate forms. You should produce a program lc that performs all the necessary actions to turn a source program into assembly code. It should be called and executed like this if you are compiling for the MIPS: > lc x.gus > x.s > spim -f x.s or like this if you are compiling for the SPARC: > lc x.gus > x.s > gcc x.s > a.out Typically, lc will be a shell script that calls the various pieces of the luca compiler suite: #!/bin/sh luca_lex $1 luca_parse luca_sem -amips luca_ast2tree \ luca_tree2quad -C luca_mips 1

The exact details of this script of course depend on your particular implementation. Note that if you are compiling for the MIPS, luca sem must be run with the -amips option. If you are compiling for the SPARC, instead use the -asparc-v9 option. There are five different versions of Luca described in this document. You can decide to write your code generator for any one of them. If you decide to work on Luca-1, for example, the maximum number of points you can receive is 60; if you work on Luca-4 you can receive 90 points, etc. It s a good idea to start with Luca-1 and then incrementally add new features until you are compiling your chosen target language. You may use whatever code generation algorithm you want. I will only grade on correctness, not the speed of the generated code. If you feel you ve done something very clever I might be convinced to give extra credit. Be sure to indicate what you have implemented in your README file. This assignment can be coded in the language of your choice as long as it can be compiled and run on lectura. Make sure that your Makefile is working properly, and that lc is called exactly as in the example above. You can assume that the input generated by the front-end is correct. In other words, you don t have to do any input error checking. Your code generator must generate code from a control-flow graph, not from the raw intermediate instructions. The Luca intermediate code generators luca AST2tree,luca tree2* take an argument -C which will generate a CFG for you, but you may, if you wish, build one yourself. Hint: There will be one more assignment where you will be asked to do code local optimization on the intermediate code and/or generated machine code. You will only have one week to complete this assignment. For this reason, it may be in your best interest to prepare for the next assignment at this time. In particular, rather than simply printing out each machine code instruction as you generate it, it is better to make each instruction a record/class/struct like this: class MipsInstruction { String label; String operator; Operand arg1; Operand arg2; Operand arg3; } abstract class Operand { } class Register extends Operand { String name; } class Immediate extends Operand { int value; } class RegPlusOffs extends Operand { String name; int value; } 2

and to insert these instructions in a linked list. This will allow the code optimizer to run over the generated instructions and improve on them. 2 Luca-1 [60 Points] Luca-1 has constant and variable declarations, integer arithmetic, assignment statements, READ, WRITE, and WRITELN statements. Only integers and characters can be read, strings can also be written. Identifiers have to be declared before they are used. Identifiers cannot be redeclared. There are three incompatible built-in types, INTEGER, BOOLEAN and CHAR. The identifiers TRUE and FALSE are predeclared in the language. Here is the concrete syntax of Luca-1: program ::= PROGRAM ident ; decl list block. block ::= BEGIN stat seq END decl list ::= { declaration ; } declaration ::= CONST ident : ident = expression VAR ident : ident expression ::= expression bin operator expression unary operator expression expression integer literal char literal designator designator ::= ident bin operator ::= + / % unary operator ::= stat seq ::= { statement ; } statement ::= designator := expression WRITE expression WRITELN READ designator On the MIPS, integers are 32-bit quantities, character and booleans are 8-bits. On the SPARC, integers are 64-bit quantities, character and booleans are 8-bits. On the SPARC, READ and WRITE should be implemented by calling functions in the standard C IO-library stdio. To see how to do this, inspect the output of the standard C compiler. For example, compile #include <stdio.h> main {printf"%i\n",56;}} with cc -S test.c. The resulting assembly code contains the calling sequence for printf. 3 Luca-2 [10 Points] Luca-2 adds IF, IF-ELSE, LOOP, EXIT, REPEAT, FOR, and WHILE statements. EXIT statements can only occur within LOOP statements. The expression in an IF, IF-ELSE, REPEAT, or WHILE statement must be of boolean type. Here are the extensions to the concrete syntax: 3

bin operator ::= AND OR < <= = # >= > unary operator ::= NOT statement ::= IF expression THEN stat seq ENDIF IF expression THEN stat seq ELSE stat seq ENDIF WHILE expression DO stat seq ENDDO REPEAT stat seq UNTIL expression LOOP stat seq ENDLOOP EXIT FOR ident := expression TO expression [ BY const expr ] DO stat seq ENDFOR The FOR-loop BY-expression must be a compile-time constant expression. A Luca FOR-loop FOR i := e1 TO e2 BY e3 DO S ENDFOR should be compiled into code that s equivalent to i := e1; T1 := e2; T2 := e3; IF T2 >= 0 THEN WHILE i <= T1 DO S; i := i + T2; ENDDO; ELSE WHILE i >= T1 DO S; i := i + T2; ENDDO; ENDIF Hint: The front-end of the compiler takes care of generating the right intermediate code for FOR-loops, so there really isn t anything special you have to do in the code generator to handle them. 4 Luca-3 [10 Points] Luca-3 extends Luca-2 with real variable declarations and real arithmetic. Luca does not allow mixed arithmetic, i.e. there is no implicit conversion of integers to reals in an expression. For example, if I is an integer and R is real, then R:=I+R is illegal. Luca instead supports two explicit conversion operators, TRUNC and FLOAT. TRUNC R returns the integer part of R, and FLOAT I returns a real number representation of I. Note also that % remainder is not defined on real numbers. 4

We add two operators TRUNC and FLOAT: unary operator ::= TRUNC FLOAT On the MIPS, reals are 32-bit quantities. On the SPARC, reals are 64-bit quantities. 5 Luca-4 [10 Points] Luca-4 extends Luca-3 with one-dimensional arrays and record types. Assignment is defined for scalars only, not for variables of structured type. In other words, the assignment A:=B is illegal if A or B are records or arrays. READ and WRITE are only defined for scalar values integers, reals, and characters. The element count of an array declaration must be a constant integer expression. Arrays are indexed from 0; that is, an array declared as ARRAY 100 OF INTEGER has the index range [0..99]. It is a checked run-time error to go outside these index bounds. Here are the extensions to the concrete syntax: declaration ::= TYPE ident = ARRAY expression OF ident TYPE ident = RECORD [ { field } ] field ::= ident : ident ; designator ::= ident { designator } designator ::= [ expression ] designator. ident designator 6 Luca-5 [10 Points] Luca-5 extends Luca-4 with non-nested procedures. There is no limit to the number of arguments a procedure may take. Value parameters including structured types such as arrays and records! should be passed by value, VAR parameters by reference. Procedures can be recursive. Here are the extensions to the concrete syntax: declaration ::= PROCEDURE ident [ formal list ] decl list block ; formal list ::= formal param { ; formal param } formal param ::= [ VAR ] ident : ident actual list ::= expression {, expression } statement ::= ident [ actual list ] field ::= ident : ident ; 5

7 Submission and Assessment The deadline for this assignment is midnight, December 4. You should submit the assignment electronically using the Unix command turnin cs453.5 <files> README. Your submission must contain a README-file that states which parts of Luca your code generator can handle. Also, list the name of your team, the team members, and how much each team member contributed to the assignment. Your electronic submission must contain a working Makefile, and all the files necessary to build the code generator. If your program does not compile out of the box you will receive zero 0 points. The grader will not try to debug your program or your makefile for you! This assignment is worth 100 points. Don t show your code to anyone outside your team, don t read anyone else s code, don t discuss the details of your code with anyone. If you need help with the assignment see the TA or the instructor. A The tree intermediate code The front-end generates control flow graphs where each node is a sequence of expression trees. The nodes in these trees are defined below. Declarations Version Pos Major Minor The version of the intermediate code language. VarDecl Pos Symbol VarDecl declares a global or local variable. Symbol is the symbol number. FormalDecl Pos Symbol Declares the formal parameter of a procedure. Symbol is the symbol number. TypeDecl Pos Symbol Declares a record or array type. Symbol is the symbol number. Loads and Stores Store Pos Type Left Right Left is an expresion tree computing an address. Right is an expression tree computing a value it s type is given by symbol Type to be stored at that address. Load Pos Type Des Des is an expression tree computing an address. Load should load the value whose type is given by the symbol Type stored at that address. 6

Expressions BinExpr Pos Op Type Left Right A node in an expression tree that computes Left Op Right. Op is a string. Type is a symbol number. UnaryExpr Pos Op Type Left A node in an expression tree that computes Op Left. Op is a string. Type is a symbol number. LoadLit Pos Type Value Load the literal value Value. In case of strings, the address should be loaded, not the value. Designators AddressOf Symbol Type parameter. Type is the type of the symbol. Load the address of Symbol which could be a global variable, local variable, or formal IndexOf Pos Type Base Index Compute the address of an array element, i.e. Base+ElementTypeType Index. Base is an expression tree computing the base address of the array. Index is an expression tree computing the index value. Type is the symbol number of the array. It s a checked, fatal, run-time error for Index to be <0 or >HighBoundType-1. FieldOf Pos Type Field Base Compute the address of a record field, i.e. Base + OffsetField. Base is an expression tree computing the base address of the record. Field is the symbol number for the field from which we can retrieve information such as offset. Type is the symbol numbher of the record type. Control Branch Pos Op Type Left Right Label Equivalent to if Left Op Right then goto Label. Left and Right are expression trees, Op is a string. If we are generating control flow graphs Label is the number of the basic block to which we should jump. If we are generating a raw sequence of trees Label is the label number from a Label Pos Number node we should jump to. Goto Pos Label Jump to basic block/instruction number Label. Label Pos Label integer. Marks a position in the code to which a Branch or Goto instruction can jump. Label is an Input and Output Write Pos Type Expr Write the value of Expr an expression tree to the standard output. If Expr is a constant string, Expr will compute the string s address, not its value. Read Pos Type Des Read a value into the address held by Des, an expression tree. The type of the data to be read is given by Type, a symbol number. record WriteLn Pos Write a newline character to the standard output. 7

Procedure call ProcCall Pos Symbol Actuals Call procedure Symbol, a symbol number. Actuals is an expression tree. Actual Pos Formal Type Expr Next Actual nodes are linked together on Next to make a list of actual parameters. Expr an expression tree computes the value/address of the actual. Formal is a reference to the symbol table entry for the corresponding formal parameter. Null Pos Null terminates a sequence of Actual nodes. B The control flow graph The intermediate code instructions are organized in a series of control flow graphs, one per procedure. Control Flow Graph PROCEDURE ProcSy EntryBlock ExitBlock Blocks ProcSy is a symbol table entry for the procedure. EntryBlock is the number of the entry block. ExitBlock is the number of the exit block. Blocks is a list of BLOCK:s, as defined below. BLOCK Number Instrs Successors Predecessors Number is the number of the basic block. Instrs is a list of instructions expression trees, quadruples, etc. Successors is a set of numbers of successor blocks. Predecessors is a set of numbers of predecessor blocks. Here is an example: > cat 0.gus PROGRAM P; BEGIN IF 1.1 < 2.0 THEN WRITE 0; ENDIF; END. > luca_lex 0.gus luca_parse luca_sem luca_ast2tree -C 11 ProcedureSy $MAIN 6 0 0 0 PROCEDURE 11 1 6 BLOCK 1 <<=== The Entry block <<=== Predecessor blocks 2 <<=== Successor blocks <<=== List of instructions 8

BLOCK 2 1 3 4 Branch 3 "<" 2 4 LoadLit 3 2 1.1 LoadLit 3 2 2.0 BLOCK 3 2 5 Goto 3 5 BLOCK 4 2 5 Write 4 1 LoadLit 4 1 0 BLOCK 5 3 4 6 BLOCK 6 <<=== The Exit block 5 9

C The quadruple and triple code There is a more-or-less one-to-one correspondence between the treecode and the quadruple and the triple instruction-sets. For example, they all have a BinExpr instruction: Expressions BinExpr Pos Op Type Left Right In the treecode intermediate representation, BinExpr is a node in an expression tree that computes Left Op Right. Op is a string. Type is a symbol number. Left and Right are expression trees. BinExpr Pos Op Type Result Left Right In the quadruple intermediate representation, BinExpr is a threeaddress code instruction that computes Left Op Right. Op is a string. Type is a symbol number. Result, Left and Right are the symbol numbers of the symbols often temporary variables representing the instruction operands. BinExpr Pos Op Type Left Right In the triple intermediate representation, BinExpr is an instruction that computes Left Op Right. Op is a string. Type is a symbol number. Left and Right are indices into the instruction stream, representing triples that compute the instruction operands. The difference is in how the arguments to the binary operation are encoded. In the treecode they are pointers to sub-trees. In the quadruple code they are references to temporary variables in the symbol table. In the triple code they are integers referencing other triples. For example, the program PROGRAM P; BEGIN WRITE 44+55; END. will look like this in the three different intermediate forms: > luca_lex 9.gus luca_parse luca_sem luca_ast2tree 11 ProcedureSy $MAIN 4 0 0 0 Version 4 7 4 ProcBegin 4 11 Write 3 1 BinExpr 3 "+" 1 LoadLit 3 1 44 LoadLit 3 1 55 10

ProcEnd 4 11 > luca_lex 9.gus luca_parse luca_sem luca_ast2tree luca_tree2quad 11 ProcedureSy $MAIN 4 0 0 0 12 TempSy @0 3-1 1 1-1 13 TempSy @1 3-1 1 1-1 14 ConstSy %0 3-1 1 1 44 15 TempSy @2 3-1 1 1-1 16 ConstSy %1 3-1 1 1 55 Version 4 7 4 ProcBegin 4 11 LoadLit 3 1 13 14 LoadLit 3 1 15 16 BinExpr 3 "+" 1 12 13 15 Write 3 1 12 ProcEnd 4 11 > luca_lex 9.gus luca_parse luca_sem luca_ast2tree luca_tree2triple 11 ProcedureSy $MAIN 4 0 0 0 Version 4 7 4 ProcBegin 4 11 LoadLit 3 1 44 LoadLit 3 1 55 BinExpr 3 "+" 1 3 4 Write 3 1 5 ProcEnd 4 11 11