CS 314 Principles of Programming Languages Fall 2017 A Compiler and Optimizer for tinyl Due date: Monday, October 23, 11:59pm

Similar documents
Class Information ANNOUCEMENTS

CS5363 Final Review. cs5363 1

Instruction Selection: Peephole Matching. Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

COMP 412, Fall 2018 Lab 1: A Front End for ILOC

The ILOC Simulator User Documentation

Instruction Selection and Scheduling

CS415 Compilers. Lexical Analysis

Compiler Code Generation COMP360

The ILOC Simulator User Documentation

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

The ILOC Simulator User Documentation

CS 6353 Compiler Construction Project Assignments

Compilers Project 3: Semantic Analyzer

Instruction Selection: Preliminaries. Comp 412

Alternatives for semantic processing

CS 2210 Programming Project (Part IV)

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

CSE 340 Fall 2014 Project 4

Semantic actions for declarations and expressions

CS415 Compilers Register Allocation. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Intermediate Representations

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

UNIVERSITY OF CALIFORNIA

Introduction to Compiler

Semantic actions for declarations and expressions

Code Shape II Expressions & Assignment

Semantic actions for declarations and expressions. Monday, September 28, 15

CS 6353 Compiler Construction Project Assignments

CS415 Compilers. Intermediate Represeation & Code Generation

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

CS2110 Assignment 2 Lists, Induction, Recursion and Parsing, Summer

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

CS 314 Principles of Programming Languages. Lecture 9

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

CS 314 Principles of Programming Languages. Lecture 11

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Lecture 10 Parsing 10.1

Context-Free Grammars

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

2068 (I) Attempt all questions.

Time : 1 Hour Max Marks : 30

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Project 1: Scheme Pretty-Printer

Compiler Design. Code Shaping. Hwansoo Han

Intermediate Code Generation

Lecture 12: Compiler Backend

A programming language requires two major definitions A simple one pass compiler

A simple syntax-directed

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 4

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

1. Consider the following program in a PCAT-like language.

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Introduction to Optimization, Instruction Selection and Scheduling, and Register Allocation

Compiler Design (40-414)

ENGR 3950U / CSCI 3020U (Operating Systems) Simulated UNIX File System Project Instructor: Dr. Kamran Sartipi

COMPILERS BASIC COMPILER FUNCTIONS

This homework is due by 11:59:59 PM on Thursday, October 12, 2017.

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Homework #1 From C to Binary

CSE 401/M501 Compilers

Compiler Design Overview. Compiler Design 1

CS 432 Fall Mike Lam, Professor. Code Generation

QUESTION BANK CHAPTER 1 : OVERVIEW OF SYSTEM SOFTWARE. CHAPTER 2: Overview of Language Processors. CHAPTER 3: Assemblers

Question Bank. 10CS63:Compiler Design

List of Figures. About the Authors. Acknowledgments

Intermediate Representations

Principles of Programming Languages COMP251: Syntax and Grammars

A language is a subset of the set of all strings over some alphabet. string: a sequence of symbols alphabet: a set of symbols

CS /534 Compiler Construction University of Massachusetts Lowell

Homework & Announcements

Summary: Direct Code Generation

Chapter 2 A Quick Tour

CS 330 Homework Comma-Separated Expression

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

Parsing II Top-down parsing. Comp 412

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING Subject Name: CS2352 Principles of Compiler Design Year/Sem : III/VI

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

A Simple Syntax-Directed Translator

Homework #1 From C to Binary

Semantic actions for expressions

ECE Digital System Design & Synthesis Exercise 1 - Logic Values, Data Types & Operators - With Answers

CS 132 Compiler Construction

Compiler Design: Lab 3 Fall 2009

Creating a Shell or Command Interperter Program CSCI411 Lab

Computing Inside The Parser Syntax-Directed Translation, II. Comp 412

Important Project Dates

Topic 6 Basic Back-End Optimization

COSC 6385 Computer Architecture. - Homework

CS2383 Programming Assignment 3

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

CMSC 330: Organization of Programming Languages

Lexical Considerations

Project 4 Due 11:59:59pm Thu, May 10, 2012

15-122: Principles of Imperative Computation, Fall 2015

1 Lexical Considerations

Transcription:

CS 314 Principles of Programming Languages Fall 2017 A Compiler and Optimizer for tinyl Due date: Monday, October 23, 11:59pm THIS IS NOT A GROUP PROJECT! You may talk about the project and possible solutions in general terms, but must not share code. In this project, you will be asked to write a recursive descent LL(1) parser and code generator for the tinyl language as discussed in class. Your compiler will generate RISC machine instructions called ILOC (Intermediate Language for Optimizing Compilers). You will also write a code optimizer that takes ILOC instructions as input and implements different peephole optimizations. The output of the optimzer is a sequence of ILOC instructions which produces the same results as the original input sequence. To test your generated programs, you can use a virtual machine (simulator) that can run your ILOC programs. The project will require you to manipulate linked lists of instructions. In order to avoid memory leaks, explicit deallocation of eliminated instructions is necessary. This document is not a complete specification of the project. You will encounter important design and implementation issues that need to be addressed in your project solution. Identifying these issues is part of the project. As a result, you need to start early, allowing time for possible revisions of your solution. <program> ::= <stmt list>. <stmt list> ::= <stmt> <morestmts> <morestmts> ::= ; <stmt list> ɛ <stmt> ::= <assign> <print> <assign> ::= <variable> = <expr> <print> ::=! <variable> <expr> ::= + <expr> <expr> <expr> <expr> <expr> <expr> /<expr> <expr> <variable> <digit> <variable> ::= a b c d e f g h i j k x y z <digit> ::= 0 1 2 3 4 5 6 7 8 9 Figure 1: The tinyl language as specified by a context-free grammar 1

1 Background 1.1 The tinyl language tinyl is a simple expression language that allows assignments, and print as its only I/O operation. Every token is a single character of the input. This makes scanning rather easy, but does not allow integer constants of more than one digit, or variable names of more than one character. The language specification in Backus-Naur from is given in Figure. The following are examples of two valid tinyl programs: (1) a=3;b=5;c=/3*ab;d=+c1;!d. (2) a=7;b=-*+1+2a58;!b. 1.2 Target Architecture instr. format description semantics memory instructions loadi c r x load constant value cinto register r x r x c loadai r x, c r y load value of MEM(r x + c) intor y r y MEM(r x + c) storeai r x r y, c store value in r x into MEM(r y + c) MEM(r y + c) r x bit-shift operations lshifti r x, c r z shift contents of registers r x by c positions to the left r z r x << c rshifti r x, c r z shift contents of registers r x by c positions to the right r z r x >> c arithmetic instructions add r x, r y r z add contents of registers r x and r y,and r z r x + r y sub r x, r y r z subtract contents of register r y from register r z r x r y r x, and mult r x, r y r z multiply contents of registers r x and r y,and r z r x r y div r x, r y r z divide contents of registers r x and r y,and r z r x / r y I/O instruction outputai r x, c write value of MEM(r x + c) to standard output print( MEM(r x + c) The target architecture is a RISC machine with 4096 registers. All registers can only store integer values. A RISC architecture is a load/store architecture where arithmetic instructions operate on registers rather than memory operands (memory addresses). This means that for each access to a memory location, a load or store instruction has to be generated. Here is the machine instruction set of our RISC target architecture. You are only allowed to use these ILOC instructions. ILOC instructions are case sensitive. r x, r y,and r z represent three registers. 2

1.3 Code Shape Your compiler should generate code of a specific form with respect to how variables are accessed. All variables accesses use an address that consists of a base pointer and an offset relative to this base pointer. The base pointer address is stored in a special register, in our case r 0. All memory references are therefore of the form MEM(r 0 + offset). This is what the instructions loadi, loadai, storeai and outputai use. All addresses are byte addresses. Your compiler should assign the address 1024 to r 0 at the beginning of the program. For our example language, variable offsets are non-negative byte addresses. Your compiler should map variable a to offset 0, variable b to offset 4, variable c to offset 8, etc. Other mappings are also possible, so we are really talking about code shape here, which is a particular coding style. Your compiler should generate code that does not reuse registers. If you assign a value toaregisterbyaloadi, loadai, add, sub, mult or div instruction, you will always use a fresh, i.e., new register. Your target machine has many registers, so do not worry about running out of registers. For example, this means that the code generated for + 1 1 will generate two loadi instructions with two distinct target registers and one add instruction. This coding style is also called the register-register model, where each computed value gets its own register. In a real compiler, an additional optimization pass maps (virtual) registers to the limited number of physical registers of a machine. This step is typically called register allocation. We do not deal with register allocation here. Our tinyl language does not contain any control flow constructs (e.g.: jumps, if-thenelse, while). This means that every generated instruction in the instruction sequence will be executed. 2 Project Description The project consists of two main parts: 1. Complete the partially implemented recursive descent LL(1) parser that generates ILOC instructions. 2. Write a peephole optimizer for constant folding and operator strength reduction. In addition, you are asked to write the PrintInstructionList routine. The project represents an entire programming environment consisting of a compiler, an optimizer, and a simulator (virtual machine) for ILOC. The ILOC simulator is called sim and will be made available to you as an executable on the ilab machines. This will allow you to check for correctness of your generated and optimized code. 2.1 Compiler The recursive descent LL(1) parser implements a simple code generator. You should follow the main structure of the code as given to you in file Compiler.c. As given to you, the file contains code for function digit, variable, and partial code for function expr. As is, 3

the compiler is able to generate code only for expressions that contain + operations on operands that are digits or the variable f. You will need to add code in the provided stubs to generate correct RISC machine code for the entire program. Do not change the signatures of the recursive functions. Note: The left-hand and right-hand occurrences of variables are treated differently. 2.2 I/O Instruction Utility Within the Optimizer, a sequence of ILOC instructions is represented as a doubly-linked list. You are asked to implement the following utility function in file InstrUtils.c. void PrintInstructionList(FILE *outfile, Instruction *instr); Function PrintInstructionList traverses the instruction list beginning with instruction instr. The list is written into file outfile. The implementation of this function must be based on the utility function void PrintInstruction(FILE *outfile, Instruction *instr); The implementation of the latter function is provided to you in file InstrUtils.c. This is also the file that will contain your implementation of PrintInstructionList 2.3 Peephole Optimization There are two types of peephole optimizations that you will need to implement: constant folding and operator strength reduction. Peephole optimizations look for particular instruction subsequences that can be replaced by a more efficient instruction sequence. The peephole optimizer uses a sliding window of three RISC machine instructions. It looks for code patterns as described below. If no pattern is detected, the window is moved one instruction down the list of instructions. In the case of a successful match and code replacement, the first instruction of the new window is set to the instruction that immediately follows the instructions that have been replaced, i.e., is set to the instruction immediately after the pattern in the unoptimized code. A peephole optimization pass (constfolding or strengthreduct) expects the ILOC input file to be provided at the standard input (stdin), and will write the generated code back to standard output (stdout). This allows the specification of multiple passes of the same optimization or different sequences of optimization passes using the UNIX pipe feature. For example, to apply optimization constfolding twice, followed by a strengthreduction pass strengthreduct with file tinyl.out as input, and file optimized.out as output, we would specify./constfolding < tinyl.out./constfolding./strengthreduct > optimized.out Instructions that are deleted as part of the optimization process have to be explicitly deallocated using the C free command in order to avoid memory leaks. You will implement your two peephole optimization passes in file ConstFolding.c and StrengthReduction.c. 4

2.3.1 Constant folding Patterns are consecutive instructions in your ILOC program. folding is a follows: The pattern for constant loadi c 1 r a loadi c 2 r b OP r a, r b r c where c 1 and c 2 are integer constant, and OP is an arithmetic operation (add, sub, mult). Note: We do not apply this optimization for the division operator. The above pattern can be replaced by the code sequence loadi c 3 r c where c 3 is the integer constant that represents the result of the computation (c 1 OP c 2 ), which is done at compile time. Due to the code shape assumption, register values in r a and r b are not used after the optimized pattern, allowing the two loadi instructions to be deleted. 2.3.2 Operator strength reduction The patterns for this optimization are as follows: loadi c 1 r a OP r b, r a r c where c 1 is an integer constant and a power of 2 (e.g.: 2, 4, 8, 16,...) and OP is the arithmetic operation mult or div. Multiplication and division can be implemented as leftshift or right-shift operations, respectively. For example, loadi 4 r a div r b, r a r c can be replaced by rshifti r b, 2 r c and 5

loadi 4 r a mult r b, r a r c can be replaced by lshifti r b, 2 r c Note: Arithmetic integer operations may result in overflow or underflow conditions due to the finite bit-width of the integer representation. However, this is not a problem for these peephole optimizations since the overflow/underflow would also occur in the unoptimized code. 2.4 ILOC Simulator The virtual machine executes ILOC program. If a outputai <id> instruction is executed, the value of the specified memory location is written to standard output (stdout). All values are of type integer. An ILOC simulator is provided as an executable (sim). The ILOC simulator reports the overall number of executed instructions for a given input program. This allows you to assess the effectiveness of your dead code elimination optimization. You also will be able to check for correctness of your optimization pass. 3 Grading You will submit your versions of files ConstFolding.c, StrengthReduction.c, Compiler.c and InstrUtils.c. You may also submit an optional ReadMe file if you want to communicate something about your code to the grader. No other file should be modified, and no additional file(s) may be used. The electronic submission procedure will be posted later. Do not submit any executables or any of your test cases. Your programs will be graded based mainly on functionality. Functionality will be verified through automatic testing on a set of syntactically correct test cases. No error handing is required. The original project distribution contains some test cases. Note that during grading we will use additional test cases not known to you in advance. The distribution also contains executables of reference solutions for the compiler (compile.sol) and the optimizers (constfolding.sol, strenghtreduct.sol), and the iloc simulator sim. AsimpleMakefile is also provided in the distribution for your convenience. For example, in order to create the compiler, say make compile at the Linux prompt, which will generate the executable compile. The provided, initial compiler is able to parse and generate code for very simple programs consisting of a single assignment statement with right-hand side expresssions of only additions of numbers, followed by a single print statement. You will need to be able to accept and compile the full tinyl language. The Makefile also contains rules to create executables of your optimization passes. 6

4 How To Get Started The code for this project lives on the ilab cluster in directory: www.cs.rutgers.edu/courses/314/classes/fall 2017 kremer/projects/proj1/students Create your own directory on the ilab cluster, and copy the entire provided project proj1.tar to your own home directory or any other one of your directories. Say tar -xf proj1.tar to extract the project files. Make sure that the read, write, and execute permissions for groups and others are disabled (chmod go-rwx <directory name>). IT IS CONSIDERED CHEATING IF YOU DO NOT PROTECT YOUR PROJECT FILES. Say make compile to generate the compiler. To run the compiler on a test case test, say./compile test. This will generate a RISC machine program in file tinyl.out. Tocreate an optimization pass, say make constfolding or make strengthreduct. The distributed versions of the two optimization passes do not work at all, and the compiler can only handle a single example program structure consisting of a single assignment statement followed by a print statemet. An example test case that the provided compiler can handle is given in file tests/test-dummy. To call your optimization pass constfolding on a file that contains RISC machine code, for instance file tinyl.out, say./constfolding < tinyl.out > optimized.out. This will generate a new file optimized.out containing the output of your optimizer. The operators < and > are Linux redirection operators for standard input (stdin) and standard output (stdout), respectively. The same holds for your strenthreduct optimization pass. You may want to use valgrind for memory leak detection. We recommend to use the following flags, in this case to test the optimizer for memory leaks: valgrind --leak-check=full --show-reachable=yes --track-origins=yes./constfolding < tinyl.out To run a program on the ILOC simulator, for instance tinyl.out, say./sim < tinyl.out. Finally, you can define a tinyl language interpreter on a single Linux command line as follows:./compile test;./strengthreduct < tinyl.out > opt.out;./sim < opt.out The ; operator allows you to specify a sequence of Linux commands on a single command line. 5 Questions All questions regarding this project should be posted on piazza (sakai). Enjoy the project! 7