Compiler construction assignments

Similar documents
Engineering project Compiler Construction Assignments

Compiler construction 2005 lecture 5

IN4305 Engineering project Compiler construction

Compiler construction in4020 lecture 5

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

Compiler construction 2002 week 5

CS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016

x86 assembly CS449 Fall 2017

Yacc: A Syntactic Analysers Generator

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

THEORY OF COMPILATION

Compilers Project 3: Semantic Analyzer

TDDD55 - Compilers and Interpreters Lesson 3

COMPILER CONSTRUCTION Seminar 02 TDDB44

C Compilation Model. Comp-206 : Introduction to Software Systems Lecture 9. Alexandre Denault Computer Science McGill University Fall 2006

Intel assembly language using gcc

Assembly Language: Function Calls

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Assembly Language: Function Calls" Goals of this Lecture"

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

Programming in C First meeting

TDDD55- Compilers and Interpreters Lesson 2

Lecture #16: Introduction to Runtime Organization. Last modified: Fri Mar 19 00:17: CS164: Lecture #16 1

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Practical Malware Analysis

Assembly Language: Function Calls" Goals of this Lecture"

Assembly Language: Function Calls. Goals of this Lecture. Function Call Problems

Lexical and Syntax Analysis

Compilation /15a Lecture 7. Activation Records Noam Rinetzky

CIS 341 Midterm February 28, Name (printed): Pennkey (login id): SOLUTIONS

Final CSE 131B Spring 2004

CS 6353 Compiler Construction Project Assignments

CS 6353 Compiler Construction Project Assignments

Secure Programming Lecture 3: Memory Corruption I (Stack Overflows)

Compiler Lab. Introduction to tools Lex and Yacc

X86 Stack Calling Function POV

x86 assembly CS449 Spring 2016

CS 33: Week 3 Discussion. x86 Assembly (v1.0) Section 1G

Project Compiler. CS031 TA Help Session November 28, 2011

Assignment 11: functions, calling conventions, and the stack

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

CS4120/4121/5120/5121 Spring /6 Programming Assignment 4

LECTURE 11. Semantic Analysis and Yacc

Function Calls COS 217. Reading: Chapter 4 of Programming From the Ground Up (available online from the course Web site)

Introduction to Computer Systems. Exam 1. February 22, This is an open-book exam. Notes are permitted, but not computers.

Digital Forensics Lecture 3 - Reverse Engineering

Compiler construction lecture 1

Compiler Construction LECTURE # 1

Programming in C week 1 meeting Tiina Niklander

Roadmap: Security in the software lifecycle. Memory corruption vulnerabilities

CSC 2400: Computing Systems. X86 Assembly: Function Calls"

Subprograms, Subroutines, and Functions

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

CSE 351: Week 4. Tom Bergan, TA

Compiler Front-End. Compiler Back-End. Specific Examples

ECE573 Introduction to Compilers & Translators

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

QUIZ How do we implement run-time constants and. compile-time constants inside classes?

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control.

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Machine Programming 3: Procedures

Stack -- Memory which holds register contents. Will keep the EIP of the next address after the call

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING

CS 4120 and 5120 are really the same course. CS 4121 (5121) is required! Outline CS 4120 / 4121 CS 5120/ = 5 & 0 = 1. Course Information

Compiler construction in4020 course 2006

Introduction to Lex & Yacc. (flex & bison)

Undergraduate Compilers in a Day

CS/SE 153 Concepts of Compiler Design

Introduction to Computer Systems. Exam 1. February 22, Model Solution fp

CPS104 Recitation: Assembly Programming

CIT Week13 Lecture

MPATE-GE 2618: C Programming for Music Technology. Syllabus

CMPE 152 Compiler Design

Procedure Calls. Young W. Lim Mon. Young W. Lim Procedure Calls Mon 1 / 29

King Abdulaziz University Faculty of Computing and Information Technology Computer Science Department

An Experience Like No Other. Stack Discipline Aug. 30, 2006

CSC 2400: Computing Systems. X86 Assembly: Function Calls

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Systems I. Machine-Level Programming V: Procedures

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

Programming Assignment IV Due Thursday, November 18th, 2010 at 11:59 PM

An Introduction to LEX and YACC. SYSC Programming Languages

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

CMSC 313 Lecture 12 [draft] How C functions pass parameters

X86 Review Process Layout, ISA, etc. CS642: Computer Security. Drew Davidson

Advances in Compilers

Compiler Construction D7011E

15-213/18-213/15-513, Fall 2017 C Programming Lab: Assessing Your C Programming Skills

4) C = 96 * B 5) 1 and 3 only 6) 2 and 4 only

Lecture 4 CIS 341: COMPILERS

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

The Compiler So Far. Lexical analysis Detects inputs with illegal tokens. Overview of Semantic Analysis

18-600: Recitation #4 Exploits (Attack Lab)

15-213/18-213/15-513, Spring 2018 C Programming Lab: Assessing Your C Programming Skills

The role of semantic analysis in a compiler

TDDD55- Compilers and Interpreters Lesson 3

Computer Systems Lecture 9

CSE 374 Programming Concepts & Tools

ASSEMBLY III: PROCEDURES. Jo, Heeseung

Transcription:

Compiler construction assignments for ComputerEngineering students December 12, 2006 1 Introduction This practical course on compiler construction consists of two assignments. We start from an existing compiler for a simple and extensible language, called Asterix. The compiler is written in ANSI-C, and also generates ANSI-C. The compiler has to be modified and extended, as is described in the following section. The main purposes of this course are to get acquainted with modern compiler writing tools, and to develop algorithms used for compiling contemporary language paradigms. We strongly suggest that the assignments be implemented by groups of two persons, because of the amount of work involved. It is not allowed to work with groups of more than two persons. When working as a team you are not allowed to have each member do one complete assignment, because that means that the person doing only the introductory assignment 1 does not learn from the in-depth assignment 2. (The reverse does not hold, since completing assignment 2 means incorporating the knowledge/experience gained during assignment 1, hence, the person doing assignment 2 effectively needs to perform a large fraction of assignment 1 anyway.) Carefully read this document before starting programming. 2 Assignments Both assignments are mandatory. The first is on records, and the second on code generation. 2.1 Assignment 1 The first assignment has been designed to get acquainted with all the aspects of the reference compiler. In this assignment, you are to extend the Asterix language with records (structs in C). A complete description of the changes to the language specification is given below. You must modify the reference compiler such that its implements the extended language. declaration record declaration In addition to variable declarations and subprogram declarations, we introduce record declarations. record declaration record identifier 1 is var declarations end This rule declares a new record with the name identifier 1 in the global scope. It is an error to declare a record other than at the global level. Identifier 1 may not be re-declared. Each record introduces a new scope for its member declarations. These can only be variable declarations. A record has to be defined before its use. It may be used in other records, except for recursive definitions. type identifier 1

A new record type is introduced. Identifier must be declared by a preceding record declaration. A variable of this type is an object storing the members of record identifier. All members of the record have to be initialized with their appropriate values (see Table 1 in the Asterix reference manual). Record types with the same type identifier are equal (name equivalence). Assignment to a record is allowed: all members of the source record (rvalue) are copied to the destination record (lvalue). It is not allowed to test for the equality of two records. expression expression 1. identifier The dot-operator is used to select a member of a record. The precedence level and associatively are the same as for the index operator [] and function call (). expression 1 must be of any record type, and identifier has to be a member of this record. An example program is shown below. include io.inc record foo is var a: int; end record bar is var a: int; b: foo; end function f(x: bar): int is begin x.b.a := x.b.a + 1; return x.b.a; end function main(argv: array of string): int is var h: bar; i: int; begin h.a := 2; h.b.a := 3; i := f(h); WriteString("This should be 4: "); WriteInt(i); WriteString("\n"); WriteString("This should be 3: "); WriteInt(h.b.a); WriteString("\n"); end return 0; The sources for the Asterix compiler can be found in /usr/local/asterix-compiler/src. You should create your own source directory and copy all files in /usr/local/asterix-compiler/src to that direc- 2

tory. Tips: Use the ANSI-C support of structs; they may be passed as parameters and returned as function results. Test your compiler thoroughly; you need it to complete assignment 2. 2.2 Assignment 2 [Fix the bugs, if any, discovered by your instructor during the grading of assignment 1.] In this assignment, you are to change the code generation part of your Asterix compiler developed in assignment 1 to emit IA-32 assembler instead of C code. To keep this task manageable you may use the simple code generation scheme that considers one AST node at a time, see Section 4.2.4 of the Modern Compiler Design book. You are advised to treat the IA-32 assembler as a stack machine (it supports push and pop instructions) to avoid the complexity of register allocation. We will grade your compiler on the correctness of the generated assembly code, not on its efficiency (i.e. speed). It is important to understand the syntax and semantics of the IA-32 assembly language. You can use GCC to obtain this understanding by example: the -S option will cause it to generate assembly. For instance, the command gcc -S cb.c produces the file cb.s. Start by compiling a simple hello world program and move on from there. If you prefer a more gentle introduction to assembly programming, you may consider the following on-line sources: https://idenet.bth.se/servlet/download/news/18144/asm lecture notes.pdf http://linuxassembly.org/ http://www.ipd.bth.se/ska/ The original reference compiler, as well as your extended compiler supporting records, emits C code (i.e., the cb.c file) that uses the standard C, rts, and cblib libraries. You can also take advantage of these libraries by following the calling convention of the C compiler: arguments are passed on the stack (in reverse order, so the first argument lies on top), a function result is passed back in the accumulator register (%eax), and the %eax, %ecx, and %edx registers may be used as scratch registers; all other registers are callee-saved. This calling convention is illustrated by Figure 1, which shows the assembly code as well as the activation records involved in making a call to function foo. The %ebp register (base pointer) keeps track of the current activation record; it is saved on function entry (dynamic link) and restored on exit. 3 General remarks 3.1 Deadlines The assignments should be turned in no later than 08:59 on the dates listed below: Assignment Date 1 November 13 th, 2006 2 January 8 th, 2007 3

%ebp... pushl $3 pushl $5 call foo... 3 5 return address 3 5 return address old %ebp %ebp 3 5 foo: pushl %ebp movl %esp, %ebp... leave ret Figure 1: Various snapshots of the call stack when invoking function foo. For every day that you hand in your assignment late we will subtract 1 point. 3.2 Documentation The assignments should be accompanied by separate documentation, preferably in PDF format. The documentation should reflect on how you handled the assignment; report on your approach and link it to the general theory of compiler construction as provided by the book and lectures. It is important that you are explicit about the particular problems that you encountered; you must clearly describe them, your solution, used algorithms, and so on. Nevertheless, the documentation must be to the point and concise, so limit your writing to around 4 pages. The documentation will also be graded. 3.3 Instructors At the beginning of this practical course, you will receive mail that will give you information about who will be your instructor. He will grade your solutions and provide feedback. Person Email address Room Gertjan Halkes G.P.Halkes@ewi.tudelft.nl HB 09.080 Bas Breijer S.D.Breijer@student.tudelft.nl Peterpaul Klein Haneveld P.T.Kleinhaneveld@ewi.tudelft.nl Leon Planken L.R.Planken@student.tudelft.nl Two instructors will be present at the labs at Drebbelweg to answer questions and assist with practical problems (e.g., debugging code). Take advantage of this service! Day (Sep 20 - Dec 21) Time Rooms Wednesday 8:45-12:45 DW 1-010, DW 1-170 Thursday 8:45-12:45 DW 1-010, DW 1-170 For questions about assignment 1 (records) you can also request the on-line assistance of Panoramix. This Java applet takes an Asterix program and returns example C-code (cb.h & cb.c) produced 4

by an extended version of the reference compiler. Thus, Panoramix will provide you with a flavor of the target code that your compiler could generate. Recently, Panoramix has been upgraded to emit assembly for core Asterix (no records) using a very simplistic code generator to provide a feel for what we expect from assignment 2. Enjoy. 3.4 Grading Grading is done per assignment. According to the following table, points can be obtained for test results, documentation, and source code. The total amounts to a 70 points maximum; 30 points are reserved for the final written exam 1. Assignment Test results Documentation Source code 1 16 7 7 2 26 7 7 3.5 How to submit your work In your own source directory, type make submit which will pack the complete compiler into a file named submit.tgz. Warning: if you not only modified files, but also added new ones, be sure to add them at the appropriate place in the Makefile. Note: you may not modify the cbc script, because our grading tools depend on your compiler behaving nicely. Test if you can unpack and compile your compiler with tar xfz submit.tgz make (Go to some temporary directory before unpacking the compiler.) Upload the submit.tgz file and your documentation (PDF file) to the CPM system: - login to CPM (http://cpm.ewi.tudelft.nl/) using your NetID and password from Blackboard. - select the compiler construction course. - click on the right entry in the overview table, which will pop-up a browse menu for sumbitting a deliverable; you need to do this twice (sources + documentation). - click on the notification button if you want to receive an e-mail once your assignment is graded. Do not submit unless you managed to unpack and compile your own sources. It is useless to turn in your solution far in advance of a deadline; your compiler will be tested and examined (shortly) after the deadline. 3.6 Important announcements Frequently check the blackboard (in4020); announcements and updates to this course will be posted there. 1 You should score at least 11 points for the written exam in order to pass the complete course. 5

3.7 Useful hints Carefully read the specifications of the language (extensions) and make sure you understand every sentence. Stick to the original programming style. Write cleanly, since your code will be thoroughly examined. Before turning in your compiler, test it carefully, both with correct and incorrect programs. Generate correct output for correct input, and give clear errors for incorrect input. Make sure your program does not dump core, even on incorrect input (although the use of assert() is encouraged). Both the compiler and the generated code are ANSI-C. This implies in particular that: you should not generate zero-sized structs; you should not generate zero-sized arrays, or arrays of variable size (use malloc() instead); a block may not end with a label (e.g.: write default:;} instead of default:}); void * is not compatible with a function pointer; cast to the appropriate type; the last field of an enum may not be followed by a comma; you should only use standard library functions that are described by the ANSI-C reference manual; you should not rely on the evaluation order of actual arguments. To test whether the code is ANSI-C conformant, use gcc -ansi -pedantic. 6

A The Asterix Compiler This appendix provides a short introduction to the source code of the Asterix compiler. Figure 2 shows the structure of the compiler, which is in part generated by tools (flex and yacc), and the transformation process from Asterix source to executable. The operator denotes compilation and linking by the C compiler (gcc). Figure 2: Compiler structure. A.1 Overview The design of the compiler is straightforward. It has a lexical analysis, parsing, semantic analysis and code generation phase. The main() function (located in main.c), activates these phases in sequence. int main(int argc, char **argv) { int yyparse(void); init(argc, argv); yyparse(); semantic_analyses(); if (!error_seen) code_generation(); terminate(); } return error_seen? 1 : 0; Since the lexical analysis code is called by the parser, the first relevant call in main() is the one to yyparse(), the parser generated by Bison. The result of the semantic actions during parsing is an AST, containing a list of global declarations, which is stored in the variable global declarations. Next, semantic analyses() (invoking type check declaration list(global declarations)) and code generation() 7

(invoking generate declaration list(global declarations)) are called to perform the semantic analyses and code generation, respectively. These functions start a traversal of the AST by recursively calling other functions to process the branches. The code for the type check and generate functions is stored in a number of.c files, for instance, declaration.c for everything to do with declarations. These files also contain functions to create and delete the structs that make up the AST. A.2 Flex and Bison For the first assignment it is important to understand the way Flex and Bison depend on each other. This is summarized in the following table. Tool Source Generates Depends on Flex lex.l lex.yy.c yylval variable containing yylex() token #define s and yytext Bison grammar.y grammar.tab.c yylex() containing yyparse() and yylval grammar.tab.h containing token #define s The code generated by Bison (yyparse()) calls the yylex() function, which is generated by Flex. Flex also creates the yytext variable, containing the string representation of the last recognized token. In a bottom up parser like Bison, all tokens in a rule are consumed before the rule is reduced. Since we may need the actual text of the token (or the position in the source file), Bison creates the variable yylval, in which the C code in the lex file can store this information. The new token() function in lex.l allocates a structure containing the line number and text of the token. Bison then stores this in the $n parameters, which can be used in the Bison code. A.3 Run time system Building the compiler produces an executable called cbf. With this we can compile an Asterix source file: cbf -I<include directory> <asterix source> This results in two files containing C code: cb.c and cb.h. We can compile these with a normal C compiler like gcc to create an executable file. The compiler also has a run time system, containing code on which cb.c depends. The runtime system consists of the files rts.* and libcb.*. We need to link these files when compiling cb.c. For convenience a script called cbc is available that performs both steps. It compiles an Asterix source file into a.out. cbc <asterix source> A.4 File list Finally, the following table summarizes the purpose of each source file. 8

lex.l lex.yy.c grammar.y grammar.tab.* declaration.* expression.* list.* statement.* type.* scope.* token.* error.* io.* salloc.* bool.h lex.h main.* libcb.* rts.* Flex source file Flex output file Bison source file Bison output files Compiler sources Run time system 9