Intermediate Code Generation. Intermediate Representations (IR) Case Study : itree. itree Statements and Expressions


Intermediate Code Generation

Copyright 1994-2015 Zhong Shao, Yale University

Translating the abstract syntax into the intermediate representation.

  Tiger source program
    -> lexer & parser       (report all lexical & syntactic errors)
    -> absyn
    -> semantic analysis    (report all semantic errors)
    -> correct absyn
    -> inter. codegen
    -> inter. code
    -> compiler back end

What should the Intermediate Representation (IR) be like?
  - not too low-level (machine independent)
  - but also not too high-level (so that we can do optimizations)

How to convert abstract syntax into IR?

The Tiger compiler uses one IR only --- the Intermediate Tree (itree):

  Absyn => itree frags => assembly => machine code

How to design itree? Stay in the middle of absyn and assembly!

Intermediate Representations (IR)

What makes a good IR?
  - easy to convert from the absyn;
  - easy to convert into the machine code;
  - must be clear and simple;
  - must support various machine-independent optimizing transformations.

Some modern compilers use several IRs (e.g., k=3 in SML/NJ) --- each IR in a later phase is a little closer to the machine code than the one in the previous phase.

  Absyn ==> IR 1 ==> IR 2 ... ==> IR k ==> machine code

  pros: make the compiler cleaner, simpler, and easier to maintain
  cons: multiple passes of code traversal --- compilation may be slow

Case Study : itree

Here is one example, defined using an ML datatype definition:

  structure Tree : TREE =
  struct
    type label = string
    type size = int
    type temp = int

    datatype stm = SEQ of stm * stm | ...
         and exp = BINOP of binop * exp * exp | ...
         and test = TEST of relop * exp * exp
         and binop = FPLUS | FMINUS | FDIV | FMUL
                   | PLUS | MINUS | MUL | DIV
                   | AND | OR | LSHIFT | RSHIFT | ARSHIFT | XOR
         and relop = EQ | NE | LT | GT | LE | GE
                   | ULT | ULE | UGT | UGE
                   | FEQ | FNE | FLT | FLE | FGT | FGE
         and cvtop = CVTSU | CVTSS | CVTSF | CVTUU | CVTUS | CVTFS | CVTFF
  end

itree Statements and Expressions

Here is the detail of the itree statements stm and the itree expressions exp:

  datatype stm = SEQ of stm * stm
               | LABEL of label
               | JUMP of exp
               | CJUMP of test * label * label
               | MOVE of exp * exp
               | EXP of exp
       and exp = BINOP of binop * exp * exp
               | CVTOP of cvtop * exp * size * size
               | MEM of exp * size
               | TEMP of temp
               | ESEQ of stm * exp
               | NAME of label
               | CONST of int
               | CONSTF of real
               | CALL of exp * exp list
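As a small illustration of how these constructors compose, the sketch below builds the tree for an assignment "x := y + 1" where both variables live in the current frame. It uses only a subset of the Tree structure; the word size, the temporary number chosen for the frame pointer, the frame offsets, and the helper name slot are all assumptions made for the example.

```sml
(* Subset of the Tree structure from the slides, enough for one statement. *)
structure Tree =
struct
  type label = string
  type size  = int
  type temp  = int

  datatype binop = PLUS | MINUS | MUL | DIV

  datatype stm = SEQ  of stm * stm
               | MOVE of exp * exp
               | EXP  of exp
       and exp = BINOP of binop * exp * exp
               | MEM   of exp * size
               | TEMP  of temp
               | CONST of int
end

structure T = Tree

val wordSize = 4          (* assumed word size in bytes *)
val fp : T.temp = 0       (* assumed temporary holding the frame pointer *)

(* Hypothetical helper: the memory cell at byte offset k from fp. *)
fun slot k = T.MEM (T.BINOP (T.PLUS, T.TEMP fp, T.CONST k), wordSize)

(* itree for "x := y + 1", assuming x is at offset ~4 and y at offset ~8.
   The MEM on the left of MOVE means "store"; the one on the right, "fetch". *)
val assign : T.stm =
  T.MOVE (slot ~4, T.BINOP (T.PLUS, slot ~8, T.CONST 1))
```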

itree Expressions

itree expressions stand for the computation of some value, possibly with side effects:

  CONST(i)         the integer constant i
  CONSTF(x)        the real constant x
  NAME(n)          the symbolic constant n (i.e., the assembly-language label)
  TEMP(t)          the content of temporary t; temporaries are like registers (unlimited number)
  BINOP(o,e1,e2)   apply binary operator o to operands e1 and e2; e1 must be evaluated before e2

itree Expressions (cont'd)

  CVTOP(o,e,j,k)   convert the j-byte operand e to a k-byte value using operator o
  MEM(e,k)         the contents of k bytes of memory starting at address e; if used as the left child of a MOVE, it means "store"; otherwise, it means "fetch" (k is often a word)
  CALL(f,l)        a procedure call, the application of function f: the expression f is evaluated first, then the expressions in the argument list l are evaluated from left to right
  ESEQ(s,e)        the statement s is evaluated for side effects, then e is evaluated for a result

itree Statements

itree statements perform side effects and control flow; they return no value!

  SEQ(s1,s2)                  statement s1 followed by s2
  EXP(e)                      evaluate expression e and discard the result
  LABEL(n)                    define n as the current code address (just like a label definition in the assembly language)
  MOVE(TEMP t, e)             evaluate e and move it into temporary t
  MOVE(MEM(e1,k), e2)         evaluate e1 to an address adr, then evaluate e2, and store its result into MEM[adr]
  JUMP(e)                     jump to the address e; the common case is jumping to a known label l: JUMP(NAME(l))
  CJUMP(TEST(o,e1,e2),t,f)    conditional jump: first evaluate e1 and then e2, do the comparison o; if the result is true, jump to label t, otherwise jump to label f

itree Fragments

How to represent Tiger function declarations inside the itree? Represent each one as an itree PROC fragment:

  datatype frag = PROC of {name  : Tree.label,      (* function name *)
                           body  : Tree.stm,        (* function body itree *)
                           frame : Frame.frame}     (* static frame layout *)
                | DATA of string

Each itree PROC fragment will be translated into a function definition in the final assembly code.

The itree DATA fragment is used to denote a Tiger string literal. It will be placed as a string constant in the final assembly code.

Our job is to convert Absyn into a list of itree fragments.
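The evaluation rules given above for itree expressions and statements (e1 before e2 in BINOP; the statement of an ESEQ before its expression) can be made concrete with a tiny interpreter. This is only a sketch over a subset of the constructors: modelling temporaries as an 8-cell array and the function names evalExp and execStm are assumptions, and memory, labels and jumps are left out.

```sml
datatype binop = PLUS | MINUS | MUL

datatype stm = SEQ  of stm * stm
             | MOVE of exp * exp
             | EXP  of exp
     and exp = BINOP of binop * exp * exp
             | TEMP  of int
             | ESEQ  of stm * exp
             | CONST of int

(* Temporaries are modelled as a mutable array of 8 cells (an assumption). *)
val temps = Array.array (8, 0)

fun evalExp (CONST i) = i
  | evalExp (TEMP t) = Array.sub (temps, t)
  | evalExp (BINOP (oper, a, b)) =
      let val x = evalExp a          (* e1 is evaluated before e2 *)
          val y = evalExp b
      in case oper of PLUS => x + y | MINUS => x - y | MUL => x * y
      end
  | evalExp (ESEQ (s, e)) = (execStm s; evalExp e)   (* side effects first *)

and execStm (SEQ (s1, s2)) = (execStm s1; execStm s2)
  | execStm (MOVE (TEMP t, e)) = Array.update (temps, t, evalExp e)
  | execStm (MOVE _) = raise Fail "only MOVE(TEMP t, e) is modelled here"
  | execStm (EXP e) = ignore (evalExp e)             (* result discarded *)

(* t0 := 5, then ESEQ(t0 := t0 + 1, t0 * 2) evaluates to 12. *)
val () = execStm (MOVE (TEMP 0, CONST 5))
val result =
  evalExp (ESEQ (MOVE (TEMP 0, BINOP (PLUS, TEMP 0, CONST 1)),
                 BINOP (MUL, TEMP 0, CONST 2)))
```

Because the left operand of a BINOP is evaluated first, an expression such as BINOP(PLUS, TEMP 0, ESEQ(MOVE(TEMP 0, CONST 10), TEMP 0)) reads the old value of the temporary on the left and the new one on the right.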

Example: Absyn => itree Frags

  function tigermain () : restype =        (* added for uniformity! *)
    let
      type intarray = array of int
      var a := intarray [9] of 0

      function readarray () = ...
      function writearray () = ...

      function exchange(x : int, y : int) =
        let var z := a[x]
         in a[x] := a[y]; a[y] := z
        end

      function quicksort(m : int, n : int) =
        let
          function partition(y : int, z : int) : int =
            let var i := y
                var j := z + 1
             in (while (i < j) do
                   (i := i+1; while a[i] < a[y] do i := i+1;
                    j := j-1; while a[j] > a[y] do j := j-1;
                    if i < j then exchange(i,j));
                 exchange(y,j); j)
            end
         in if n > m then
              (let var i := partition(m,n)
                in quicksort(m, i-1); quicksort(i+1, n)
               end)
        end
     in readarray(); quicksort(0,8); writearray()
    end

Example: Absyn => itree frags

The quicksort program (in absyn) is translated to a list of itree frags:

  PROC(label Tigermain,  itree code for Tigermain's body,  Tigermain's frame layout info)
  PROC(label readarray,  itree code for readarray's body,  readarray's frame layout info)
  PROC(label writearray, itree code for writearray's body, writearray's frame layout info)
  PROC(label exchange,   itree code for exchange's body,   exchange's frame layout info)
  PROC(label partition,  itree code for partition's body,  partition's frame layout info)
  PROC(label quicksort,  itree code for quicksort's body,  quicksort's frame layout info)
  DATA(... assembly code for string literal #1 ...)
  DATA(... assembly code for string literal #2 ...)
  ...

Summary: Absyn => itree Frags

Each absyn function declaration is translated into an itree PROC frag. TODO:

1. functions are no longer nested --- must figure out the stack frame layout information and the runtime access information for local and non-local variables!
2. must convert the function body (Absyn.exp) into an itree stm
3. calling conventions for Tiger functions and external C functions (which use the standard convention ...)

Each string literal or real constant is translated into an itree DATA frag, associated with an assembly code label. To reference the constant, just use this label.

Future work: translate itree frags into the assembly code of your favourite machine (PowerPC, or SPARC).

Review: Tiger Abstract Syntax

  exp = VarExp of var
      | NilExp
      | IntExp of int
      | StringExp of string * pos
      | AppExp of {func: Symbol.symbol, args: exp list, pos: pos}
      | OpExp of {left: exp, oper: oper, right: exp, pos: pos}
      | RecordExp of {typ: Symbol.symbol, pos: pos,
                      fields: (Symbol.symbol * exp * pos) list}
      | SeqExp of (exp * pos) list
      | AssignExp of {var: var, exp: exp, pos: pos}
      | IfExp of {test: exp, then': exp, else': exp option, pos: pos}
      | WhileExp of {test: exp, body: exp, pos: pos}
      | ForExp of {var: Symbol.symbol, lo: exp, hi: exp, body: exp, pos: pos}
      | BreakExp of pos
      | LetExp of {decs: dec list, body: exp, pos: pos}
      | ArrayExp of {typ: Symbol.symbol, size: exp, init: exp, pos: pos}

  dec = VarDec of {var: Symbol.symbol, init: exp, pos: pos,
                   typ: (Symbol.symbol * pos) option}
      | FunctionDec of fundec list
      | TypeDec of {name: Symbol.symbol, ty: ty, pos: pos} list

Mapping Absyn Exp into itree

We define the following new generic expression type gexp:

  datatype gexp = Ex of Tree.exp
                | Nx of Tree.stm
                | Cx of Tree.label * Tree.label -> Tree.stm

This introduces three new constructors:

  Ex : Tree.exp -> gexp
  Nx : Tree.stm -> gexp
  Cx : (Tree.label * Tree.label -> Tree.stm) -> gexp

Each Absyn.exp that computes a value is translated into a Tree.exp.
Each Absyn.exp that returns no value is translated into a Tree.stm.
Each conditional Absyn.exp (which computes a boolean value) is translated into a function Tree.label * Tree.label -> Tree.stm.

The Tiger expression a>b | c<d would be translated into (z is a fresh label):

  Cx(fn (t,f) => SEQ(CJUMP(TEST(GT,a,b),t,z),
                     SEQ(LABEL z, CJUMP(TEST(LT,c,d),t,f))))

Mapping Absyn Exp into itree

Utility functions for conversion among Ex, Nx, and Cx expressions:

  unex : gexp -> Tree.exp
  unnx : gexp -> Tree.stm
  uncx : gexp -> (Tree.label * Tree.label -> Tree.stm)

Examples:

  fun seq [] = error ...
    | seq [a] = a
    | seq (a::r) = SEQ(a, seq r)

  fun unex (Ex e) = e
    | unex (Nx s) = T.ESEQ(s, T.CONST 0)
    | unex (Cx genstm) =
        let val r = T.newtemp()
            val t = T.newlabel() and f = T.newlabel()
         in T.ESEQ(seq[T.MOVE(T.TEMP r, T.CONST 1),
                       genstm(t,f),
                       T.LABEL f,
                       T.MOVE(T.TEMP r, T.CONST 0),
                       T.LABEL t],
                   T.TEMP r)
        end

Simple Variables

Define the frame and level type for each function definition:

  type frame = {formals : int, offlst : int list,
                locals : int ref, maxargs : int ref}

  datatype level = LEVEL of {frame : frame,
                             slink_offset : offset,
                             parent : level} * unit ref
                 | TOP

  type access = level * offset

The access information for a variable v is a pair (l,k) where l is the level in which v is defined and k is the frame offset.

The frame offset can be calculated by the alloclocal function in the Frame structure (which is architecture-dependent). The access information will be put in the env in the typechecking phase.

Simple Variables (cont'd)

To access a local variable v at offset k, assuming the frame pointer is fp, just do

  MEM(BINOP(PLUS, TEMP fp, CONST k), w)

To access a non-local variable v inside function f at level l_f, assuming v's access is (l_g, k), we do the following:

  MEM(+(CONST k_n, MEM(+(CONST k_(n-1), ... MEM(+(CONST k_1, TEMP fp)) ...))))

Strip levels from l_f; we use the static link offsets k_1, k_2, ... from these levels to construct the tree. When we reach l_g, we stop.

  datatype level = LEVEL of {frame : frame,
                             slink_offset : offset,
                             parent : level} * unit ref
                 | TOP

Use the unit ref to test if two levels are the same one.
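The level-stripping procedure for non-local variables can be sketched as a recursive function that wraps one MEM around the frame address for every static link it follows. This is a simplified sketch: the frame field of LEVEL is omitted, offset is taken to be int, and the names simpleVar, sameLevel and chase, the word size, and the temporary number used for fp are assumptions. The operand order inside BINOP follows the non-local formula (constant first).

```sml
datatype binop = PLUS
datatype exp = BINOP of binop * exp * exp
             | MEM   of exp * int
             | TEMP  of int
             | CONST of int

(* Simplified level: the frame record from the slides is omitted. *)
datatype level = TOP
               | LEVEL of {slink_offset : int, parent : level} * unit ref

type access = level * int

val wordSize = 4     (* assumed word size *)
val fp = 0           (* assumed temporary holding the frame pointer *)

(* Two levels are the same one exactly when their unit refs are equal. *)
fun sameLevel (LEVEL (_, r1), LEVEL (_, r2)) = (r1 = r2)
  | sameLevel (TOP, TOP) = true
  | sameLevel _ = false

(* simpleVar ((lg, k), lf): tree for a variable defined at level lg with
   frame offset k, accessed from code running at level lf. *)
fun simpleVar ((lg, k) : access, lf : level) : exp =
  let
    fun chase (cur, frameAddr) =
      if sameLevel (cur, lg) then
        MEM (BINOP (PLUS, CONST k, frameAddr), wordSize)
      else
        case cur of
          LEVEL ({slink_offset, parent}, _) =>
            (* follow one static link to reach the parent's frame *)
            chase (parent,
                   MEM (BINOP (PLUS, CONST slink_offset, frameAddr), wordSize))
        | TOP => raise Fail "defining level is not an enclosing level"
  in
    chase (lf, TEMP fp)
  end

(* Two nested functions; the static link is assumed to sit at offset 8. *)
val outer = LEVEL ({slink_offset = 8, parent = TOP}, ref ())
val inner = LEVEL ({slink_offset = 8, parent = outer}, ref ())

val localVar = simpleVar ((inner, ~4), inner)     (* no static link followed *)
val nonLocal = simpleVar ((outer, ~12), inner)    (* one static link followed *)
```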

Array and Record Variables

In Pascal, an array variable stands for the contents of the array --- the assignment will do the copying:

  var a, b : array [1..12] of integer;
  begin a := b end;

In Tiger and ML, an array or record variable just stands for the pointer to the object, not the actual object. The assignment just assigns the pointer.

  let type intarray = array of int
      var a := intarray[12] of 0
      var b := intarray[12] of 7
   in a := b
  end

In C, assignment to array variables is illegal!

  int a[12], b[12], *c;
  a = b;     is illegal!
  c = a;     is legal!

Array Subscription

If a is an array variable represented as MEM(e), then the array subscription a[i] would be (ws is the word size)

  MEM(BINOP(PLUS, MEM(e), BINOP(MUL, i, CONST ws)))

To ensure safety, we must do array-bounds checking: if the array bounds are L..H and the subscript is i, then report a runtime error when either i < L or i > H.

Array subscriptions can be either l-values or r-values --- use them properly.

Record field selection can be translated in the same way. We calculate the offset of each field at compile time.

  type intlist = {hd : int, tl : intlist}

The offsets of hd and tl are 0 and 4.

Record and Array Creation

Tiger record creation:

  var z := foo {f1 = e1, ..., fn = en}

We can implement this by calling the C malloc function, and then moving each ei to the corresponding field of foo (see Appel p. 164).

In real compilers, calling malloc is expensive; we often inline the malloc code.

Tiger array creation:

  var z := foo [n] of initv

This is done by calling a C initarray(size, initv) function, which allocates an array of size size with initial value initv.

To support array-bounds checking, we can put the array length in the 0-th field; z[i] is then accessed at offset (i+1)*word_sz.

Requirement: a way to call external C functions inside Tiger.

Integer and String

Integer:    absyn IntExp(i)  =>  itree CONST(i)
Arithmetic: absyn OpExp(i)   =>  itree BINOP(i)

Strings: every string literal in Tiger or C is the constant address of a segment of memory initialized to the proper characters.

During translation from Absyn to itree, we associate a label l with each string literal s; to refer to s, just use NAME l.

Later, we'll generate assembly instructions that define and initialize this label l and string literal s.

String representations:

1. a word containing the length, followed by the characters (in Tiger)
2. a pointer to a sequence of characters followed by \000 (in C)
3. a fixed-length array of characters (in Pascal)
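Putting the array layout above together (length stored in the 0-th field, element i at offset (i+1)*word_sz), a bounds-checked subscript could be built roughly as follows. This is a sketch under several assumptions: arrays are indexed from 0, the runtime routine named bounds_error is hypothetical and is assumed not to return, and the helper names newlabel, seq and subscript are chosen for the example.

```sml
type label = string
datatype binop = PLUS | MUL
datatype relop = LT | GE

datatype stm = SEQ   of stm * stm
             | LABEL of label
             | CJUMP of test * label * label
             | EXP   of exp
     and exp = BINOP of binop * exp * exp
             | MEM   of exp * int
             | TEMP  of int
             | ESEQ  of stm * exp
             | NAME  of label
             | CONST of int
             | CALL  of exp * exp list
     and test = TEST of relop * exp * exp

val wordSize = 4                       (* assumed word size *)
val labelCount = ref 0
fun newlabel () =
  (labelCount := !labelCount + 1; "L" ^ Int.toString (!labelCount))

fun seq [s] = s
  | seq (s :: rest) = SEQ (s, seq rest)
  | seq [] = raise Fail "seq: empty list"

(* subscript (a, i): a evaluates to the array pointer, i to the index.
   Both should be side-effect free (e.g. TEMPs), since each one appears
   more than once in the generated tree. *)
fun subscript (a : exp, i : exp) : exp =
  let
    val lowOk = newlabel () and inRange = newlabel () and bad = newlabel ()
    val len = MEM (a, wordSize)        (* array length in the 0-th field *)
    val offset = BINOP (MUL, BINOP (PLUS, i, CONST 1), CONST wordSize)
  in
    ESEQ (seq [CJUMP (TEST (LT, i, CONST 0), bad, lowOk),
               LABEL lowOk,
               CJUMP (TEST (GE, i, len), bad, inRange),
               LABEL bad,
               EXP (CALL (NAME "bounds_error", [])),  (* hypothetical routine *)
               LABEL inRange],
          MEM (BINOP (PLUS, a, offset), wordSize))
  end

val elem = subscript (TEMP 1, TEMP 2)
```

The resulting MEM node can serve as an l-value (left child of a MOVE) or as an r-value, as with any other MEM.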
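The label bookkeeping for string literals might look like the sketch below: each literal gets a fresh label, a DATA fragment is recorded for later assembly generation, and the expression itself becomes NAME l. Here DATA carries the label and the raw string rather than assembly text, which is a simplification of the frag type shown earlier; the names frags, newlabel, stringExp and intExp are assumptions.

```sml
type label = string

datatype exp  = NAME of label | CONST of int
datatype frag = DATA of label * string     (* simplified DATA fragment *)

(* Fragments collected so far, most recent first. *)
val frags : frag list ref = ref []

val count = ref 0
fun newlabel () = (count := !count + 1; "S" ^ Int.toString (!count))

(* IntExp(i) => CONST(i) *)
fun intExp (i : int) : exp = CONST i

(* StringExp(s) => NAME l, remembering a DATA fragment for l *)
fun stringExp (s : string) : exp =
  let val l = newlabel ()
  in frags := DATA (l, s) :: !frags; NAME l
  end

val hello = stringExp "hello"
val world = stringExp "world"
```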

Conditionals

Each comparison expression a < b will be translated to a Cx generic expression

  fn (t,f) => CJUMP(TEST(LT,a,b),t,f)

Given a conditional expression (in absyn)

  if e1 then e2 else e3

1. translate e1, e2, e3 into itree generic expressions e1', e2', e3'
2. apply uncx to e1', and unex to e2' and e3'
3. make three labels, then case: t, else case: f, and join: j
4. allocate a temporary r; after label t, move e2' to r, then jump to j; after label f, move e3' to r, then jump to j
5. apply the uncx-ed version of e1' to labels t and f

Need to recognize certain special cases: (x < 5) & (a > b) is converted to if x < 5 then a > b else 0 in absyn --- too many labels if using the above algorithm --- inefficient (read Appel page 162).

Translating while loops:

  test:  if not (condition) goto done
         ... the loop body ...
         goto test
  done:

Each round executes one conditional branch plus one jump.

Translating break statements: a break is just a JUMP to done; need to pass down the label done when translating the loop body!

Translating for loops (exercise, or see Appel p. 166):

  for i := lo to hi do body

         goto test
  top:   ... the loop body ...
  test:  if (condition) goto top
  done:

Each round executes one conditional branch only.

Function Calls

Inside a function g, the function call f(e1, e2, ..., en) is translated into

  CALL(NAME l_f, [sl, e1, e2, ..., en])

sl is the static link --- it is just a pointer to f's parent level, but how can we find it when we are inside g? Stripping the levels of g one by one, generate the code that follows g's chain of static links until we reach f's parent level.

When calling external C functions, what kind of static link do we pass?

In the future, we need to decide on the calling convention --- where is the callee expecting the formal parameters and the static link?

Declarations

Variable declaration: need to figure out the offset in the frame, then move the expression on the r.h.s. to the proper slot in the frame.

Type declaration: no need to generate any itree code!

Function declaration: build the PROC itree fragment.

Later we translate PROC(name : label, body : stm, frame) to assembly:

         _global name
  name:  ... prologue ...
         ... assembly code for body ...
         ... epilogue ...

The prologue and epilogue capture the calling sequence, and can be figured out from the frame layout information in frame. Prologue and epilogue are often machine-dependent.
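The five-step recipe for if e1 then e2 else e3 given under Conditionals can be written out over the gexp type. The sketch below is self-contained, so it repeats a subset of the tree datatypes and the unEx conversion shown earlier; the unCx case for Ex (compare against 0 with NE) is an assumption not spelled out in these notes, as are the counter-based newlabel and newtemp and the name ifExp.

```sml
type label = string
datatype relop = LT | GT | NE

datatype stm = SEQ   of stm * stm
             | LABEL of label
             | JUMP  of exp
             | CJUMP of test * label * label
             | MOVE  of exp * exp
     and exp = TEMP  of int
             | ESEQ  of stm * exp
             | NAME  of label
             | CONST of int
     and test = TEST of relop * exp * exp

datatype gexp = Ex of exp
              | Nx of stm
              | Cx of label * label -> stm

val labels = ref 0 and temps = ref 100
fun newlabel () = (labels := !labels + 1; "L" ^ Int.toString (!labels))
fun newtemp ()  = (temps := !temps + 1; !temps)

fun seq [s] = s
  | seq (s :: rest) = SEQ (s, seq rest)
  | seq [] = raise Fail "seq: empty list"

fun unEx (Ex e) = e
  | unEx (Nx s) = ESEQ (s, CONST 0)
  | unEx (Cx genstm) =
      let val r = newtemp ()
          val t = newlabel () and f = newlabel ()
      in ESEQ (seq [MOVE (TEMP r, CONST 1), genstm (t, f),
                    LABEL f, MOVE (TEMP r, CONST 0), LABEL t],
               TEMP r)
      end

fun unCx (Cx genstm) = genstm
  | unCx (Ex e) = (fn (t, f) => CJUMP (TEST (NE, e, CONST 0), t, f))  (* assumed *)
  | unCx (Nx _) = raise Fail "unCx: a statement has no truth value"

(* if e1 then e2 else e3, following steps 1-5 of the recipe *)
fun ifExp (e1 : gexp, e2 : gexp, e3 : gexp) : gexp =
  let
    val cond  = unCx e1                                 (* step 2 *)
    val thenE = unEx e2 and elseE = unEx e3
    val t = newlabel () and f = newlabel () and j = newlabel ()  (* step 3 *)
    val r = newtemp ()                                  (* step 4 *)
  in
    Ex (ESEQ (seq [cond (t, f),                         (* step 5 *)
                   LABEL t, MOVE (TEMP r, thenE), JUMP (NAME j),
                   LABEL f, MOVE (TEMP r, elseE), JUMP (NAME j),
                   LABEL j],
              TEMP r))
  end

(* if x < 5 then 10 else 20, with x assumed to live in temporary 1 *)
val example =
  ifExp (Cx (fn (t, f) => CJUMP (TEST (LT, TEMP 1, CONST 5), t, f)),
         Ex (CONST 10), Ex (CONST 20))
```

The jump to j at the end of the else branch is redundant because j follows immediately; it is kept here only to mirror the steps as stated.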

Function Declarations

Generating the prologue:

1. pseudo-instructions to announce the beginning of a function
2. a label definition for the function name
3. an instruction to adjust the stack pointer (allocating a new frame)
4. store instructions to save callee-save registers and the return address
5. store instructions to save arguments and static links

Generating the epilogue:

1. an instruction to move the return result to a special register
2. load instructions to restore callee-save registers
3. an instruction to reset the stack pointer (pop the frame)
4. a return instruction (jump to the return address)
5. pseudo-instructions to announce the end of a function
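The two numbered lists can be read as a recipe for wrapping the assembly of a function body. The sketch below does this over plain strings. Every mnemonic and directive in it is made-up pseudo-assembly, not the syntax of any real machine, and the function name wrapProc and its record fields are assumptions; restoring the return address alongside the callee-save registers is likewise an assumption.

```sml
(* All "instructions" below are invented pseudo-assembly for illustration. *)
fun wrapProc {name : string, frameSize : int,
              calleeSaves : string list, body : string list} : string list =
  let
    fun save r    = "    store " ^ r ^ " -> frame"
    fun restore r = "    load  " ^ r ^ " <- frame"
    val size = Int.toString frameSize
    val prologue =
      [".proc " ^ name,                        (* 1. announce the function *)
       name ^ ":",                             (* 2. label for its name *)
       "    sub sp, sp, " ^ size]              (* 3. allocate a new frame *)
      @ map save (calleeSaves @ ["ra"])        (* 4. callee-saves, return address *)
      @ ["    store args, static link -> frame"]   (* 5. *)
    val epilogue =
      ["    move result -> rv"]                (* 1. result to a special register *)
      @ map restore (calleeSaves @ ["ra"])     (* 2. restore saved registers *)
      @ ["    add sp, sp, " ^ size,            (* 3. pop the frame *)
         "    ret",                            (* 4. return *)
         ".endproc " ^ name]                   (* 5. announce the end *)
  in
    prologue @ body @ epilogue
  end

val code = wrapProc {name = "f", frameSize = 16,
                     calleeSaves = ["s0"], body = ["    nop"]}
```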