Intermediate Code Generation

Intermediate Code Generation: Basic Approach and Application to Assignment and Expressions, Array Expressions, Boolean Expressions

Copyright 2017, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies of these materials for their personal use.

Structure of a Compiler
[Figure: compiler pipeline. Scanner -> words -> Parser -> IR -> Analysis & Optimization -> IR -> Intermediate Code Generation -> IR (regs) -> Code Generation -> asm (k regs). The early stages are annotated O(n) or O(n log n); code generation is annotated "very fast or very hard (NP-complete)".]
- A compiler is a lot of fast stuff followed by some hard problems.
- The hard stuff is mostly in code generation and optimization.
- For superscalars, it is allocation and scheduling that count.

Intermediate Code Generation
[Figure: parse tree or AST -> Intermediate Code Generation, O(n) -> IR (three-address instructions, virtual registers)]
Direct translation
- Uses an SDT scheme to go from the parse tree to three-address instructions
- Can be done while parsing, in a single pass
- Needs to be able to deal with syntactic errors and recovery
Indirect translation
- First validate the parse by constructing the AST
- Uses an SDT scheme to build the AST
- Traverse the AST and generate three-address instructions
This lecture: same, but easier.

Three-Address Instructions
IR: high-level constructs mapped to three-address instructions
- Register-based IR for expression evaluation
- Infinite number of virtual registers
- Still independent of the target architecture
- Parameter-passing discipline either on the stack or via registers
Addresses and instructions
- Symbolic names are the addresses of the corresponding source-level variables.
- Various constants, such as numeric values and offsets (known at compile time)
Generic instruction format: Label: x = y op z   or   if exp goto L
- Statements can have symbolic labels.
- The compiler inserts temporary variables (any variable with a t prefix).
- Types and conversions are dealt with in other phases of code generation.

Three-Address Instructions
Assignments:
- x = y op z (binary operator)
- x = op y (unary operator)
- x = y (copy)
- x = y[i] and x[i] = y (array indexing assignments)
- x = phi y z (Static Single Assignment instruction)
Memory operations:
- x = &y; x = *y and *x = y, for assignments via pointer variables
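One possible in-memory encoding of the binary, unary, and copy forms listed above, as a minimal Python sketch. The class name `Instr` and its field names are illustrative assumptions and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """One three-address instruction (hypothetical encoding)."""
    dest: str
    op: Optional[str] = None    # None means a plain copy
    arg1: Optional[str] = None
    arg2: Optional[str] = None  # None means a unary operator or a copy

    def __str__(self) -> str:
        if self.op is None:                       # copy: x = y
            return f"{self.dest} = {self.arg1}"
        if self.arg2 is None:                     # unary: x = op y
            return f"{self.dest} = {self.op} {self.arg1}"
        return f"{self.dest} = {self.arg1} {self.op} {self.arg2}"  # binary


binary = Instr("x", "+", "y", "z")
unary = Instr("x", "-", "y")
copy = Instr("x", None, "y")
print(binary, unary, copy, sep="\n")
```

Keeping at most three addresses per instruction is what lets later phases treat every instruction uniformly.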

Three-Address Instructions
Control transfer and function calls:
- goto L (unconditional)
- if (a relop b) goto L (conditional), where relop is a relational operator consistent with the type of the variables a and b
- y = call p, n for a function or procedure call to the name or variable p
  - p might be a variable holding a set of possible symbolic names (a function pointer).
  - The value n specifies that before this call there were n putparam instructions to load the values of the arguments.
  - The putparam x instructions specify the argument values in reverse order (i.e., the putparam instruction closest to the call is the first argument value).
Later we will talk about parameter-passing disciplines (Run-Time Env.).


Function Call Example
Source code:
    y = p(a, b+1)
    int p(x,z){ return x+z; }
Three-address instructions:
    t1 = a            (argument evaluation)
    t2 = b + 1
    putparam t1       (passing args on the stack)
    putparam t2
    y = call p, 2
    p: getparam z     (getting values from the stack)
       getparam x
       t3 = x + z
       return t3
       return
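The caller side of this example can be sketched in a few lines of Python. The helper name `lower_call` is hypothetical, and argument expressions are passed as already-formatted strings, which is a simplification assumed here so the sketch stays short.

```python
def lower_call(result, callee, args):
    """Lower `result = callee(args...)` into three-address instructions.

    `args` is a list of argument expressions given as strings.
    """
    code, temps = [], []
    for n, expr in enumerate(args, start=1):
        temp = f"t{n}"
        code.append(f"{temp} = {expr}")       # argument evaluation
        temps.append(temp)
    for temp in temps:
        code.append(f"putparam {temp}")       # pass arguments on the stack
    code.append(f"{result} = call {callee}, {len(args)}")
    return code


code = lower_call("y", "p", ["a", "b + 1"])
print("\n".join(code))
```

All arguments are evaluated before the first putparam, matching the order of the instructions in the example.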

SDT for Three-Address Code Generation
Attributes for the non-terminals E and S
- Location (in terms of a temporary variable) of the value of an expression: E.place. If E.place is t1, the value of E is saved in t1.
- The code that evaluates the expression or statement: E.code
- Markers for the beginning and end of sections of the code: S.begin, S.end. For simplicity these are symbolic labels. Markers are inherited attributes.
Semantic actions in productions of the grammar
- Functions to create temporaries (newtemp) and labels (newlabel)
- Auxiliary functions to enter symbols and look up types corresponding to declarations in a symbol table
- To generate the code we use the function gen, which creates a list of instructions to be emitted later and can generate symbolic labels corresponding to the next instruction of a list.
- Use of the append function on lists of instructions
- Generate code in a post-order traversal of the AST

Assignment Statements
S -> id = E     { p = lookup(id.name);
                  if (p != NULL) { S.code = append(E.code, gen(p = E.place)); }
                  else { error; S.code = nulllist; } }
E -> E1 + E2    { E.place = newtemp();
                  E.code = append(E1.code, E2.code, gen(E.place = E1.place + E2.place)); }
E -> E1 * E2    { E.place = newtemp();
                  E.code = append(E1.code, E2.code, gen(E.place = E1.place * E2.place)); }
E -> - E1       { E.place = newtemp();
                  E.code = append(E1.code, gen(E.place = - E1.place)); }
E -> ( E1 )     { E.place = E1.place; E.code = E1.code; }
E -> id         { p = lookup(id.name);
                  if (p != NULL) E.place = p; else error;
                  E.code = nulllist; }

Assignment: Example
x = a * b + c * d - e * f;

[Figure: parse tree for S -> id = E. The left-hand side is x; the expression root is +, whose left child is a * b and whose right child is - over c * d and e * f.]

Step 1. Production E -> id at each leaf: { p = lookup(id.name); if (p != NULL) E.place = p; else error; E.code = nulllist; }. Each of a, b, c, d, e, f gets place = its own name and code = null.
Step 2. Production E -> E1 * E2 at e * f: place = t1, code = {t1 = e * f;}
Step 3. Production E -> E1 * E2 at c * d: place = t2, code = {t2 = c * d;}
Step 4. At the - node: place = t3, code = {t1 = e * f; t2 = c * d; t3 = t2 - t1;}
Step 5. Production E -> E1 * E2 at a * b: place = t4, code = {t4 = a * b;}
Step 6. At the + node: place = t5, code = {t1 = e * f; t2 = c * d; t3 = t2 - t1; t4 = a * b; t5 = t4 + t3;}
Step 7. Production S -> id = E: { p = lookup(id.name); if (p != NULL) S.code = append(E.code, gen(p = E.place)); else error; } gives place = x and
        S.code = {t1 = e * f; t2 = c * d; t3 = t2 - t1; t4 = a * b; t5 = t4 + t3; x = t5;}

Final code:
    t1 = e * f;
    t2 = c * d;
    t3 = t2 - t1;
    t4 = a * b;
    t5 = t4 + t3;
    x = t5;
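The place/code attributes and the newtemp, gen, and append helpers can be mimicked with a small recursive generator over an AST. This is a minimal Python sketch under stated assumptions: the class name `CodeGen` and the tuple encoding of tree nodes are hypothetical, and the symbol-table lookup is omitted. Because it appends E1.code before E2.code, it numbers temporaries left to right, so it produces the same computation as the example but with a different temporary numbering.

```python
class CodeGen:
    """Synthesizes (place, code) bottom-up, as in the semantic actions."""

    def __init__(self):
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def expr(self, node):
        """Return (place, code) for an expression node.

        A node is a variable name (str), ("neg", e), or (op, left, right).
        """
        if isinstance(node, str):                  # E -> id
            return node, []
        if node[0] == "neg":                       # E -> - E1
            p1, c1 = self.expr(node[1])
            place = self.newtemp()
            return place, c1 + [f"{place} = - {p1}"]
        op, left, right = node                     # E -> E1 op E2
        p1, c1 = self.expr(left)
        p2, c2 = self.expr(right)
        place = self.newtemp()
        return place, c1 + c2 + [f"{place} = {p1} {op} {p2}"]

    def assign(self, name, node):                  # S -> id = E
        place, code = self.expr(node)
        return code + [f"{name} = {place}"]


# x = a * b + c * d - e * f, with the tree shape used in the example
tree = ("+", ("*", "a", "b"), ("-", ("*", "c", "d"), ("*", "e", "f")))
code = CodeGen().assign("x", tree)
print("\n".join(code))
```

List concatenation plays the role of append, and returning the pair (place, code) from each call plays the role of the synthesized attributes.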

Reusing Temporary Variables
Temporary variables
- Short lived
- Used for evaluation of expressions
- Clutter the symbol table
Change the newtemp function
- Keep track of when a value created in a temporary is used.
- Use a counter to keep track of the number of active temporaries.
- When a temporary is used in an expression, decrement the counter.
- When a temporary is generated by newtemp, increment the counter.
- Initialize the counter to zero (0).
Alternatively, this can be done as a post-processing pass.

Assignment: Example
x = a * b + c * d - e * f;
                      // c = 0
    t1 = e * f;       // c = 1
    t2 = c * d;       // c = 2
    t1 = t2 - t1;     // c = 1
    t2 = a * b;       // c = 2
    t1 = t2 + t1;     // c = 1
    x = t1;           // c = 0
Only 2 temporary variables, and hence only 2 registers, are needed.
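The counter scheme can be tried out with a short Python sketch. The names `TempPool`, `gen`, and `gen_assign` are hypothetical. Operands are visited right to left here, an assumption made only so that the output lines up with the instruction order of the example.

```python
class TempPool:
    """Counter-based temporaries: newtemp increments, each use decrements."""

    def __init__(self):
        self.active = 0        # number of live temporaries (the counter c)
        self.max_active = 0    # high-water mark, i.e. temporaries needed
        self.issued = set()    # names handed out, to tell temps from variables

    def newtemp(self):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        name = f"t{self.active}"
        self.issued.add(name)
        return name

    def use(self, place):
        if place in self.issued:   # only temporaries affect the counter
            self.active -= 1


def gen(node, pool, code):
    """Append code for `node` to `code` and return the place of its value."""
    if isinstance(node, str):
        return node
    op, left, right = node
    p2 = gen(right, pool, code)    # right operand first (see lead-in)
    p1 = gen(left, pool, code)
    pool.use(p1)
    pool.use(p2)
    place = pool.newtemp()
    code.append(f"{place} = {p1} {op} {p2}")
    return place


def gen_assign(name, node, pool):
    code = []
    place = gen(node, pool, code)
    pool.use(place)
    code.append(f"{name} = {place}")
    return code


pool = TempPool()
tree = ("+", ("*", "a", "b"), ("-", ("*", "c", "d"), ("*", "e", "f")))
code = gen_assign("x", tree, pool)
print("\n".join(code))
print("temporaries needed:", pool.max_active)
```

The scheme relies on temporaries being released in last-in, first-out order, which holds for expression trees evaluated this way.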

Code Generation for Array Accesses
x = A[y,z];
[Figure: parse tree for S -> L = E, with x on the left-hand side and, on the right, L -> Elist ] expanding to A [ y , z ]]
Questions:
- What is the base type of A?
- What are the dimensions of A?
- What is the layout of A?
Where to get answers? Use the symbol table and check the array layout.
Code to be generated:
    t1 = y * 20
    t1 = t1 + z
    t2 = baseA - 84
    t3 = 4 * t1
    t4 = t2[t3]
    x = t4

How does the Compiler Handle A[i,j]?
First, we must agree on a storage scheme or layout.
Row-major order (most languages)
- Lay out as a sequence of consecutive rows
- Rightmost subscript varies fastest
- A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3]
Column-major order (Fortran)
- Lay out as a sequence of columns
- Leftmost subscript varies fastest
- A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3]
Indirection vectors (Java)
- Vector of pointers to pointers to ... to values
- Takes much more space, trades indirection for arithmetic
- Not amenable to analysis

Laying Out Arrays: The Concept
[Figure: the 2 x 4 array A, elements 1,1 through 2,4, shown under three layouts]
- Row-major order: 1,1 1,2 1,3 1,4 2,1 2,2 2,3 2,4
- Column-major order: 1,1 2,1 1,2 2,2 1,3 2,3 1,4 2,4
- Indirection vectors: A points to a vector of row pointers; one row holds 1,1 1,2 1,3 1,4 and the other holds 2,1 2,2 2,3 2,4
These have distinct and different cache behavior.

Computing an Array Address
How to access A[i]? The address of a 1-D array element is, in general:

$$\mathrm{base}_A + (i - \mathrm{low}) \times \mathrm{sizeof}(\mathrm{basetype}(A))$$

- base_A is defined by the storage class of the array.
- low is the lower bound: for int A[1:10], low is 1. Making low 0 gives faster access (it saves a subtraction), as in the C language.
- sizeof(basetype(A)) is almost always a power of 2 and known at compile time, so a shift instruction can be used for speed.

Computing an Array Address
A[i], in general:

$$\mathrm{base}_A + (i - \mathrm{low}) \times \mathrm{sizeof}(\mathrm{basetype}(A))$$

What about A[i_1, i_2]?

Row-major order, two dimensions:

$$\mathrm{base}_A + \bigl((i_1 - \mathrm{low}_1) \times (\mathrm{high}_2 - \mathrm{low}_2 + 1) + i_2 - \mathrm{low}_2\bigr) \times \mathrm{sizeof}(\mathrm{basetype}(A))$$

Column-major order, two dimensions:

$$\mathrm{base}_A + \bigl((i_2 - \mathrm{low}_2) \times (\mathrm{high}_1 - \mathrm{low}_1 + 1) + i_1 - \mathrm{low}_1\bigr) \times \mathrm{sizeof}(\mathrm{basetype}(A))$$

Indirection vectors, two dimensions: (A[i_1])[i_2], where A[i_1] is, itself, a 1-D array reference.

This stuff looks expensive! Lots of implicit +, -, x ops.
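The two-dimensional formulas translate directly into code. The following Python sketch uses hypothetical function names, and the base address 1000 and element width 4 are assumed values chosen only for illustration; the bounds are those of the 2 x 3 array A[1:2, 1:3] from the layout slide.

```python
def row_major_address(base, i1, i2, low1, low2, high2, w):
    """Address of A[i1, i2] when rows are stored consecutively."""
    len2 = high2 - low2 + 1                  # elements per row
    return base + ((i1 - low1) * len2 + (i2 - low2)) * w


def col_major_address(base, i1, i2, low1, high1, low2, w):
    """Address of A[i1, i2] when columns are stored consecutively."""
    len1 = high1 - low1 + 1                  # elements per column
    return base + ((i2 - low2) * len1 + (i1 - low1)) * w


BASE, W = 1000, 4                            # assumed base address and width
# A[2,1] is the 4th element in row-major order, the 2nd in column-major order.
row = row_major_address(BASE, 2, 1, low1=1, low2=1, high2=3, w=W)
col = col_major_address(BASE, 2, 1, low1=1, high1=2, low2=1, w=W)
print(row, col)
```

Note that the row-major formula never needs high_1 and the column-major formula never needs high_2.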

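Optimizing Address Calculation for A[i,j]
In row-major order, where w = sizeof(basetype(A)), the address is

$$\mathrm{base}_A + (i - \mathrm{low}_1)(\mathrm{high}_2 - \mathrm{low}_2 + 1) \times w + (j - \mathrm{low}_2) \times w$$

which can be factored into

$$\mathrm{base}_A + i \times (\mathrm{high}_2 - \mathrm{low}_2 + 1) \times w + j \times w - \mathrm{low}_1 \times (\mathrm{high}_2 - \mathrm{low}_2 + 1) \times w - \mathrm{low}_2 \times w$$

If low_i, high_i, and w are known, the last two terms are compile-time constants. Define

$$\mathrm{len}_2 = \mathrm{high}_2 - \mathrm{low}_2 + 1, \qquad \mathrm{base}_{A0} = \mathrm{base}_A - (\mathrm{low}_1 \times \mathrm{len}_2 + \mathrm{low}_2) \times w$$

Then the address expression becomes

$$\mathrm{base}_{A0} + (i \times \mathrm{len}_2 + j) \times w$$

where base_A0, len_2, and w are compile-time constants.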
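A quick way to check the factoring is to compute both forms and compare them. In this Python sketch the function names are hypothetical, and the base address 1000 is an assumed value; the bounds and width are those of the 10 x 20 integer array used later in the lecture.

```python
def precompute(base, low1, low2, high2, w):
    """Fold the bounds into constants, as a compiler would at compile time."""
    len2 = high2 - low2 + 1
    base0 = base - (low1 * len2 + low2) * w
    return base0, len2


def address(base0, len2, i, j, w):
    """Optimized form: one multiply-add chain, no subtractions of low."""
    return base0 + (i * len2 + j) * w


base0, len2 = precompute(base=1000, low1=1, low2=1, high2=20, w=4)
print(base0)                           # base minus the constant (20 + 1) * 4
print(address(base0, len2, 1, 1, 4))   # address of A[1,1], i.e. the base
```

The constant subtracted from the base is (20 + 1) x 4 = 84, the same 84 that appears in the generated code for x = A[y,z].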

Address Calculation for A[i_1, i_2, ..., i_k]
Addressing generalizes to

$$\bigl((\cdots((i_1 n_2 + i_2)\, n_3 + i_3)\cdots)\, n_k + i_k\bigr) \times w \;+\; \mathrm{base}_A \;-\; \bigl((\cdots((\mathrm{low}_1 n_2 + \mathrm{low}_2)\, n_3 + \mathrm{low}_3)\cdots)\, n_k + \mathrm{low}_k\bigr) \times w$$

where n_j = high_j - low_j + 1 and w = sizeof(basetype(A)).

The first term can be computed using the recurrence

$$e_1 = i_1, \qquad e_m = e_{m-1} \times n_m + i_m$$

At the end, multiply by w and add the compile-time constant term (base_A minus the low-bound term, both compile-time constants).
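The recurrence is Horner's rule applied to the subscripts, and the same loop evaluates the constant term when applied to the lower bounds. A minimal Python sketch follows; the function name `array_address` and the sample bounds are assumptions made for illustration.

```python
def array_address(base, indices, lows, highs, w):
    """Row-major address of A[indices] for a k-dimensional array."""
    n = [hi - lo + 1 for lo, hi in zip(lows, highs)]   # n_j per dimension

    def recurrence(values):
        e = values[0]                                  # e_1 = i_1
        for m in range(1, len(values)):
            e = e * n[m] + values[m]                   # e_m = e_(m-1) * n_m + i_m
        return e

    const_term = base - recurrence(lows) * w           # compile-time constant
    return const_term + recurrence(indices) * w


# A 2 x 3 x 4 array with all lower bounds 1, 4-byte elements, base address 0.
first = array_address(0, [1, 1, 1], [1, 1, 1], [2, 3, 4], 4)
last = array_address(0, [2, 3, 4], [1, 1, 1], [2, 3, 4], 4)
print(first, last)
```

The last of the 24 elements lands at byte offset 23 x 4 = 92, as expected for a contiguous row-major layout.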

SDT for Addressing Array Elements
Three attributes
- place: just the name or base address of the array
- offset: the index value into the array
- ndim: the number of dimensions
Use the recurrence to compute the offset
- offset_1 = i_1
- offset_m = offset_(m-1) x n_m + i_m
At the end
- Multiply by w = sizeof(basetype(A))
- Add the compile-time constant term
Keep track of which dimension is being processed at each level
- Use the auxiliary function n_m = numElem(A,m), the number of elements along the m-th dimension of A

SDT for Addressing Array Elements
x = A[y,z];
[Figure: parse tree for S -> L = E, with x on the left-hand side and L -> Elist ] covering A [ y , z ]]

L -> Elist ]          { L.place = newtemp();
                        L.offset = newtemp();
                        code1 = gen(L.place = constTerm(Elist.array));
                        code2 = gen(L.offset = Elist.place * sizeof(baseType(Elist.array)));
                        L.code = append(Elist.code, code1, code2); }
Elist -> Elist1 , E   { t = newtemp();
                        m = Elist1.ndim + 1;
                        code1 = gen(t = Elist1.place * numElem(Elist1.array, m));
                        code2 = gen(t = t + E.place);
                        Elist.array = Elist1.array;
                        Elist.place = t;
                        Elist.ndim = m;
                        Elist.code = append(Elist1.code, E.code, code1, code2); }
Elist -> id [ E       { Elist.array = id.place;
                        Elist.place = E.place;
                        Elist.ndim = 1;
                        Elist.code = E.code; }

SDT for Addressing Array Elements
E -> L        { if (L.offset == NULL) then E.place = L.place;
                else { E.place = newtemp();
                       E.code = gen(E.place = L.place[L.offset]); } }
S -> L = E    { if (L.offset == NULL) then S.code = gen(L.place = E.place);
                else S.code = append(E.code, gen(L.place[L.offset] = E.place)); }
L -> id       { L.place = id.place; L.offset = null; }

SDT for Addressing Array Elements: Example
x = A[y,z];
A is a 10 x 20 array with low_1 = low_2 = 1; sizeof(basetype(A)) = sizeof(int) = 4 bytes.

Attributes on the parse tree:
- L for the left-hand side x: L.place = x, L.offset = null
- L for y and L for z: L.place = y, L.offset = null and L.place = z, L.offset = null, giving E.place = y and E.place = z
- Elist for A[y: Elist.place = y, Elist.ndim = 1, Elist.array = A
- Elist for A[y,z: Elist.place = t1, Elist.ndim = 2, Elist.array = A
- L for A[y,z]: L.place = t2, L.offset = t3
- E for A[y,z]: E.place = t4

Generated code:
    t1 = y * 20      // numElem(A,2) = 20
    t1 = t1 + z      // E.place is z
    t2 = c           // c = base(A) - (20+1)*4
    t3 = t1 * 4      // sizeof(int) = 4
    t4 = t2[t3]
    x = t4
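The three productions can be collapsed into one loop over the subscripts to see how the code above comes out. This Python sketch is only an approximation of the scheme: the function name `gen_array_load` and its parameters are hypothetical, subscripts are restricted to simple names, and the constant term is passed in as a symbolic name.

```python
import itertools


def gen_array_load(target, subscripts, num_elem, const_term, width):
    """Emit three-address code for `target = A[subscripts...]`.

    num_elem[m] is the number of elements along dimension m + 1;
    const_term names the precomputed constant base(A) - (...) * width.
    """
    temps = (f"t{n}" for n in itertools.count(1))
    code = []
    place = subscripts[0]                        # Elist -> id [ E
    for m in range(1, len(subscripts)):          # Elist -> Elist1 , E
        t = next(temps)
        code.append(f"{t} = {place} * {num_elem[m]}")
        code.append(f"{t} = {t} + {subscripts[m]}")
        place = t
    base = next(temps)                           # L -> Elist ]  (L.place)
    offset = next(temps)                         #               (L.offset)
    code.append(f"{base} = {const_term}")
    code.append(f"{offset} = {place} * {width}")
    value = next(temps)                          # E -> L
    code.append(f"{value} = {base}[{offset}]")
    code.append(f"{target} = {value}")           # S -> L = E
    return code


code = gen_array_load("x", ["y", "z"], num_elem=[10, 20], const_term="c", width=4)
print("\n".join(code))
```

Each extra dimension costs one multiply and one add, which is the recurrence offset_m = offset_(m-1) x n_m + i_m.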

Array References
What about arrays as actual parameters?
Whole arrays, as call-by-reference parameters
- Need dimension information, so build a dope vector
- Store the values in the calling sequence
- Pass the address of the dope vector in the parameter slot
- Generate the complete address at each reference
Some improvement is possible
- Save len_i and low_i rather than low_i and high_i
- Pre-compute the fixed terms in the prologue sequence
[Figure: dope vector with the fields baseA, low_1, high_1, low_2, high_2]
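A dope vector can be pictured as a small record that travels with the array. The Python sketch below stores len_i and low_i, as the improvement above suggests, and separates the once-per-activation fixed term from the per-reference work. The class name `DopeVector`, its fields, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DopeVector:
    base: int                  # address of the first element
    lows: Tuple[int, ...]      # low_i for each dimension
    lens: Tuple[int, ...]      # len_i = high_i - low_i + 1 for each dimension
    width: int                 # element size in bytes

    def fixed_term(self) -> int:
        """Part of the address that is constant during one activation."""
        e = self.lows[0]
        for low, length in zip(self.lows[1:], self.lens[1:]):
            e = e * length + low
        return self.base - e * self.width


def element_address(dv: DopeVector, fixed: int, indices) -> int:
    """Per-reference work: the recurrence over the subscripts only."""
    e = indices[0]
    for i, length in zip(indices[1:], dv.lens[1:]):
        e = e * length + i
    return fixed + e * dv.width


dv = DopeVector(base=1000, lows=(1, 1), lens=(10, 20), width=4)
fixed = dv.fixed_term()        # would be computed once, in the prologue
print(fixed, element_address(dv, fixed, (10, 20)))
```

Unlike the compile-time case, nothing here is known until the call happens, so the fixed term is computed at run time but only once per activation.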

Array References
What about variable-sized arrays?
Local arrays dimensioned by actual parameters
- Same set of problems as parameter arrays
- Requires dope vectors (or equivalent), with the dope vector at a fixed offset in the activation record
- Different access costs for textually similar references
This presents a lot of opportunity for a good optimizer
- Common sub-expressions in the address
- Contents of the dope vector are fixed during each activation
- Handle them like parameter arrays

SDT Scheme for Boolean Expressions
Two basic code generation flavors
- Use boolean and, or, and not instructions (like arithmetic).
- Control flow (or positional code): the position reached in the code defines whether the predicate is true or false.
Arithmetic evaluation
- Simpler to generate code for, since the expression is just evaluated eagerly
- Associate 1 or 0 with the outcome of predicates and combine them with logic instructions
- Use the same SDT scheme explained for arithmetic operations
Control-flow evaluation (short-circuit evaluation, covered later)
- More efficient in many cases
Complication: need to know the address to jump to in some cases
Solution: two additional attributes
- nextstat (inherited): indicates the next symbolic location to be generated
- laststat (synthesized): indicates the last location filled
- As code is generated down and up the tree, the attributes are filled with the correct values.

Arithmetic Scheme: Grammar and Actions
E -> false      E.place = newtemp()
                E.code = {gen(E.place = 0)}
                E.laststat = E.nextstat + 1
E -> true       E.place = newtemp()
                E.code = {gen(E.place = 1)}
                E.laststat = E.nextstat + 1
E -> ( E1 )     E.place = E1.place
                E.code = E1.code
                E1.nextstat = E.nextstat
                E.laststat = E1.laststat
E -> not E1     E.place = newtemp()
                E.code = append(E1.code, gen(E.place = not E1.place))
                E1.nextstat = E.nextstat
                E.laststat = E1.laststat + 1

Arithmetic Scheme: Grammar and Actions
E -> E1 or E2         E.place = newtemp()
                      E.code = append(E1.code, E2.code, gen(E.place = E1.place or E2.place))
                      E1.nextstat = E.nextstat
                      E2.nextstat = E1.laststat
                      E.laststat = E2.laststat + 1
E -> E1 and E2        E.place = newtemp()
                      E.code = append(E1.code, E2.code, gen(E.place = E1.place and E2.place))
                      E1.nextstat = E.nextstat
                      E2.nextstat = E1.laststat
                      E.laststat = E2.laststat + 1
E -> id1 relop id2    E.place = newtemp()
                      E.code = gen(if id1.place relop id2.place goto E.nextstat+3)
                      E.code = append(E.code, gen(E.place = 0))
                      E.code = append(E.code, gen(goto E.nextstat+4))
                      E.code = append(E.code, gen(E.place = 1))
                      E.laststat = E.nextstat + 4

Boolean Expressions: Example
a < b or c < d and e < f
[Figure: parse tree with or at the root; its left child is the relop a < b, its right child is and over the relops c < d and e < f]
    00: if a < b goto 03
    01: t1 = 0
    02: goto 04
    03: t1 = 1
    04: if c < d goto 07
    05: t2 = 0
    06: goto 08
    07: t2 = 1
    08: if e < f goto 11
    09: t3 = 0
    10: goto 12
    11: t3 = 1
    12: t4 = t2 and t3
    13: t5 = t1 or t4
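The listing above can be reproduced with a small generator in which the length of the code emitted so far stands in for the nextstat attribute. This is a minimal Python sketch; the class name `BoolGen` and the tuple encoding of the expression tree are hypothetical.

```python
class BoolGen:
    """Arithmetic (eager) evaluation of boolean expressions into 0/1 values."""

    def __init__(self):
        self.code = []
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def gen(self, node):
        """node is ("relop", op, a, b), ("not", e), or ("or" | "and", l, r)."""
        kind = node[0]
        if kind == "relop":
            _, op, a, b = node
            place = self.newtemp()
            start = len(self.code)            # plays the role of E.nextstat
            self.code.append(f"if {a} {op} {b} goto {start + 3:02d}")
            self.code.append(f"{place} = 0")
            self.code.append(f"goto {start + 4:02d}")
            self.code.append(f"{place} = 1")
            return place
        if kind == "not":
            p1 = self.gen(node[1])
            place = self.newtemp()
            self.code.append(f"{place} = not {p1}")
            return place
        p1 = self.gen(node[1])                # E1.code comes first
        p2 = self.gen(node[2])                # then E2.code
        place = self.newtemp()
        self.code.append(f"{place} = {p1} {kind} {p2}")
        return place


expr = ("or", ("relop", "<", "a", "b"),
        ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f")))
g = BoolGen()
result = g.gen(expr)
listing = [f"{n:02d}: {line}" for n, line in enumerate(g.code)]
print("\n".join(listing))
```

Every relational test is evaluated regardless of the others, which is what distinguishes this scheme from short-circuit evaluation.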

Summary
Intermediate code generation using syntax-directed translation schemes
- Expressions and assignments
- Array expressions and array references
- Boolean expressions (arithmetic scheme)