CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Similar documents
University of Arizona, Department of Computer Science. CSc 453 Assignment 5 Due 23:59, Dec points. Christian Collberg November 19, 2002

Intermediate Code Generation

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Concepts Introduced in Chapter 6

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

University of Arizona, Department of Computer Science. CSc 520 Assignment 4 Due noon, Wed, Apr 6 15% Christian Collberg March 23, 2005

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

Introduction. CSc 453. Compilers and Systems Software. 19 : Code Generation I. Department of Computer Science University of Arizona.

CSc 453. Code Generation I Christian Collberg. Compiler Phases. Code Generation Issues. Slide Compilers and Systems Software.

Concepts Introduced in Chapter 6

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Alternatives for semantic processing

CSc 453. Compilers and Systems Software. 18 : Interpreters. Department of Computer Science University of Arizona. Kinds of Interpreters

CSc 553. Principles of Compilation. 2 : Interpreters. Department of Computer Science University of Arizona

Architectures. Code Generation Issues III. Code Generation Issues II. Machine Architectures I

Principles of Compiler Design

CSc 520 Principles of Programming Languages

COMPILER CONSTRUCTION (Intermediate Code: three address, control flow)

CS415 Compilers. Intermediate Represeation & Code Generation

CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV. Department of Computer Science University of Arizona

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Intermediate Representations & Symbol Tables

UNIT IV INTERMEDIATE CODE GENERATION

Control Structures. Boolean Expressions. CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

CSc 520 Principles of Programming Languages. 26 : Control Structures Introduction

CSE P 501 Compilers. Intermediate Representations Hal Perkins Spring UW CSE P 501 Spring 2018 G-1

Intermediate Representations

opt. front end produce an intermediate representation optimizer transforms the code in IR form into an back end transforms the code in IR form into

CS2210: Compiler Construction. Code Generation

Formal Languages and Compilers Lecture X Intermediate Code Generation

Principle of Compilers Lecture VIII: Intermediate Code Generation. Alessandro Artale

Semantic actions for declarations and expressions

Semantic actions for declarations and expressions

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

Semantic actions for declarations and expressions. Monday, September 28, 15

CSE 401/M501 Compilers

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Intermediate Representation Prof. James L. Frankel Harvard University

CSc 453 Compilers and Systems Software

Intermediate Representations

CSc 453. Semantic Analysis III. Basic Types... Basic Types. Structured Types: Arrays. Compilers and Systems Software. Christian Collberg

Intermediate Representations

COMPILER CONSTRUCTION Seminar 03 TDDB

Intermediate Representations

Intermediate Code Generation

Chapter 6 Intermediate Code Generation

Intermediate Representations

Intermediate Code Generation

UNIT-3. (if we were doing an infix to postfix translator) Figure: conceptual view of syntax directed translation.

tokens parser 1. postfix notation, 2. graphical representation (such as syntax tree or dag), 3. three address code

Compilers. Compiler Construction Tutorial The Front-end

Intermediate Representation (IR)

Dixita Kagathara Page 1

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

Syntax-directed translation. Context-sensitive analysis. What context-sensitive questions might the compiler ask?

LECTURE 17. Expressions and Assignment

& Simulator. Three-Address Instructions. Three-Address Instructions. Three-Address Instructions. Structure of an Assembly Program.

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

Intermediate Representa.on

Principles of Compiler Design

Intermediate Representations

COMPILER DESIGN. Intermediate representations Introduction to code generation. COMP 442/6421 Compiler Design

Languages and Compiler Design II IR Code Generation I

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

SMPL - A Simplified Modeling Language for Mathematical Programming

CSc 520 Principles of Programming Languages

A Simple Syntax-Directed Translator

Intermediate Code Generation

TACi: Three-Address Code Interpreter (version 1.0)

Time : 1 Hour Max Marks : 30

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CSc 453 Interpreters & Interpretation

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit

CSc 520 Principles of Programming Languages

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Where we are. What makes a good IR? Intermediate Code. CS 4120 Introduction to Compilers

CSc 520 Principles of Programming Languages. ameter Passing Reference Parameter. Call-by-Value Parameters. 28 : Control Structures Parameters

Compiler Optimization Intermediate Representation

Software Protection: How to Crack Programs, and Defend Against Cracking Lecture 3: Program Analysis Moscow State University, Spring 2014

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

CMPT 379 Compilers. Parse trees

A Fast Review of C Essentials Part I

intermediate-code Generation

Intermediate representation

Object Code (Machine Code) Dr. D. M. Akbar Hussain Department of Software Engineering & Media Technology. Three Address Code

CSc 372 Comparative Programming Languages. 4 : Haskell Basics

Context-sensitive analysis. Semantic Processing. Alternatives for semantic processing. Context-sensitive analysis

6. Intermediate Representation

CSc 453 Intermediate Code Generation

Figure 28.1 Position of the Code generator

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

G Compiler Construction Lecture 12: Code Generation I. Mohamed Zahran (aka Z)

CSc 520 Principles of Programming Languages

CSc 553. Principles of Compilation. 22 : Code Generation Tree Matching. Department of Computer Science University of Arizona

CST-402(T): Language Processors

Transcription:

CSc 453 Compilers and Systems Software 13 : Intermediate Code I Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2009 Christian Collberg

Introduction

Compiler Phases source Lexer tokens Parser errors errors AST Optimize IR Interm. Code Gen AST Semantic Analyser IR Machine Code Gen We are here! errors asm

Intermediate Representations Some compilers use the AST as the only intermediate representation. Optimizations (code improvements) are performed directly on the AST, and machine code is generated directly from the AST. The AST is OK for machine-independent optimizations, such as inlining (replacing a procedure call with the called procedure s code). The AST is a bit too high-level for machine code generation and machine-dependent optimizations.

Intermediate Representations... For this reason, some compilers generate a lower level (simpler, closer to machine code) representation from the AST. This representation is used during code generation and code optimization.

Intermediate Code Advantages of: 1 Fitting many front-ends to many back-ends, 2 Different development teams for front- and back-end, 3 Debugging is simplified, 4 Portable optimization. Requirements: 1 Architecture independent, 2 Language independent, 3 Easy to generate, 4 Easy to optimize, 5 Easy to produce machine code from.

Mix-and-Match Compilers F R O N T B A C K Ada Pascal Modula 2 C++ INTERMEDIATE REPRESENTATION Sparc Mips 68000 IBM/370 E N D E N D Ada Mips compiler Pascal Mips compiler Pascal 68k compiler

UNCOL A representation which is both architecture and language independent is known as an UNCOL, a Universal Compiler Oriented Language. UNCOL is the holy grail of compiler design many have search for it, but no-one has found it. Problems: 1 Programming language semantics differ from one language to another, 2 Machine architectures differ.

Intermediate Code... There are several different types of intermediate representations: 1 Tree-Based. 2 Graph-Based. 3 Tuple-Based. 4 Linear representations. All representations contain the same information. Some are easier to generate, some are easy to generate simple machine code from, some are easy to generate good code from. IR Intermediate Representation.

Postfix Notation

Postfix Notation assign b + * * Infix: b := (a 2) + (a 2) a 2 a 2 Postfix: b a 2 * a 2 * + := Postfix notation is a parenthesis free notation for arithmetic expression. It is essentially a linearized representation of an abstract syntax tree. In postfix notation an operator appears after its operands. Very simple to generate, very compact, easy to generate straight-forward machine code from, difficult to generate good machine code from.

Tree & DAG Representations

Tree & DAG Representation Trees make good intermediate representations. We can represent the program as a sequence of expression trees. Each assignment, procedure call, or jump becomes one individual tree in the forest. Common Subexpression Elimination (CSE): Even if the same (sub-) expression appears more than once in a procedure, we should only compute its value once, and save the result for future reference. One way of doing this is to build a graph representation, rather than a tree. In the following slides we see how the expression a 2 gets two subtrees in the tree representation and one subtree in the DAG representation.

Tree & DAG Representation... b := (a 2) + (a 2) assign b + * * a 2 a 2 Linearized Tree Nr Op Arg 1 Arg 2 1 ident a 2 int 2 3 mul 1 2 4 ident a 5 int 2 6 mul 4 5 7 add 3 6 8 ident b 9 assign 8 7

Tree & DAG Representation... b := (a 2) + (a 2) assign b + * a 2 Linearized Dag Nr Op Arg 1 Arg 2 1 ident a 2 int 2 3 mul 1 2 4 add 3 3 5 ident b 6 assign 5 4

Sequence of Expression Trees X := 20; WHILE X < 10 DO X := X-1; A[X] := 10; IF X = 4 THEN X := X - 2; ENDIF; ENDDO; Y := X + 5; B2: B4: X Y X := >= goto B3 := X goto EXIT + 20 10 B4 5 B3: B5: B6: := X X := [] 1 10 A X <> X 4 B6 := X X 2 goto EXIT: B2

Building DAGs... Repeatedly add subtrees to build DAG. Only add subtrees not already in DAG. Store subtrees in a hash table. This is the value-number algorithm. OP For every insertion of a subtree, check if (X OP Y ) DAG. X Y PROCEDURE InsertNode ( OP : Operator; L, R : Node) : Node; BEGIN V := hashfunc (OP, L, R); N := HashTab.Lookup (V, OP, L, R); IF N = NullNode THEN N := NewNode (OP, L, R); HashTab.Insert (V, N); END;

Building DAGs Example 3 4 6 2 5 A+B A B + A B 1 3 4 6 2 5 A+B A B (A+B)*C C + A B C * 1 3 4 6 2 5 + A B C * + (A+B)*C+(A+B) A+B A B (A+B)*C C 1

Building DAGs From an expression/expression tree such as the one on the left we might generate the machine code (for some fictitious architecture) on the right: a (b + c) * 1 2 3 a 4 b + 5 c LOAD LOAD ADD LOAD MUL b, r0 c, r1 r0, r1, r2 a, r3 r2, r3, r4 Can we generate better code from a DAG than a tree?

Building DAGs... Example Expression: [(a + b) c + {(a + b) + e} (e + f )] [(a + b) c] * + a b Tree Representation: * + + * * a b c + + + e e f a b c

Building DAGs... [(a + b) c + {(a + b) + e} (e + f )] [(a + b) c] DAG Representation: * + * * + c + + a b e f

Building DAGs... Generating machine code from the tree yields 21 instructions. Code from Tree LOAD a, r0 ; a ADD r2, r0, r4 ; (a + b) + e LOAD b, r1 ; b LOAD f, r0 ; f ADD r0, r1, r2 ; a + b LOAD e, r1 ; e LOAD c, r0 ; c ADD r0, r1, r0 ; f + e MUL r0, r2, r3 ; (a + b) c MUL r4, r0, r4 ; {(a + b) + e} LOAD a, r0 ; a ; (e + f ) LOAD b, r1 ; b ADD r3, r4, r4 ADD r0, r1, r2 ; a + b ; Code for (a + b) c into r3 LOAD e, r0 ; e MUL r4, r3, r0

Building DAGs... Generating machine code from the DAG yields only 12 instructions. Code from DAG LOAD e, r4 ; e LOAD a, r0 ; a ADD r4, r2, r1 ; (a + b) + LOAD b, r1 ; b LOAD f, r0 ; f ADD r0, r1, r2 ; a + b ADD r0, r4, r0 ; f + e LOAD c, r0 ; c MUL r1, r0, r0 MUL r0, r2, r3 ; (a + b) c ADD r0, r3, r0 MUL r0, r3, r0

Tuple Codes

Three-Address Code Another common representation is three-address code. It is akin to assembly code, but uses an infinite number of temporaries (registers) to store the results of operations. There are three common realizations of three-address code: quadruples, triples and indirect triples. Types of 3-Addr Statements: x := y op z Binary arithmetic or logical operation. Example: Mul, And. x := op y Unary arithmetic, conversion, or logical operation. Example: Abs, UnaryMinus, Float. x := y Copy statement.

Three-Address Code... goto L Unconditional jump. if x relop y goto L Conditional jump. relop is one of <,>,<=, etc. If x relop y evaluates to True, then jump to label L. Otherwise continue with the next tuple. param X ; call P, n Make X the next parameter; make a procedure call to P with n parameters. x := y[i] Indexed assignment. Set x to the value in the location i memory units beyond y.

Three-Address Code... x := ADDR(y) Address assignment. Set x to the address of y. x := IND(y) Indirect assignment. Set x to the value stored at the address in y. IND(x) := y Indirect assignment. Set the memory location pointed to by x to the value held by y.

Three-Address Code... Many three-address statements (particularly those for binary arithmetic) consist of one operator and three addresses (identifiers or temporaries): b := (a 2) + (a 2) t 1 := a mul 2 t 2 := a mul 2 t 3 := t 1 add t 2 b := t 3

Three-Address Code... There are several ways of implementing three-address statements. They differ in the amount of space they require, how closely tied they are to the symbol table, and how easily they can be manipulated. During optimization we may want to move the three-address statements around.

Three-Address Code Quadruples Quadruples can be implemented as an array of records with four fields. One field is the operator. The remaining three fields can be pointers to the symbol table nodes for the identifiers. In this case, literals and temporaries must be inserted into the symbol table. b := (a 2) + (a 2) Nr Res Op Arg 1 Arg 2 (1) t 1 mul a 2 (2) t 2 mul a 2 (3) t 3 add t 1 t 2 (4) b assign t 3

Three-Address Code Triples Triples are similar to quadruples, but save some space. Instead of each three-address statement having an explicit result field, we let the statement itself represent the result. We don t have to insert temporaries into the symbol table. b := (a 2) + (a 2) Nr Op Arg 1 Arg 2 (1) mul a 2 (2) mul a 2 (3) add (1) (2) (4) assign b (3)

Three-Address Code Indirect Triples One problem with triples ( The Trouble With Triples? a ) is that they cannot be moved around. We may want to do this during optimization. We can fix this by adding a level of indirection, an array of pointers to the real triples. b := (a 2) + (a 2) Abs Real (1) (10) (2) (11) (3) (12) (4) (13) Nr Op Arg 1 Arg 2 (11) mul a 2 (12) mul a 2 (13) add (11) (12) (14) := b (13) a This is a joke. It refers to the famous Star Trek episode The Trouble With Tribbles. OK, so it s not funny.

Summary

Readings and References Read Louden: Intermediate Code 398 407 Or, read the Dragon book: Postfix notation 33 DAGs & Value Number Alg. 290 293 Intermediate Languages 463 468, 470 473

Summary We use an intermediate representation of the program in order to isolate the back-end from the front-end. A high-level intermediate form makes the compiler retargetable (easily changed to generate code for another machine). It also makes code-generation difficult. A low-level intermediate form make code-generation easy, but our compiler becomes more closely tied to a particular architecture. A basic block is a straight-line piece of code, with no jumps in or out except at the beginning and end.

Homework

Homework I Translate the program below into quadruples, triples, and a sequence of expression trees. PROGRAM P; VAR X : INTEGER; VAR Y : REAL; BEGIN X := 1; Y := 5.5; WHILE X < 10 DO Y := Y + FLOAT(X); X := X + 1; IF Y > 10 THEN Y := Y * 2.2; ENDIF; ENDDO; END.

Homework II Consider the following expression: ((x 4) + y) (y + (4 x)) + (z (4 x)) 1 Show how the value-number algorithm builds a DAG from the expression (remember that + and are commutative). 2 Show the resulting DAG when stored in an array. 3 Translate the expression to postfix form. 4 Translate the expression to indirect triple form.

Homework III Translate the program below into quadruples, triples, and a sequence of expression trees. PROGRAM P; VAR X : INTEGER; VAR Y : REAL; BEGIN X := 1; Y := 5.5; WHILE X < 10 DO Y := Y + FLOAT(X); X := X + 1; IF Y > 10 THEN Y := Y * 2.2; ENDIF; ENDDO; END.

Tree Code Example

record ProcBegin (Name,Level,VarSize,FormalSize,LineNo) The beginning of each procedure is marked by a ProcBegin instruction. Attributes: Name is the name of the procedure. Level is the declaration level of the procedure. VarSize is the amount of local data (in bytes) required for the procedure. FormalSize is the size of the actual parameters passed to the procedure. Children: None. record ProcEnd (Name,Level,VarSize,FormalSize,LineNo) Same attributes as ProcBegin

record VarDecl (Name,Type,Level,Offset,Size,LineNo) Each global variable is given an explicit declaration in the intermediate code. Attributes: Type is one of the strings "INT", "CHAR", "STRING", "REAL", or "STRUCT". Offset gives the relative address of the variable Children: None. record Store (Type,Des,Expr,LineNo) Store instructions are generated from assignment statements. Attributes: Type is the type. Children: Des is the subtree which computes the address (L-value) at which the new value should be stored. Expr is a subtree computing the R-value.

record VarRef (Type,Name,Level,LineNo) VarRef nodes are used to refer to the address of a global or local variable. Attributes: Level is the static nesting level at which the variable was declared. Children: None. record Literal (Type,Value,LineNo) Literal nodes represent integer, real, character, or string constants. Attributes: Type is as for VarDecl, but there are no "STRUCT" constants. Children: None.

record BinExpr (Op,Type,Left,Right,LineNo) BinExpr instructions represent binary arithmetic. Attributes: Type is either "INT" or "REAL". Op is one of "+", "-", "*", "/", or "%". Children: Left and Right are subtrees holding the left and right hand sides of the arithmetic operator, respectively. record Load (Type,Des,LineNo) Load instructions represent the loading of an R-value from an L-value. Attributes: Type is the type of the loaded value. Children: Des is the subtree which computes the address (L-value) whose R-value should be loaded.

record Branch (Op,Type,Left,Right,Label,LineNo) Conditional branch, equivalent to IF Left Op Right THEN GOTO Label. Attributes: Type is "INT", "CHAR", or "REAL". Op is one of "=", "<", ">", "#", "<=", ">=". Label is the number of the label to which we should jump. Children: Left and Right are expression subtrees computing the arguments to the comparison operator. record Goto (Label,LineNo) Unconditional branch. Attributes: Label is the label number to which we should jump. record Label (Number,LineNo) A program location to which a Branch or Goto may jump. Attributes: Number is the label number.

record ProcCall (Name, Actuals, Level, LineNo) Procedure call. Attributes: Name is the name of the procedure. Level is the level at which the procedure is declared. Children: Actuals is the list of actual parameters. record Field (BaseAddress,Name,Offset,LineNo) Compute the address of a field within a record variable, i.e. BaseAddress + Offset. Attributes: Offset is the offset of the field within the record variable. Children: BaseAddress is a subtree computing the address of the record variable.

record Index (BaseAddress,IndexExpr,ElementSize,ElementCou Compute the address of an array element, i.e. BaseAddress + IndexExpr * ElementSize. If 0 > IndexExpr ElementCount then a run-time error message should be generated. Attributes: ElementSize is the size of the array elements. ElementCount is the length of the array. Children: BaseAddress and IndexExpr are subtrees computing the address of the beginning of the array the number of the indexed array element.

record Actual (Type,Number,Offset,Expr,LineNo) Pass the value of Expr as argument to a procedure call. Attributes: If Type="STRUCT" then the formal is passed by reference, otherwise by value. Number is the argument number, Offset is the relative address within the activation record. Children: Expr is the subtree computing the value (or address) of the actual. record FormalRef(Type,Name,Level,Offset,Number,LineNo) Reference to the value of a formal parameter. Similar to VarRef. Attributes: Level is the static nesting level at which the formal is declared. If Type="STRUCT" then the formal is passed by reference, otherwise by value. Children: None.

PROGRAM P; TYPE R=RECORD[a;CHAR;b:INTEGER]; TYPE A=ARRAY 100 OF R; VAR X:INTEGER; VAR Y:A; BEGIN WHILE X<10 DO X := Y[55].b + 66; ENDDO; END. (1) (2) (3) VarDecl Name=X Level=0 Type=INT Offset=0 Size=4 VarDecl Name=Y Level=0 Type=STRUCT Offset=8 Size=600 ProcBegin Name=_main Level= 1 VarSize=608 FormalSize=0 (4) Label Number=4 (5) Branch Op=< Type=INT Label=8 Left Right Load Literal Type=INT Type=INT Des Value=10 VarRef Name=X Type=INT Level=0 Offset=0

(6) Goto Label=7 (9) Goto Label=4 (7) Label Number=8 (10) Label Number=7 (8) Store Type=INT Des Expr (11) ProcEnd Name=_main Level= 1 VarSize=608 FormalSize=0 VarRef Name=X Type=INT Level=0 Offset=0 BinExpr Op=+ Type=INT Left Right

(8) Store Type=INT Des Expr VarRef Name=X Type=INT Level=0 Offset=0 Load Type=INT Des Op=+ Left BinExpr Type=INT Right Literal Type=INT Value=66 Index Name=b ElementSize=6 ElementCount=100 BaseAddr Field Offset=2 BaseAddress IdxExpr VarRef Name=Y Type=STRUCT Level=0 Offset=8 Literal Type=INT Value=55

(1) [VarDecl]: Name= X :Level=0:Type=INT:Offset=0:Size=4 (2) [VarDecl]: Name= Y :Level=0:Type=STRUCT:Offset=8:Size=6 (3) [ProcBegin]: Name= _main :Level=-1:VarSize=608:FormalSi (4) [Label]: Number=4 (5) [Branch]: Op= < :Type=INT:Label=8 Left= [Load]: Type=INT Des= [VarRef]: Name= X :Type=INT:Level=0:Offs Right= [Literal]: Type=INT:Value= 10 (6) [Goto]: Label=7 (7) [Label]: Number=8

(8) [Store]: Type=INT Des= [VarRef]: Name= X :Type=INT:Level=0:Offset=0 Expr= [BinExpr]: Op= + :Type=INT Left= [SEE NEXT SLIDE] Right= [Literal]: Type=INT:Value= 66 (9) [Goto]: Label=4 (10 [Label]: Number=7 (11 [ProcEnd]: Name= _main :Level=-1:VarSize=608:FormalSize

Expr= [BinExpr]: Op= + :Type=INT Left= [Load]: Type=INT Des= [Field]: Name= b :Offset=2 BaseAddress= [Index]: ElementSize= 6 :ElementCount=10 BaseAddress= [VarRef]: Name= Y :Type=STRUCT: Level=0:Offset=8 IndexExpr= [Literal]: Type=INT:Value= 55