Formal Languages and Compilers Lecture X Intermediate Code Generation

Similar documents
Principle of Compilers Lecture VIII: Intermediate Code Generation. Alessandro Artale

CMPSC 160 Translation of Programming Languages. Three-Address Code

Intermediate Code Generation

UNIT-3. (if we were doing an infix to postfix translator) Figure: conceptual view of syntax directed translation.

Module 27 Switch-case statements and Run-time storage management

Syntax-Directed Translation

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Module 25 Control Flow statements and Boolean Expressions

Formal Languages and Compilers Lecture IX Semantic Analysis: Type Chec. Type Checking & Symbol Table

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

intermediate-code Generation

THEORY OF COMPILATION

Intermediate Code Generation

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Intermediate Code Generation

CS2210: Compiler Construction. Code Generation

CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV. Department of Computer Science University of Arizona

Control Structures. Boolean Expressions. CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV

Dixita Kagathara Page 1

Intermediate Representations

Intermediate Code Generation

Concepts Introduced in Chapter 6

CSCI Compiler Design

UNIT IV INTERMEDIATE CODE GENERATION

Intermediate Code Generation

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

Concepts Introduced in Chapter 6

Module 26 Backpatching and Procedures

Languages and Compiler Design II IR Code Generation I

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Intermediate Code Generation Part II

PART 6 - RUN-TIME ENVIRONMENT. F. Wotawa TU Graz) Compiler Construction Summer term / 309

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Intermediate Code Generation

Formal Languages and Compilers Lecture I: Introduction to Compilers

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 7, vrijdag 2 november 2018

COMPILER CONSTRUCTION (Intermediate Code: three address, control flow)

Chapter 6 Intermediate Code Generation

Type Checking. Error Checking

Theory of Compilation Erez Petrank. Lecture 7: Intermediate Representation Part 2: Backpatching

Intermediate Code Generation

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

CS322 Languages and Compiler Design II. Spring 2012 Lecture 7

CSc 453 Intermediate Code Generation

Principles of Compiler Design

PRINCIPLES OF COMPILER DESIGN

Generating 3-Address Code from an Attribute Grammar

Syntax-Directed Translation Part II

ECS 142 Project: Code generation hints (2)

Time : 1 Hour Max Marks : 30

1 Lexical Considerations

Compilers. Compiler Construction Tutorial The Front-end

Type Checking. Chapter 6, Section 6.3, 6.5

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

Formal Languages and Compilers Lecture VI: Lexical Analysis

Type systems. Static typing

Intermediate-Code Generat ion

Lexical Considerations

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Lexical Considerations

Intermediate Code Generation Part II

2 Intermediate Representation. HIR (high level IR) preserves loop structure and. More of a wizardry rather than science. in a set of source languages

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Intermediate Representa.on

Page No 1 (Please look at the next page )

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Formal Languages and Compilers Lecture V: Parse Trees and Ambiguous Gr

CSE302: Compiler Design

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

Semantic Analysis and Intermediate Code Generation

SYNTAX DIRECTED TRANSLATION FOR CONTROL STRUCTURES

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Lecture 1. Types, Expressions, & Variables

Writing Evaluators MIF08. Laure Gonnord

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

Intermediate Code Generation

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

TACi: Three-Address Code Interpreter (version 1.0)

Compilerconstructie. najaar Rudy van Vliet kamer 124 Snellius, tel rvvliet(at)liacs.

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

KU Compilerbau - Programming Assignment

Decaf Language Reference Manual

COP5621 Exam 3 - Spring 2005

Formal Languages and Compilers Lecture VII Part 4: Syntactic A

Semantic Processing. Semantic Errors. Semantics - Part 1. Semantics - Part 1

CS /534 Compiler Construction University of Massachusetts Lowell. NOTHING: A Language for Practice Implementation

Semantic Actions and 3-Address Code Generation

Formal Languages and Compilers Lecture IV: Regular Languages and Finite. Finite Automata

Object Code (Machine Code) Dr. D. M. Akbar Hussain Department of Software Engineering & Media Technology. Three Address Code

CSCI565 Compiler Design

Static Checking and Type Systems

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

B.V. Patel Institute of Business Management, Computer & Information Technology, Uka Tarsadia University

Principle of Compilers Lecture IV Part 4: Syntactic Analysis. Alessandro Artale

BASIC ELEMENTS OF A COMPUTER PROGRAM

Transcription:

Formal Languages and Compilers Lecture X Intermediate Code Generation Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal Languages and Compilers BSc course 2017/18 First Semester

Summary of Lecture X Three-Address Code Code for Assignments Boolean Expressions and Flow-of-Control Statements

Intermediate Code Generation An intermediate code is generated as a program for an abstract machine. 1 The intermediate code should be easy to translate into the target program. 2 A machine-independent Code Optimizer can be applied before generating the target code. As intermediate code we consider the three-address code, similar to assembly: sequence of instructions with at most three operands such that: 1 There is at most one operator, in addition to the assignment (we make explicit the operators precedence). 2 The general form is: x := y op z where x,y,z are called addresses, i.e., either identifiers, constants or compiler-generated temporary names. Temporary names must be generated to compute intermediate operations. Addresses are implemented as pointers to their symbol-table entries.

Types of Three-Address Statements Three-Address statements are akin to assembly code: Statements can have labels and there are statements for flow-of-control. 1 Assignment Statements: x := y op z. 2 Unary Assignment Statements: x := op y. 3 Copy Statements: x := y. 4 Unconditional Jump: goto L, with L a label of a statement. 5 Conditional Jump: if x relop y goto L.

Types of Three-Address Statements (Cont.) 6 Procedure Call: param x, and call p,n for calling a procedure, p, with n parameters. With return y the returned value of the procedure is indicated. param x 1 param x 2... param x n call p, n 7 Indexed assignments: x := y[i] or x[i] := y. Note: x[i] denotes the value in the location i memory units beyond the location x. 8 Pointer Assignmets: x := &y, x := *y, or *x := y; where &y stands for the address of y, and *y for the value stored at y.

Summary Three-Address Code Code for Assignments Boolean Expressions and Flow-of-Control Statements

Code for Assignments The following S-attributed definition generates three-address code for assignments ( means string concatenation) in the code attribute. Production Semantic Rules S id := E S.code := E.code gen(id.addr := E.addr) E E 1 + E 2 E.addr := newtemp(); E.code := E 1.code E 2.code gen(e.addr := E 1.addr + E 2.addr) E E 1 E 2 E.addr := newtemp(); E.code := E 1.code E 2.code gen(e.addr := E 1.addr E 2.addr) E E 1 E.addr := newtemp(); E.code := E 1.code gen(e.addr := uminus E 1.addr) E (E 1 ) E.addr := E 1.addr; E.code := E 1.code E id E.addr := id.addr; E.code := E num E.addr := newtemp(); E.addr := num.val; E.code :=

Code for Assignments (Cont.) The synthesized attribute S.code represents the three-address code for the assignment. Expressions have two synthesized attributes: 1 E.addr: Temporary name holding the value of E; 2 E.code: Three-address code for evaluating E. Temporary names are generated for intermediate computations. The function newtemp() generates distinct temporary names t 1, t 2,.... The function gen() generates three-address code such that: 1 Everything quoted is taken literally; 2 The rest is evaluated.

Code for Assignments: An Example Given the assignment a := b c + d the code generated by the above grammar is: S id a E := E t 3 E t 2 + * E t 1 E id t 1 := uminus c t 2 := b t 1 t 3 := t 2 + d a := t 3 id - E d b id c

Assignments and Symbol Tables Instead of using a code attribute we show how to emit a three-address code to an output file. This is always possible if the code of the non-terminal on the left is obtained by concatenating the code attributes of the non-terminals on the right in the same order (perhaps with some additional string in between). Names/addresses stand for pointers to their symbol table entries: other info are needed for the final code generation (in particular, the storage address). Note. Under this assumption Temporary Names must be also entered into the symbol table as they are created by the newtemp() function. The function lookup(id.name) return nil if the entry is not found, otherwise a pointer to the entry is returned. The lookup(id.name) can be easily modified to account for scope: If name does not appear in the current symbol table the enclosing symbol table is checked (see the Lecture on Symbol Table ).

Assignments and Symbol Tables: The Translation Production S id := E E E 1 + E 2 E E 1 E 2 E E 1 E (E 1 ) E id E num Semantic Rules p := lookup(id.name); if p nil then emit(p := E.addr) else error E.addr := newtemp(); emit(e.addr := E 1.addr + E 2.addr) E.addr := newtemp(); emit(e.addr := E 1.addr E 2.addr) E.addr := newtemp(); emit(e.addr := uminus E 1.addr) E.addr := E 1.addr p := lookup(id.name); if p nil then E.addr := p else error E.addr := newtemp(); E.addr := num.val

Type Coercion in Expressions The syntax directed definition for coercion from integer to real for a generic arithmetic operation op is: Production E num E realnum E id E E 1 op E 2 Semantic Rule E.type := int E.type := real E.type := lookup(id.entry) E.type := if E 1.type = int and E 2.type = int then int else if E 1.type = int and E 2.type = real then real else if E 1.type = real and E 2.type = int then real else if E 1.type = real and E 2.type = real then real else type_error

Type Coercion in Expressions (Cont.) Let us consider the semantic action for E E 1 op E 2. Depending on the value of E.type in the E.code we can: Convert the type of either E 1.addr or E 2.addr (e.g., from int to float); Choose between the integer operation versus the floating point operation (e.g., if op is +, then int+ or float+).

Summary Three-Address Code Code for Assignments Boolean Expressions and Flow-of-Control Statements

Boolean Expressions Boolean Expressions are used to either compute logical values or as conditional expressions in flow-of-control statements. We consider Boolean Expressions with the following grammar: E E or E E and E not E (E) E relop E true false There are two methods to evaluate Boolean Expressions 1 Numerical Representation. Encode true with 1 and false with 0 and we proceed analogously to arithmetic expressions. 2 Jumping Code. We represent the value of a Boolean Expression by a position reached in a program.

Numerical Representation of Boolean Expressions Expressions will be evaluated from left to right assuming that: or and and are left-associative, and that or has lowest precedence, then and, and finally not. Example 1. The translation for a or (b and (not c)) is: t 1 := not c t 2 := b and t 1 t 3 := a or t 2 Example 2. A relational expression such as a<b is equivalent to the conditional statement if a<b then 1 else 0. Its translation involves jumps to labeled statements: 100: if a<b goto 103 101: t := 0 102: goto 104 103: t := 1 104:

Numerical Representation: The Translation The following S-Attributed Definition makes use of the global variable nextstat that gives the index of the next three-address code statement and is incremented by emit. Production Semantic Rules E E 1 or E 2 E.addr := newtemp(); emit(e.addr := E 1.addr or E 2.addr) E E 1 and E 2 E.addr := newtemp(); emit(e.addr := E 1.addr and E 2.addr) E not E 1 E.addr := newtemp(); emit(e.addr := not E 1.addr) E (E 1 ) E.addr := E 1.addr E E 1 relop E 2 E.addr := newtemp(); emit( if E 1.addr relop.op E 2.addr goto nextstat + 3); emit(e.addr := 0 ); emit( goto nextstat + 2); emit(e.addr := 1 ) E true E.addr := newtemp(); emit(e.addr := 1 ) E false E.addr := newtemp(); emit(e.addr := 0 )

Jumping Code for Boolean Expressions The value of a Boolean Expression is represented by a position in the code. Consider Example 2: We can tell what value t will have by whether we reach statement 101 or statement 103. Jumping code is extremely useful when Boolean Expressions are in the context of flow-of-control statements. We start by presenting the translation for flow-of-control statements generated by the following grammar: S if E then S if E then S 1 else S 2 while E do S

Flow-of-Control Statements In the translation, we assume that a three-address code statement can have a symbolic label, and that the function newlabel() generates such labels. We associate with E two labels using inherited attributes: 1 E.true, the label to which control flows if E is true; 2 E.false, the label to which control flows if E is false. We associate to S the inherited attribute S.next that represents the label attached to the first statement after the code for S. Note 1. This method of generating symbolic labels can lead to a proliferation of label: The backpatching method (see the Book) creates labels only when needed and emits directly the code. Note 2. To substitute symbolic labels with actual addresses a second pass is needed: backpatching will avoid also the two-translation.

Flow-of-Control Statements (Cont.) The following figures show how the flow-of-control statements are translated.

Flow-of-Control Statements: The Translation Production P S Semantic Rules S.next = newlabel(); P.code := S.code gen(s.next : ) S if E then S 1 E.true := newlabel(); E.false := S.next; S 1.next := S.next; S.code := E.code gen(e.true : ) S 1.code S if E then S 1 else S 2 E.true := newlabel(); E.false := newlabel(); S while E do S 1 S 1.next := S.next; S 2.next := S.next; S.code := E.code gen(e.true : ) S 1.code gen( goto S.next) gen(e.false : ) S 2.code E.begin := newlabel(); E.true := newlabel(); E.false := S.next; S 1.next := E.begin; S.code := gen(e.begin : ) E.code gen(e.true : ) S 1.code gen( goto E.begin)

Jumping Code for Boolean Expressions (Cont.) Boolean Expressions are translated in a sequence of conditional and unconditional jumps to either E.true or E.false. a < b. The code is of the form: if a < b goto E.true goto E.false E 1 ore 2. If E 1 is true then E is true, so E 1.true = E.true. Otherwise, E 2 must be evaluated, so E 1.false is set to the label of the first statement in the code for E 2. E 1 ande 2. Analogous considerations apply. not E 1. We just interchange the true and false with that for E. Note. Both the true and false attributes are inherited and the translation is an L-attributed grammar.

Jumping Code for Boolean Expressions: The Translation Production Semantic Rules E E 1 or E 2 E 1.true := E.true; E 1.false := newlabel(); E 2.true := E.true; E 2.false := E.false; E.code := E 1.code gen(e 1.false : ) E 2.code E E 1 and E 2 E 1.true := newlabel(); E 1.false := E.false; E 2.true := E.true; E 2.false := E.false; E.code := E 1.code gen(e 1.true : ) E 2.code E not E 1 E 1.true := E.false; E 1.false := E.true; E.code := E 1.code E (E 1 ) E 1.true := E.true; E 1.false := E.false; E.code := E 1.code (Follows )

Jumping Code for Boolean Expressions: The Translation Production Semantic Rules E E 1 relop E 2 E.code := E 1.code E 2.code gen( if E 1.addr relop.op E 2.addr goto E.true) gen( goto E.false) E id p = lookup(id.name); if(p.type = bool) then E.code := gen( if p = true goto E.true) gen( goto E.false) else if(p nil) then E.addr = p; E.code = E true E.code := gen( goto E.true) E false E.code := gen( goto E.false)

Flow-of-Control and Boolean Expressions: An Example Example. Translate the following statement: while a < b do if c or d then x := y + z else x := y - z

Mixed-Mode Boolean Expressions Boolean Expressions often contain Arithmetic sub-expressions e.g., (a+b)<c. On the other hand, if true = 1 and false = 0, then (a<b)+(b<a) can be an Arithmetic expression with value 0 if a=b and 1 otherwise. The method of representing Boolean Expressions by Jumping code is still a good option.

Mixed-Mode Boolean Expressions (Cont.) Consider the following Grammar: E E+E E and E E relop E id E+E, produces an arithmetic result, and the arguments can be mixed; E and E, produces a Boolean result, and both arguments must be Boolean; E relop E, produces a Boolean result, and the arguments can be mixed.

Mixed-Mode Boolean Expressions: The Translation To generate code we use a synthesized attribute E.type, that will be either arith or bool. Boolean Expressions will have inherited attributes E.true and E.false useful for the jumping code. Arithmetic Expressions will have the synthesized attribute E.addr standing for the (temporary) variable holding the value of E. The global variable nextstat gives the index of the next three-address code statement and is incremented by gen.

Mixed-Mode Boolean Expressions: The Translation (Cont.) Let just show the semantic rule for E E+E. E E 1 +E 2 E.type := arith; if E 1.type := arith and E 2.type := arith then begin E.addr := newtemp(); E.code := E 1.code E 2.code gen(e.addr := E 1.addr + E 2.addr) end esle if E 1.type := arith and E 2.type := bool then begin E.addr := newtemp(); E 2.true := newlabel(); E 2.false := newlabel(); E.code := E 1.code E 2.code gen(e 2.true : E.addr := E 1.addr + 1) gen( goto nextstat + 1) gen(e 2.false : E.addr := E 1.addr) end else if...

Summary of Lecture X Three-Address Code Code for Assignments Boolean Expressions and Flow-of-Control Statements