CMPSC 160 Translation of Programming Languages. Three-Address Code

Similar documents
Intermediate Code Generation

Formal Languages and Compilers Lecture X Intermediate Code Generation

Module 25 Control Flow statements and Boolean Expressions

UNIT-3. (if we were doing an infix to postfix translator) Figure: conceptual view of syntax directed translation.

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών

Principle of Compilers Lecture VIII: Intermediate Code Generation. Alessandro Artale

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

Module 27 Switch-case statements and Run-time storage management

Module 26 Backpatching and Procedures

CSCI Compiler Design

Syntax-Directed Translation

Time : 1 Hour Max Marks : 30

CS2210: Compiler Construction. Code Generation

CS322 Languages and Compiler Design II. Spring 2012 Lecture 7

Intermediate Representations

CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV. Department of Computer Science University of Arizona

Dixita Kagathara Page 1

2 Intermediate Representation. HIR (high level IR) preserves loop structure and. More of a wizardry rather than science. in a set of source languages

THEORY OF COMPILATION

Intermediate Code Generation

Intermediate Code Generation Part II

Control Structures. Boolean Expressions. CSc 453. Compilers and Systems Software. 16 : Intermediate Code IV

Intermediate Code Generation

Intermediate Code Generation

Theory of Compilation Erez Petrank. Lecture 7: Intermediate Representation Part 2: Backpatching

Principles of Programming Languages

Intermediate Code Generation

Intermediate Code Generation

PART 6 - RUN-TIME ENVIRONMENT. F. Wotawa TU Graz) Compiler Construction Summer term / 309

COMPILER CONSTRUCTION (Intermediate Code: three address, control flow)

CSc 453 Intermediate Code Generation

intermediate-code Generation

UNIT IV INTERMEDIATE CODE GENERATION

Intermediate Code Generation

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Computer Components. Software{ User Programs. Operating System. Hardware

Intermediate Representa.on

Intermediate Code Generation

Chapter 6 Intermediate Code Generation

Intermediate Code Generation

Homework 1 Answers. CS 322 Compiler Construction Winter Quarter 2006

TACi: Three-Address Code Interpreter (version 1.0)

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

ECS 142 Project: Code generation hints (2)

Compilerconstructie. najaar Rudy van Vliet kamer 140 Snellius, tel rvvliet(at)liacs(dot)nl. college 7, vrijdag 2 november 2018

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Intermediate Representations

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

PRINCIPLES OF COMPILER DESIGN

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Languages and Compiler Design II IR Code Generation I

Compiling Techniques

The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program.

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

COP5621 Exam 3 - Spring 2005

Concepts Introduced in Chapter 6

Lecture 15 CIS 341: COMPILERS

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

LECTURE 17. Expressions and Assignment

Java Primer 1: Types, Classes and Operators

Lecture Outline. Code Generation. Lecture 30. Example of a Stack Machine Program. Stack Machines

Computer Components. Software{ User Programs. Operating System. Hardware

opt. front end produce an intermediate representation optimizer transforms the code in IR form into an back end transforms the code in IR form into

Code Generation. Lecture 30

SYNTAX DIRECTED TRANSLATION FOR CONTROL STRUCTURES

COMP 181. Prelude. Intermediate representations. Today. High-level IR. Types of IRs. Intermediate representations and code generation

IR Lowering. Notation. Lowering Methodology. Nested Expressions. Nested Statements CS412/CS413. Introduction to Compilers Tim Teitelbaum

Compilers. Compiler Construction Tutorial The Front-end

Introduction to Programming Using Java (98-388)

VIVA QUESTIONS WITH ANSWERS

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 19: Efficient IL Lowering 5 March 08

Control Flow. COMS W1007 Introduction to Computer Science. Christopher Conway 3 June 2003

CA4003 Compiler Construction Assignment Language Definition

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

COMPUTER ORGANIZATION & ARCHITECTURE

o Counter and sentinel controlled loops o Formatting output o Type casting o Top-down, stepwise refinement

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

CS 432 Fall Mike Lam, Professor. Code Generation

Concepts Introduced in Chapter 6

Language Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

Intermediate Representations

Overview (4) CPE 101 mod/reusing slides from a UW course. Assignment Statement: Review. Why Study Expressions? D-1

Announcements. Lab Friday, 1-2:30 and 3-4:30 in Boot your laptop and start Forte, if you brought your laptop

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Intermediate Representations

3. Java - Language Constructs I

Intermediate Representation (IR)

Compiler Theory. (Intermediate Code Generation Abstract S yntax + 3 Address Code)

CSE 452: Programming Languages. Outline of Today s Lecture. Expressions. Expressions and Control Flow

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

CSE302: Compiler Design

Boolean Expressions and if 9/14/2007

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

CS313D: ADVANCED PROGRAMMING LANGUAGE

Operators and Expressions

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

A Simple Syntax-Directed Translator

Transcription:

CMPSC 160 Translation of Programming Languages Lectures 16: Code Generation: Three- Address Code Three-Address Code Each instruction can have at most three operands Each operand corresponds to a memory address or a temporary We have to break large statements into little operations that use temporary variables x=(2+3)+4 turns into to t1=2+3; x=t1+4; Temporary variables store the results at the internal nodes in the AST 1

Three-Address Code Assignments x := y x := y op z op: binary arithmetic or logical operators x := op y op: unary operators (unary minus, negation, integer to float conversion) Branch goto L Execute the statement with labeled L next Conditional Branch if x relop y goto L relop: <, >, =, <=, >=, ==,!= if the condition holds we execute statement labeled L next if the condition does not hold we execute the statement following this statement next Three-Address Code if (x < y) x = 5*y + 5*y/3; else y = 5; x = x + y; Temporaries: temporaries correspond to the internal nodes of the syntax tree Variables can be represented with their locations in the symbol table if x < y goto L1 goto L2 L1: t1 := 5 * y t2 := 5 * y t3 := t2 / 3 x := t1 + t3 goto L3 L2: y := 5 L3: x := x + y Three-address code instructions can be represented as an array of quadruples: operation, argument1, argument2, result triples: operation, argument1, argument2 (each triple implicitly corresponds to a temporary) 2

Three-Address Code vs. Stack-Based Code Three-Address Code: Good: Compact representation Good: Statement is self contained in that it has the inputs, outputs, and operation all in one instruction Bad: Requires lots of temporary variables Bad: Temporary variables have to be handled explicitly Stack-Based Code: Good: No temporaries, everything is kept on the stack Good: It is easy to generate code for this Bad: Requires more instructions to do the same thing Three-Address Code Attributes: E.place: location that holds the value of expression E (this is the temporary variable that will hold the value of the expression) E.code: sequence of instructions that are generated for E Procedures: newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table denotes concatenation Productions S id := E E E 1 + E 2 E E 1 * E 2 E ( E 1 ) E - E 1 E id E num 3

Three-Address Code Attributes: E.place: location that holds the value of expression E (this is the temporary variable that will hold the value of the expression) E.code: sequence of instructions that are generated for E Procedures: newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table Productions Semantic Rules S id := E id.place lookup(id.name); S.code E.code gen(id.place := E.place); E E 1 + E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place + E 2.place); E E 1 * E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place * E 2.place); E ( E 1 ) E.code E 1.code; E.place E 1.place; E - E 1 E.place newtemp(); E.code E 1.code gen(e.place := uminus E 1.place); E id E.place lookup(id.name); E.code (empty string) E num E.place newtemp(); E.code gen(e.place := num.value); Example x := ( y + z ) * a S.code = t1 := y + z t2 := t1 * a x := t2 S E.place = t2 E.code = t1 := y + z t2 := t1 * a id := E E.place = t1 E.code = t1 := y + z E * E E.place = a E.code = E.place = t1 E.code = t1 := y + z ( E ) id E.place = y E.code = id E + E id E.place = z E.code = 4

Code Generation for Boolean Expressions Two approaches Numerical representation Implicit representation Numerical representation Use 1 to represent true, use 0 to represent false For three-address code store this result in a temporary For stack machine code store this result in the stack Implicit representation For the boolean expressions which are used in flow-of-control statements (such as if-statements, while-statements etc.) boolean expressions do not have to explicitly compute a value, they just need to branch to the right instruction Generate code for boolean expressions which branch to the appropriate instruction based on the result of the boolean expression Boolean Expressions: Numerical Representation, Three-Address Code Attributes : E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E relop.op: operator can be ==,!=, <=, >=, >, < Global variable: nextstat: Returns the location of the next instruction to be generated (each call to gen() increments nextstat by 1) Productions Semantic Rules E E 1 relop E 2 E.place newtemp(); E.code E 1.code E 2.code gen( if E 1.place relop.op E 2.place goto nextstat+3); gen(e.place := 0 ) gen( goto nextstat+2) gen(e.place := 1 ); E E 1 and E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place and E 2.place); E E 1 or E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place or E 2.place); 5

Boolean Expressions: Numerical Representation, Three-Address Code (continued) Attributes : E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E Global variable: nextstat: Returns the location of the next instruction to be generated (each call to gen() increments nextstat by 1) Productions Semantic Rules boolean negation E not E 1 E true E false E.place newtemp(); E.code E 1.code gen(e.place := negate E 1.place); E.place newtemp(); E.code gen(e.place := 1 ); E.place newtemp(); E.code gen(e.place := 0 ); Numerical Representation of Boolean Expressions Input boolean expression: x < y and a == b Three address code: Instructions 100-103 are for x < y Instructions 104-107 are for a == b 100 if x < y goto 103 101 t1 := 0 102 goto 104 103 t1 := 1 104 if a = b goto 107 105 t2 := 0 106 goto 108 107 t2 := 1 108 t3 := t1 and t2 These are the locations of the instructions, they are not labels. We could generate code using labels too Stack machine code: Instructions 100-105 are for x < y Instructions 106-111 are for a == b 100 load x 101 load y 102 if_cmplt 105 103 push 0 104 goto 106 105 push 1 106 load a 107 load b 108 if_cmpeq 111 109 push 0 110 goto 112 111 push 1 112 and 6

Boolean Expressions: Implicit Representation, Three-Address Code Attributes : E.code: sequence of instructions that are generated for E E.false: instruction to branch to if E evaluates to false E.true: instruction to branch to if E evaluates to true (E.code is synthesized whereas E.true and E.false are inherited) id.place: location for id Productions Semantic Rules E E 1 relop E 2 E.code E 1.code E 2.code gen( if E 1.place relop.op E 2.place goto E.true) gen( goto E.false); E E 1 and E 2 E 1.true newlabel(); E 1.false E. false; (short-circuiting) This generated label will be inserted to the place for E 1.true in the code generated for E 1 E 2.true E. true; E 2.false E. false; E.code E 1.code gen(e 1.true : ) E 2.code ; These places will be filled with labels later on when they become available Boolean Expressions: Implicit Representation, Three-Address Code (continued) Attributes : E.code: sequence of instructions that are generated for E E.false: instruction to branch to if E evaluates to false E.true: instruction to branch to if E evaluates to true id.place: location for id Productions Semantic Rules E E 1 or E 2 E 1.true E.true; (short-circuiting) E 1.false newlabel(); E 2.true E. true; E 2.false E. false; E.code E 1.code gen(e 1.false : ) E 2.code ; E not E 1 E 1.true E.false; E 1.false E. true; E.code E 1.code; E true gen(`goto E.true); E false gen(`goto E.false); 7

Three-Address Code Example These are the locations of three-address code instructions, they are not labels Input boolean expression: x < y and a == b These labels will be generated later on, and will be inserted to the corresponding places Numerical representation: 100 if x < y goto 103 101 t1 := 0 102 goto 104 103 t1 := 1 104 if a = b goto 107 105 t2 := 0 106 goto 108 107 t2 := 1 108 t3 := t1 and t2 Implicit representation: if x < y goto L1 goto LFalse L1: if a = b goto LTrue goto LFalse... LTrue: LFalse: Three-Address Code, Implicit Representation false attribute will be inserted here Input: x < y and a == b true = false= E code = if x < y goto L1 goto L1:if a = b goto goto true = L1 false= and code = if x < y goto L1 goto E true = false= true attribute will be inserted here code = if a = b goto goto E E relop code = op = < E code = relop E code = op = == E code = place = x id name = x place = y id name = y place = a id name = a place = b id name = b 8

Flow-of-Control Statements If-then-else Branch based on the result of boolean test expression test expression Y N then-block else-block Next block Flow-of-Control Statements: Code Structure We have to decide on the code layout for the code for flow-of-control S if E then S 1 else S 2 E.code to E.true to E.false Labels E.true: S 1.code if E evaluates to true goto S.next E.false: S 2.code if E evaluates to false S.next: 9

Flow-of-Control Statements, Three-Address Code, Assuming Implicit Representation for Boolean Expressions Attributes : S.code: sequence of instructions that are generated for S S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute) Productions Semantic Rules ASSUMPTION: the code generated S if E then S 1 else S 2 E.true newlabel(); for boolean expression E will branch E.false newlabel(); to E.true or E.false label based on the S 1.next S. next; result of the expression S 2.next S. next; S.code E.code gen(e.true : ) S 1.code gen( goto S.next) gen(e.false : ) S 2.code ; Flow-of-Control Statements Loops Evaluate condition before loop (if needed) Evaluate condition after loop Branch back to the top if condition holds Pre-test The layout shown here merges loop test with last block of loop body Loop body While, for, do, and until all fit this basic model You can use different layouts for example, put the test before the loop body and branch back to it at the end of the loop body Post-test Next block 10

Flow-of-Control Statements: Code Structure Two different layouts for while statements: The algorithms I give in the following slides use this layout: S while E do S 1 This layout places E.code after S 1.code : S while E do S 1 S.begin: E.true: E.false: E.code S 1.code goto S.begin to E.true to E.false E.true: E.begin: E.false: goto E.begin S 1.code E.code to E.true to E.false Flow-of-Control Statements, Three-Address Code, Assuming Implicit Representation for Boolean Expressions Attributes : S.code: sequence of instructions that are generated for S S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute) Productions Semantic Rules S while E do S 1 S.begin newlabel(); E.true newlabel(); E.false S. next; S 1.next S. begin; S.code gen(s.begin : ) E.code gen(e.true : ) S 1.code gen( goto S.begin); S S 1 ; S 2 S 1.next newlabel(); S 2.next S.next; S.code S 1.code gen(s 1.next : ) S 2.code 11

Example Input code fragment: while (a < b) { if (c < d) x = y + z; else x = y z } L1: if a < b goto L2 goto LNext L2: if c < d goto L3 goto L4 L3: t1 := y + z x := t1 goto L1 L4: t2 := y z x := t2 goto L1 LNext:... E.true for a < b E.false for a < b E.true for c < d E.false for c < d Backpatching E.true, E.false, S.next may not be computed in a single pass (they are inherited attributes) Backpatching is a technique for generating labels for E.true, E.false, S.next and inserting them to the appropriate locations Basic idea Keep lists E.truelist, E.falselist, S.nextlist E.truelist (E.falselist): the list of instructions where the label for E.true (E.false) have to be inserted when it becomes available S.nextlist: the list of instructions where the label for S.next have to be inserted when it becomes available When labels E.true, E.false, S.next are computed these labels are inserted to the instructions in these lists 12

x86 C with Static Linking Compile time Run time Executable x86 Assembly x86 Assembly x86 x86 Assembler x86 Object File (.o) x86 Assembler x86 Object File (.o) Libraries At compile time, all of the code is transformed into x86 object files and then the object files are linked together with the libraries to create a statically linked executable. This executable then runs directly on the x86 hardware. Linker x86 Executable One drawback, is if you wanted to run the same code on a Power-PC (even with the same OS) you have to recompile everything. x86 C with Dynamic Linking x86 Assembly x86 Assembler x86 Assembly x86 Assembler Compile time Run time Not-fully Linked x86 Executable Loader Final Executable loaded in Memory x86 Dynamic Libraries x86 Object File (.o) The linker just links together the.o files, and does not link calls to dynamically loaded libraries (DLLs in Windows or Shared Libraries in Unix) x86 Object File (.o) Linker Not-fully Linked x86 Executable This is similar to before, but now the final linking occurs when the program is loaded (or even during program execution) Here, libraries can be shared and they can be updated across the whole system without re-linking every single executable 13

Java Compile time Java Bytecode (class) Run time Java Bytecode (class) Java Classes Java Assembly Java Assembler Java Bytecode (class) Java Assembly Java Assembler Java Bytecode (class) Here the compiler targets java bytecode and the bytecode is then run on top of the Java Virtual Machine (JVM). The JVM really just interprets (simulates) the bytecode like any scripting language. Because of this, any java program compiled to bytecode is portable to any machine that someone has already ported the JVM too. No need to recompile. Java Virtual Machine (bytecode interpreter) x86 Java with Just-in-Time (JIT) Compilation Compile time Java Bytecode (class) Run time Java Bytecode (class) Java Classes Java Assembly Java Assembly Java Assembler Java Assembler Java Bytecode (class) Java Bytecode (class) JIT Here the compiler targets java bytecode (which is what we do in this class) and the bytecode is then run on top of the Java Virtual Machine (JVM). The JVM really just interprets (simulates) the bytecode like any scripting language. Because of this, any java program compiled to bytecode is portable to any machine that someone has already ported the JVM too. No need to recompile. Java Virtual Machine (bytecode interpreter) x86 On Call to Compiled Method On Call or Return back to uncompiled code X86 native code compiled from Java class 14