Three-address code (TAC) TDT4205 Lecture 16

1 Three-address code (TAC) TDT4205 Lecture 16

2 On our way toward the bottom We have a gap to bridge: words, grammar, source program, semantics. The program should do the same thing at all of these levels, but they're all different...

3 High-level intermediate representation (IR) Working from the syntax tree (or similar), we can capture the program's meaning without hardware details. If we generalize the representation a bit, we can even liberate it from the specific syntax of the source language. The main GCC distribution gives you several front-ends (scan/parse/translate) for C, C++, Ada, Go, and Fortran, which all target the same IR: GENERIC, a tree representation at function level, from which the CPU-specific parts are generated. There are more front-ends, but they're not part of the main distribution (yet)...

4 From the other end CPU-specific details go into things like how to store addresses, how many registers there are, if any of them have special purposes, etc. They all have pretty similar sets of operations, though. With an abstraction for that, we can re-use most of the low-level logic for different machines, generating a low-level IR.

5 Stored-program computing If we ignore their implementation details, practically every* modern CPU looks like** a von Neumann machine, ticking along to a clock that makes it periodically: fetch an instruction code (from a memory address), fetch the operands of the instruction (from a memory address), execute the instruction to obtain its result, and put the result somewhere clever (into a memory address). The CPU (control unit plus arithmetic/logic unit) is connected to a memory (RAM) that contains both data and program. * research contraptions and exotic experiments notwithstanding ** note that they aren't actually made this way anymore, but emulate it for the sake of programmability

6 There are only two things to handle Instructions for the control unit, and data for the arithmetic/logic unit. Instructions and data are both found at memory addresses, but we can use symbolic names for those: labels for instructions, names for variables. It's handy to sub-categorize the instructions: binary operations, unary operations, copy operations, and load/store operations (math, logic, and data movement), plus unconditional jumps, conditional jumps, and procedure calls (control flow).

7 TAC is a low-level IR It's three-address because each operation deals with at most three addresses: Binary operations: a = b OP c, where OP is ADD, MUL, SUB, DIV, ... Unary operations: a = OP b, where OP is MINUS, NEG, ... Copy: a = b Load/store: x = &y (address of y), x = *y (value at address y), x[i] = y (address plus offset), ...
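As a small worked sketch (my own example, not from the slides), a statement that combines several of these forms, such as a[i] = b + (-c), could lower to:

```
t1 = MINUS c    ; unary operation
t2 = b ADD t1   ; binary operation
a[i] = t2       ; store at address plus offset
```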

8 TAC is a low-level IR Control flow gets the same treatment: Label: L: (named address of the next instruction) Unconditional jump: jump L (go to L and get the next instruction) Conditional jump: if x goto L (go to L if x is true), iffalse x goto L (go to L if x is false), plus the comparison operators, as in if x < y goto L, if x >= y goto L, if x != y goto L Call and return: param x (x is a parameter in the next call), call L (almost like jump; more later), return (to where the last call came from)

9 Internal representation With at most three locations in each operation, they can be written as entries in a 4-column table (quadruples), with columns op, arg1, arg2, result: (mul, x, x, t1), (mul, y, y, t2), (add, t1, t2, t3), (copy, t3, -, z). This is one (possible) translation of z = (x*x) + (y*y)
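Such a quadruple table fits naturally in a list of 4-tuples; a minimal sketch (the tuple encoding and the printing loop are my own, not the lecture's):

```python
# Quadruples for z = (x*x) + (y*y): one (op, arg1, arg2, result) row per instruction.
quads = [
    ("mul", "x", "x", "t1"),    # t1 = x * x
    ("mul", "y", "y", "t2"),    # t2 = y * y
    ("add", "t1", "t2", "t3"),  # t3 = t1 + t2
    ("copy", "t3", None, "z"),  # z = t3
]

# Render the table back into readable TAC.
for op, a1, a2, res in quads:
    print(f"{res} = {a1}" if op == "copy" else f"{res} = {a1} {op} {a2}")
```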

10 It can be trimmed down still Three columns (triples) suffice if we treat the intermediate results as places in the code. We could decouple the instruction index from the position index (indirect triples). With columns (instr. #), op, arg1, arg2: (0) mul x x, (1) mul y y, (2) add (0) (1), (3) copy z (2). One can imagine any number of implementations; the TAC part is that each instruction deals with 3 locations...

11 Static Single Assignment Programs are at liberty to use the same variable for different purposes in different places: z = (x*x) + (y*y); // Get a sum of squares if ( z > 1 ) // We're only interested in distances > 1 z = sqrt(z); // Get the distance from (0,0) to (x,y) A compiler might make use of how z plays two different parts here. It can also introduce as many intermediate variables as it likes: z1 = (x*x) + (y*y); if ( z1 > 1 ) z2 = sqrt(z1); z3 = Φ(z1, z2) This makes it explicit that z1 and z2 are different values computed at different points, and that the value of z3 will be one or the other. We can read that from the source code; a compiler needs a representation to recognize it.

12 Translations into low IR We have two intermediate representations, and we need a systematic way to translate one into the other. Suppose we let e denote a construct from high IR, let T[e] denote its translation into low IR, and let t = T[e] denote the assignment that puts the outcome of T[e] in t, so that we have a notation which can capture nested applications of a translation.

13 Simple operations Disregarding how complicated the contents of e1, e2 are, t = T[e1 op e2] generally translates into t1 = T[e1] t2 = T[e2] t = t1 op t2 In other words: first, (recursively) translate e1 and store its result; next, (recursively) translate e2 and store its result; finally, combine the two stored results.

14 This linearizes the program In terms of a syntax tree, we're laying out its parts in depth-first traversal order, from the bottom, where the arguments are values: t1 = 1 t2 = 3 t3 = 1 + 3 [syntax tree for (1+3)*5]

15 This linearizes the program Evaluate one part after another: t1 = 1 t2 = 3 t3 = 1 + 3 t4 = t3 t5 = 5 t6 = t3 * 5 The same pattern is applied to sub-trees, in order. [syntax tree for (1+3)*5]

16 This linearizes the program Combine the local parts which represent sub-trees: t1 = 1 t2 = 3 t3 = 1 + 3 t4 = t3 t5 = 5 t6 = t3 * 5 t = t6 The final result is the whole expression. [syntax tree for (1+3)*5]

17 Nested expressions Combine the local parts which represent sub-trees: t1 = 1 t2 = 3 t3 = 1 + 3 (this is T[1+3]) t4 = t3 t5 = 5 t6 = t3 * 5 (this is T[t3 * 5], i.e. T[(1+3)*5]) t = t6 (this is t = T[(1+3)*5])
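The depth-first scheme of the last few slides can be sketched as a small recursive translator; this is a toy illustration under my own assumptions (tuple-encoded syntax tree, temporaries numbered t1, t2, ...), not the lecture's code:

```python
# Lower a nested expression to TAC by depth-first traversal of its syntax tree.
def translate(expr):
    code = []
    n = 0
    def fresh():
        nonlocal n
        n += 1
        return f"t{n}"
    def walk(e):
        if not isinstance(e, tuple):          # leaf: constant or variable name
            t = fresh()
            code.append(f"{t} = {e}")
            return t
        op, e1, e2 = e
        t1 = walk(e1)                         # first, translate e1 and store its result
        t2 = walk(e2)                         # next, translate e2 and store its result
        t = fresh()
        code.append(f"{t} = {t1} {op} {t2}")  # finally, combine the two stored results
        return t
    walk(expr)
    return code

for line in translate(("*", ("+", 1, 3), 5)):  # T[(1+3)*5]
    print(line)
```

The output follows the same shape as the slides, except that it skips the t4 = t3 copy, since nothing forces one here.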

18 Statement sequences These are straightforward since they are already sequenced: T[s1; s2; s3; ...; sn] becomes T[s1] T[s2] T[s3] ... T[sn] Just translate one statement after the other, and append their translations in order.

19 Assignments T[v = e] requires us to obtain the value of e and put the result into v. Since e is already (recursively) handled, T[v = e] becomes t = T[e] v = t (or just v = T[e] if it's convenient to recognize the shortcut)

20 Array assignment T[v[e1] = e2] requires us to compute the index e1, compute the expression e2, and put the result into v[e1]: t1 = T[e1] t2 = T[e2] v[t1] = t2

21 Conditionals These require control flow. T[if (e) then s] becomes t1 = T[e] iffalse t1 goto Lend T[s] Lend: (translation of the next statement comes here)

22 Conditionals If e is true, control goes through s; if e is false, control skips past it: t1 = T[e] iffalse t1 goto Lend T[s] Lend:
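The if-pattern can be sketched as a function that wraps an already-translated body; the list-of-strings TAC representation and the label name are my own assumptions:

```python
# Lower "if (e) then s": skip past the body when the condition is false.
def translate_if(cond_temp, body_lines, end_label="Lend"):
    code = [f"iffalse {cond_temp} goto {end_label}"]  # e false: skip s
    code.extend(body_lines)                           # T[s], runs when e is true
    code.append(f"{end_label}:")                      # next statement's translation lands here
    return code

print("\n".join(translate_if("t1", ["a = b"])))
```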

23 Conditionals + else You can probably guess this one: T[if (e) then s1 else s2] becomes t1 = T[e] iffalse t1 goto Lelse T[s1] jump Lend Lelse: T[s2] Lend:

24 Loops (in while flavor) The condition must be tested every iteration. T[while (e) do s] becomes Ltest: t1 = T[e] iffalse t1 goto Lend T[s] jump Ltest Lend:
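In the same sketch style (list-of-strings TAC, my own helper names), the while-pattern re-emits the condition code at the top so that it is re-evaluated on every iteration:

```python
# Lower "while (e) do s": test at Ltest, exit at Lend, jump back after the body.
def translate_while(cond_lines, cond_temp, body_lines):
    code = ["Ltest:"]
    code.extend(cond_lines)                        # T[e], re-evaluated each iteration
    code.append(f"iffalse {cond_temp} goto Lend")  # condition false: leave the loop
    code.extend(body_lines)                        # T[s]
    code.append("jump Ltest")                      # back to the test
    code.append("Lend:")
    return code

print("\n".join(translate_while(["t1 = i < 10"], "t1", ["i = i + 1"])))
```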

25 Loops are loops For the sake of completeness: for (i=0; i<10; i++) { stuff(); } is equivalent to i = 0; while (i<10) { stuff(); i = i + 1; } and do { stuff(); } while (x); is equivalent to stuff(); while (x) { stuff(); } Different kinds of loops are equivalent to the point of syntactic sugar; whatever form your compiler likes best works also for the others.

26 Switch (if-elseif style) T[switch (e) { case v1: s1, ..., case vn: sn }] can become t = T[e] iffalse (t=v1) jump L1 T[s1] L1: iffalse (t=v2) jump L2 T[s2] L2: ... iffalse (t=vn) jump Lend T[sn] Lend:

27 Switch (by jump table) T[switch (e) { case v1: s1, ..., case vn: sn }] can also become t = T[e] jump table[t] Lv1: T[s1] Lv2: T[s2] ... Lvn: T[sn] Lend: provided that the compiler can generate a table which maps v1, ..., vn into the target addresses Lv1, ..., Lvn for the jumps. (We didn't talk about computed jumps, but labels are just addresses which can be calculated. I mention this because it's probably what you'll see if you disassemble your favourite compiler's interpretation of a switch statement.)
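The table itself can be anything indexable by the switch value; a minimal sketch, assuming the compiler records one label per case value and treats Lend as the default target:

```python
# A jump table: one lookup replaces the whole chain of compare-and-branch tests.
table = {1: "Lv1", 2: "Lv2", 3: "Lv3"}

def switch_target(t):
    return table.get(t, "Lend")  # unhandled case values fall through to Lend

print(switch_target(2))  # the "jump table[t]" step for t = 2 → Lv2
```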

28 Labeling scheme Labels must be unique This can be handled by numbering the statements that generate them: if ( e1 ) then s1; if ( e2 ) then s2; becomes t1 = T[e1] iffalse t1 goto Lend1 T[s1] Lend1: t2 = T[e2] iffalse t2 goto Lend2 T[s2] Lend2: (...and so on...)
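The numbering can come from a single counter; a minimal sketch of such a label generator (names are illustrative):

```python
# Generate unique labels by numbering them, as in Lend1, Lend2, ...
counter = 0

def new_label(prefix="Lend"):
    global counter
    counter += 1
    return f"{prefix}{counter}"

print(new_label(), new_label())  # → Lend1 Lend2
```

For nested statements (next slide) the generated end-labels also need to be kept on a stack, so that they close in matching order with the construct beginnings.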

29 Nested statements if (e1) then if (e2) then a = b requires a little care; nesting (as with expressions) gives t1 = T[e1] iffalse (t1) goto Lend1 t2 = T[e2] iffalse (t2) goto Lend2 t3 = b a = t3 Lend2: Lend1: (the inner if is #2, the outer if is #1). The counting scheme must behave like a stack, to generate end-labels in matching order with construct beginnings.

30 Those were the basics You can surely work out similar patterns for many statement types of your own invention, or try some from your favourite language. Things we didn't talk about: redundant code after translation (artifacts we want the low IR to expose, so that we can remove them), and procedure call and return (which needs a little background in CPU architecture). These are for next time.