Languages and Compiler Design II IR Code Generation I


Material provided by Prof. Jingke Li; stolen with pride and modified by Herb Mayer
PSU Spring 2010, rev. 4/16/2010
PSU CS322 HM

Agenda
  Grammar G1
  CodeGen Overview
  Arithmetic Expression Translation
  Boolean Expression Translation
  Various Statement Translations

Grammar G1
  Input:    AST representation of MINI source
  Output:   Three-address code or IR tree code
  Approach: Syntax-directed translation

Generic Grammar G1, start symbol: S
  E -> E arithop E | E relop E | E logicop E
  E -> - E | ! E
  E -> newarray E              // new int array of size E
  E -> E [ E ]                 // indexed element
  E -> id | num                // end-nodes
  S -> E := E ;                // assignment statement
  S -> if ( E ) then S else S
  S -> while ( E ) S
  S -> print E ;
  S -> return E ;

CodeGen Overview
Arithmetic expressions:
  Preserve precedence and associativity.
  Pay attention to whether the language requires a check for zero-divide.
Boolean expressions:
  Define short-circuit evaluation vs. complete evaluation.
  Distinguish bit-wise from logical and, or, xor.
  Are multiple unary NOTs allowed?
Array definition:
  1D is simple for the compiler.
  Per dimension, record: element size, low bound, high bound, total size, index type and type.
Array element reference:
  L-value or r-value?
  Nested array reference: the index expression can in turn be an array element.
  Distinguish pass by value, pass by reference, and other modes.
Statements:
  Goto into another scope or out of the current scope is an issue in Fortran and C.
  Return: long-jump in C is non-trivial.
Parameters:
  Function parameters in PL/I and Pascal are hard.
  Pointer-type parameters are easy to confuse with reference parameters (in C).

Arithmetic Expression Translation
Generate three-address code: get a new temp per operation.
  E.s holds the statements that evaluate E
  E.t is the temp that holds E's value

E -> E1 arithop E2
  t = new Temp();
  E.s := [ E1.s; E2.s; t := E1.t arithop E2.t; ]
  E.t := t;

E -> unaryop E1
  t = new Temp();
  E.s := [ E1.s; t := unaryop E1.t; ]
  E.t := t;
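The two rules above can be sketched as a small recursive walk over an AST. This is a minimal illustration, not the course's reference implementation: the tuple encoding of the AST and the names `gen_tac`, `code`, and `counter` are assumptions made for this example.

```python
import itertools

def gen_tac(expr, code, counter):
    """Append three-address statements for expr to code; return E.t.

    Hypothetical AST encoding: a str is an id or num,
    (op, e1) is a unary node, (op, e1, e2) is a binary node.
    """
    if isinstance(expr, str):            # E -> id | num: no code needed
        return expr
    if len(expr) == 2:                   # E -> unaryop E1
        op, e1 = expr
        v1 = gen_tac(e1, code, counter)  # E1.s
        t = f"t{next(counter)}"          # t = new Temp()
        code.append(f"{t} := {op}{v1}")
        return t
    op, e1, e2 = expr                    # E -> E1 arithop E2
    v1 = gen_tac(e1, code, counter)      # E1.s
    v2 = gen_tac(e2, code, counter)      # E2.s
    t = f"t{next(counter)}"              # t = new Temp()
    code.append(f"{t} := {v1} {op} {v2}")
    return t

# b * -c + b * d / e, with * and / grouped left to right
ast = ("+", ("*", "b", ("-", "c")), ("/", ("*", "b", "d"), "e"))
code = []
result = gen_tac(ast, code, itertools.count(1))
print("\n".join(code))
```

Operands are visited left to right, so the temps are numbered in evaluation order and the last temp returned holds the value of the whole expression.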

Arithmetic Expression Translation, Cont'd
To generate IR trees, embed the expression subtrees into the current root. Attribute E.tr holds the IR tree for E.

E -> E1 arithop E2
  E.tr := ( BINOP arithop E1.tr E2.tr )
E -> unaryop E1
  E.tr := ( UNOP unaryop E1.tr null )

Example: b * -c + b * d / e      // assume left-to-right associativity of * / %
=> three-address code:
  t1 := -c
  t2 := b * t1
  t3 := b * d
  t4 := t3 / e
  t5 := t2 + t4
=> IR tree:
  (BINOP + (BINOP * b (UNOP - c ) ) (BINOP / (BINOP * b d ) e ) )
The IR tree is similar to Polish prefix notation, after Łukasiewicz, 1920.
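The IR-tree rules can be sketched the same way: each node wraps the trees of its children. The tuple AST encoding and the function name `ir_tree` are assumptions for this example, and the trailing `null` operand of UNOP is omitted, as in the slide's worked example.

```python
def ir_tree(expr):
    """Return the IR tree of expr as a prefix-notation string (E.tr)."""
    if isinstance(expr, str):                  # E -> id | num
        return expr
    if len(expr) == 2:                         # E -> unaryop E1
        op, e1 = expr
        return f"(UNOP {op} {ir_tree(e1)})"
    op, e1, e2 = expr                          # E -> E1 arithop E2
    return f"(BINOP {op} {ir_tree(e1)} {ir_tree(e2)})"

# b * -c + b * d / e
ast = ("+", ("*", "b", ("-", "c")), ("/", ("*", "b", "d"), "e"))
tree = ir_tree(ast)
print(tree)
```

Unlike the three-address version, no temps are needed: the nesting of the tree itself records where each intermediate value is used.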

Boolean Expression Translation
Rely on a target machine with conditional branches:
  The condition can be part of the instruction,
  or the condition can be queried via machine flags,
  or the condition can be evaluated separately (canonical execution) and then be provided as one of the arguments.
  Operands are: condition, target address, and *+1 (the address of the next instruction).
CodeGen uses temps for intermediate booleans,
  or CodeGen uses flow of control, so code locations imply the state of boolean subexpressions,
  or a combination of both.
The target computer may provide boolean or even bitwise operations: And, Or, Xor, Not, etc.

Boolean Expression Translation, Quads
Relational operations need to record their result, e.g. in machine flags. Logical operations can be realized through computation proper or via control flow. Sample expression:

  a < 5 || b > 2            // source

=> Pure code mapping, with logical and, or, xor instructions:
  t1 := a < 5               // e.g. encode 0 as false, 1 as true
  t2 := b > 2
  t3 := t1 or t2            // generate jump out if t3 is false

=> Control flow mapping, without logical and, or, xor:
      t1 := 1               // guess t1 is true, override if needed
      if a < 5 goto l1      // could be quad: Cond_jump_if_less
      t1 := 0               // guess was wrong, override to false
  l1: t2 := 1               // guess: set t2 to true initially
      if b > 2 goto l2
      t2 := 0               // wrong guess, set t2 to false
  l2: t3 := 1               // final guess
      if t1 goto l3
      if t2 goto l3
      t3 := 0               // final guess computed as false
  l3:                       // use t3

Better Representation of Booleans, and IR
Use the target machine's native logical operations for and, or, xor; again for the sample expression a < 5 || b > 2:
      t1 := 1
      if a < 5 goto l1
      t1 := 0
  l1: t2 := 1
      if b > 2 goto l2
      t2 := 0
  l2: t3 := t1 or t2        // use t3

IR tree:
  MOVE t3
    ( (BINOP or
        (ESEQ [ [MOVE t1 (CONST 1)]
                [CJUMP < (NAME a) (CONST 5) l1]
                [MOVE t1 (CONST 0)]
                [LABEL l1] ]
              t1 )
        (ESEQ [ [MOVE t2 (CONST 1)]
                [CJUMP > (NAME b) (CONST 2) l2]
                [MOVE t2 (CONST 0)]
                [LABEL l2] ]
              t2 ) ) )

Value Representation, Relational
E -> E1 relop E2

Three-Address Code:
  L := new Label(); t := new Temp();
  E.s := [ E1.s; E2.s;
           t := 1;
           if ( E1.t relop E2.t ) goto L;
           t := 0;
           L: ... ]
  E.t := t;

IR Tree Code:
  L := new NAME(); t := new TEMP();
  E.tr := ( ESEQ [ [MOVE t (CONST 1)]
                   [CJUMP relop E1.tr E2.tr L]
                   [MOVE t (CONST 0)]
                   [LABEL L] ]
            t )

Value Representation, Three-Address Code
E -> E1 || E2
  L := new Label(); t := new Temp();
  E.s := [ E1.s; E2.s;
           t := 1;
           if ( E1.t == 1 ) goto L;
           if ( E2.t == 1 ) goto L;
           t := 0;
           L: ... ]
  E.t := t;

E -> E1 && E2
  L := new Label(); t := new Temp();
  E.s := [ E1.s; E2.s;
           t := 0;
           if ( E1.t == 0 ) goto L;
           if ( E2.t == 0 ) goto L;
           t := 1;
           L: ... ]
  E.t := t;

E -> ! E1
  t := new Temp();
  E.s := [ E1.s; t := 1 - E1.t; ]
  E.t := t;
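A minimal sketch of the value representation, covering the relational rule together with the ||, && and ! rules. The tuple AST encoding, the function name `gen_value`, and the shared counter used to number both temps and labels are assumptions for illustration only.

```python
import itertools

def gen_value(e, code, ids):
    """Append code that computes boolean e as 0/1; return the holder of the value."""
    if isinstance(e, str):                      # id or num
        return e
    op = e[0]
    if op == "!":                               # E -> ! E1
        v1 = gen_value(e[1], code, ids)
        t = f"t{next(ids)}"
        code.append(f"{t} := 1 - {v1}")
        return t
    v1 = gen_value(e[1], code, ids)             # E1.s
    v2 = gen_value(e[2], code, ids)             # E2.s (always evaluated)
    n = next(ids)
    t, label = f"t{n}", f"L{n}"                 # new Temp(), new Label()
    if op == "||":
        code += [f"{t} := 1", f"if ({v1} == 1) goto {label}",
                 f"if ({v2} == 1) goto {label}", f"{t} := 0", f"{label}:"]
    elif op == "&&":
        code += [f"{t} := 0", f"if ({v1} == 0) goto {label}",
                 f"if ({v2} == 0) goto {label}", f"{t} := 1", f"{label}:"]
    else:                                       # E -> E1 relop E2
        code += [f"{t} := 1", f"if ({v1} {op} {v2}) goto {label}",
                 f"{t} := 0", f"{label}:"]
    return t

code = []
result = gen_value(("||", ("<", "a", "5"), (">", "b", "2")), code, itertools.count(1))
print("\n".join(code))
```

Both operands are computed before the logical operator is applied, so this scheme gives complete evaluation, not short-circuit evaluation.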

Value Representation, IR Tree Code
E -> E1 || E2
  L = new NAME(); t = new TEMP();
  E.tr := (ESEQ [ [MOVE t (CONST 1)]
                  [CJUMP == E1.tr (CONST 1) L]
                  [CJUMP == E2.tr (CONST 1) L]
                  [MOVE t (CONST 0)]
                  [LABEL L] ]
           t )

E -> E1 && E2
  L = new NAME(); t = new TEMP();
  E.tr := (ESEQ [ [MOVE t (CONST 0)]
                  [CJUMP == E1.tr (CONST 0) L]
                  [CJUMP == E2.tr (CONST 0) L]
                  [MOVE t (CONST 1)]
                  [LABEL L] ]
           t )

E -> ! E1
  t = new TEMP();
  E.tr := (ESEQ [MOVE t (BINOP - (CONST 1) E1.tr)] t )

Control-Flow Mapping, Long If Version
Booleans are used in programs to direct flow of control, e.g.
  if ( a < 5 || b > 2 ) S1; else S2;
Frequently, the Boolean result is not needed afterwards, so it is possible to generate positional code. Instead of:
      // assume ( a < 5 || b > 2 ) stored in t3
      // code to compute t3, includes boolean OR
      if ( t3 == 0 ) goto L2
  L1: code for S1
      goto L3
  L2: code for S2
  L3: ...                   // successor of if-statement
t3 is not needed afterwards; if it is a machine flag, it is overridden by S1, S2.

Control-Flow Mapping, Shorter If Version
      if ( a < 5 ) goto L4
      if ( b > 2 ) goto L4
      goto L5
  L4: code for S1
      goto L6
  L5: code for S2
  L6: ...                   // code after if-statement
1. No need to create temps to compute the boolean value.
2. How does code-gen know where to branch to? Use backpatching! This can be done by buffering code, or after CodeGen.
Ramifications would be a good Midterm question.

Control-Flow Mapping, Nested If Statements
Source Code Skeleton
  if ( a < 5 )
    if ( b > 2 )
      S1;
    else
      S2;
    //end if
  else
    if ( c < 6 )
      S3;
    else
      S4;
    //end if
  //end if

Object Code Skeleton
       if ( a >= 5 ) goto L8     // back-patch
       if ( b <= 2 ) goto L7     // back-patch
       code for S1
       goto L10                  // back-patch
  L7:                            // L7 resolved
       code for S2
       goto L10                  // back-patch
  L8:                            // L8 resolved
       if ( c >= 6 ) goto L9     // back-patch
       code for S3
       goto L10                  // back-patch
  L9:                            // L9 resolved
       code for S4
  L10:                           // L10 resolved
Data structures needed to back-patch?

Control-Flow Mapping, Elsif Clauses (Ada)
Source Code Skeleton
  if a < 5 then
    S1;
  elsif b > 2 then
    S2;
  elsif c < 6 then
    S3;
  else
    S4;
  end if;

Object Code Skeleton
       if ( a >= 5 ) goto L11
       code for S1
       goto L14
  L11: if ( b <= 2 ) goto L12
       code for S2
       goto L14
  L12: if ( c >= 6 ) goto L13
       code for S3
       goto L14
  L13: code for S4
  L14:
How can a linked list of to-be-back-patched addresses be created?
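One possible answer to the question above, sketched under assumptions: the code is buffered in a list, jump targets are instruction indices, and a plain Python list stands in for the linked list of jumps that wait for the address of `end if`. The names `gen_elsif`, `exit_jumps`, and the placeholder `<?>` are hypothetical.

```python
def gen_elsif(clauses, else_body):
    """clauses: list of (negated condition text, list of body instructions)."""
    code = []
    exit_jumps = []                        # jumps to the not-yet-known end address
    for neg_cond, body in clauses:
        test = len(code)
        code.append(f"if ({neg_cond}) goto <?>")   # target: the next clause
        code.extend(body)
        exit_jumps.append(len(code))
        code.append("goto <?>")                    # target: end of the statement
        # The next clause starts here, so the test jump can be patched now.
        code[test] = code[test].replace("<?>", str(len(code)))
    code.extend(else_body)
    end = len(code)
    for j in exit_jumps:                   # back-patch every pending exit jump
        code[j] = code[j].replace("<?>", str(end))
    return code

program = gen_elsif(
    [("a >= 5", ["S1"]), ("b <= 2", ["S2"]), ("c >= 6", ["S3"])],
    ["S4"],
)
for addr, instr in enumerate(program):
    print(addr, instr)
```

Each clause contributes one entry to the pending list, and all entries are resolved together once the end of the statement is reached.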

Control-Flow Mapping, While
Source Code Skeleton
  while ( i < 10 ) {
    S;
    i++;
  } //end while
  // assume i is NOT needed after the loop
  // i is a pure IV (induction variable)

Object Code Skeleton
       // R1 holds induction variable
  L15: if ( R1 >= 10 ) goto L16
       code for S
       R1++
       goto L15
  L16:
Are there size (of code) limitations to back-patching?

Control-Flow Mapping, Repeat (Pascal)
Source Code Skeleton
  //Pascal source
  repeat
    S;
    i++;
  until i >= 10;
  // again no use of i after the loop

Object Code Skeleton
       // R1 holds induction variable
  L15: code for S
       R1++
       if ( R1 < 10 ) goto L15
Fall-through in Repeat vs. initial test in While: the body of a repeat loop executes at least once, and the test sits at the end.

Control-Flow Mapping, For
Source Code Skeleton
  for( int i=0; i<10; i++ ) {
    S;
  } //end for
  // i is undefined/not used after the loop
  // can be IV in reg

Object Code Skeleton
       mov R1, #0
  L17: if ( R1 >= 10 ) goto L18
       code for S
       R1++
       goto L17
  L18:
What happens if i (the induction variable, AKA IV) is defined outside and used as loop parameter?
What is its value after for loop completion? Can it be referenced, i.e. can its value be printed?
What happens if the IV is assigned inside the loop?
What should happen if the IV value is > end-value at start?

Back Patching Example
  if ( a < 5 || b > 2 ) S1; else S2;

Handling a<5:
  if (a < 5) goto <Lx>;     // <Lx> needs to be patched; addr. insertion
Handling b>2:
  if (b > 2) goto <Ly>;     // <Ly> needs to be patched
Handling ||:
  if (a < 5) goto <Lx>;     // .. else fall through
  if (b > 2) goto <Lx>;     // <Ly> is patched to <Lx>
  goto <Lz>                 // <Lz> needs to be patched
Handling if .. S1 else S2:
      if (a < 5) goto L4;   // <Lx> is patched to L4
      if (b > 2) goto L4;
      goto L5;              // <Lz> is patched to L5
  L4: [code for S1]         // then clause
      goto L6;
  L5: [code for S2]         // else clause
  L6:                       // end of If Statement
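The patching steps above can be sketched with the textbook truelist/falselist scheme. This is a swapped-in variant, not the slide's exact method: every relational test emits an explicit false jump, so the output contains one more `goto` than the hand-written example. Instruction indices serve as addresses, and the names `emit`, `backpatch`, `gen_cond`, `gen_if`, and the placeholder `<?>` are assumptions.

```python
def emit(code, instr):
    """Append one instruction and return its index (its address)."""
    code.append(instr)
    return len(code) - 1

def backpatch(code, holes, target):
    """Fill the open jump target of every instruction listed in holes."""
    for i in holes:
        code[i] = code[i].replace("<?>", str(target))

def gen_cond(e, code):
    """Emit jumping code for e; return (truelist, falselist) of open jumps."""
    op = e[0]
    if op == "||":
        t1, f1 = gen_cond(e[1], code)
        backpatch(code, f1, len(code))     # E1 false: continue with E2
        t2, f2 = gen_cond(e[2], code)
        return t1 + t2, f2
    if op == "&&":
        t1, f1 = gen_cond(e[1], code)
        backpatch(code, t1, len(code))     # E1 true: continue with E2
        t2, f2 = gen_cond(e[2], code)
        return t2, f1 + f2
    if op == "!":
        t1, f1 = gen_cond(e[1], code)
        return f1, t1                      # swap the two exits
    t = emit(code, f"if {e[1]} {op} {e[2]} goto <?>")   # relop
    f = emit(code, "goto <?>")
    return [t], [f]

def gen_if(cond, then_code, else_code):
    code = []
    tl, fl = gen_cond(cond, code)
    backpatch(code, tl, len(code))         # true exits enter the then clause
    code.extend(then_code)
    skip = emit(code, "goto <?>")          # jump over the else clause
    backpatch(code, fl, len(code))         # false exits enter the else clause
    code.extend(else_code)
    backpatch(code, [skip], len(code))
    return code

program = gen_if(("||", ("<", "a", "5"), (">", "b", "2")), ["S1"], ["S2"])
for addr, instr in enumerate(program):
    print(addr, instr)
```

The redundant `goto 2` at address 1 is the price of the uniform scheme; a later peephole pass could remove jumps to the immediately following instruction.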

Back Patching: Jump Labels
Three-Address Code: add two attributes
  E.true:  position to jump to when E evaluates to true;
  E.false: position to jump to when E evaluates to false.

E -> E1 relop E2
  E.s := [ E1.s; E2.s; if ( E1.t relop E2.t ) goto E.true; goto E.false; ]

E -> E1 || E2
  E1.true := E.true; E1.false := new Label();
  E2.true := E.true; E2.false := E.false;
  E.s := [ E1.s; E1.false: E2.s; ]

Back Patching: Three-Address Code Cont'd
E -> E1 && E2
  E1.true := new Label(); E1.false := E.false;
  E2.true := E.true; E2.false := E.false;
  E.s := [ E1.s; E1.true: E2.s; ]

E -> ! E1
  E1.true := E.false; E1.false := E.true;
  E.s := E1.s;
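The jump-label rules for relop, ||, && and ! can be sketched as a recursive function that receives E.true and E.false from its caller. The sketch assumes the textbook form of the relational rule, in which the conditional jump to E.true is followed by an explicit `goto E.false`; the tuple AST encoding and the names `gen_jump` and `make_labels` are hypothetical.

```python
import itertools

def make_labels():
    """Yield fresh label names L1, L2, ... (stand-in for new Label())."""
    return (f"L{n}" for n in itertools.count(1))

def gen_jump(e, true, false, code, labels):
    """Append jumping code for e: control reaches `true` or `false`."""
    op = e[0]
    if op == "||":
        e1_false = next(labels)                     # E1.false := new Label()
        gen_jump(e[1], true, e1_false, code, labels)
        code.append(f"{e1_false}:")
        gen_jump(e[2], true, false, code, labels)
    elif op == "&&":
        e1_true = next(labels)                      # E1.true := new Label()
        gen_jump(e[1], e1_true, false, code, labels)
        code.append(f"{e1_true}:")
        gen_jump(e[2], true, false, code, labels)
    elif op == "!":
        gen_jump(e[1], false, true, code, labels)   # swap the targets
    else:                                           # relop on two simple operands
        code.append(f"if ({e[1]} {op} {e[2]}) goto {true}")
        code.append(f"goto {false}")

code = []
cond = ("&&", ("!", ("<", "a", "5")), (">", "b", "2"))
gen_jump(cond, "Ltrue", "Lfalse", code, make_labels())
print("\n".join(code))
```

Negation emits no instruction of its own; it only exchanges the two targets handed to its operand. The second operand of || and && is reached only when needed, so this scheme gives short-circuit evaluation.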

Back Patching: Jump Labels Cont'd
IR Tree Code:
E -> E1 relop E2
  E.tr := ( ESEQ [CJUMP relop E1.tr E2.tr E.true ] null )

E -> E1 || E2
  E1.true := E.true; E1.false := new NAME();
  E2.true := E.true; E2.false := E.false;
  E.tr := (ESEQ [ stmt( E1.tr ); LABEL( E1.false ); stmt( E2.tr ); ] null )

Back Patching: IR Tree Cont'd
E -> E1 && E2
  E1.true := new NAME(); E1.false := E.false;
  E2.true := E.true; E2.false := E.false;
  E.tr := (ESEQ [ stmt( E1.tr ); LABEL( E1.true ); stmt( E2.tr ); ] null)

E -> ! E1
  E1.true := E.false; E1.false := E.true;
  E.tr := E1.tr;

Converting Back to Value
Actual Boolean values are needed in programs, e.g.
  boolean x = a < 5 || b > 2;
We still need to generate a value for the Boolean expression! This can be implemented by patching the two labels E.true and E.false of the Boolean expression E with two assignment statements that assign 1 and 0, respectively.

Boolean expression E:
  t = new Temp();
  E.true := new Label(); E.false := new Label(); L := new Label();
  E.s := [ E.true: t := 1; goto L;
           E.false: t := 0;
           L: ]
  E.t := t;
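A minimal sketch of the conversion: the jumping code of E is followed by the two assignments placed at E.true and E.false. The jumping code here is written by hand for `a < 5 || b > 2`, and the function name `to_value` and its parameters are assumptions; prepending the jumping code to the two assignments is also an assumption about how the pieces are joined.

```python
def to_value(jump_code, e_true, e_false, temp, join):
    """Wrap jumping code so that temp ends up holding 1 or 0."""
    return jump_code + [
        f"{e_true}:", f"{temp} := 1", f"goto {join}",   # E.true: t := 1; goto L
        f"{e_false}:", f"{temp} := 0",                  # E.false: t := 0
        f"{join}:",                                     # L:
    ]

# Hand-written jumping code for a < 5 || b > 2 with E.true = L1, E.false = L2
jump_code = ["if (a < 5) goto L1", "if (b > 2) goto L1", "goto L2"]
code = to_value(jump_code, "L1", "L2", "t1", "L3")
print("\n".join(code))
```

After label L3, the temp t1 can be stored into x like any other expression value.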

New Arrays
E => newarray E1
Storage allocation for E: follow Java's array storage convention. The length of the array is stored as the 0th element, so storage for a 10-element array actually requires 11 cells.
Cell initialization: all elements are automatically initialized to 0; you emit the code.

Pseudo IR Code:
  L: new Label; t1,t2,t3: new Temps;          // wdsize == 4
  E.s := [ E1.s;
           t1 := ( E1.t + 1 ) * wdsize;       // size in bytes of all cells, incl. length cell
           t2 := malloc( t1 );                // t2 points to cell 0
           t2[0] := E1.t;                     // store array length
           t3 := t2 + ( E1.t * wdsize );      // t3 points to last cell
        L: t3[0] := 0;                        // init a cell to 0
           t3 := t3 - wdsize;                 // move down a cell
           if ( t3 > t2 ) goto L; ]           // loop back
  E.t := t2;
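The allocation and initialization loop can be traced with a small simulation. Everything here is an assumption made for illustration: memory is a dictionary from byte address to word value, `malloc` is a simple bump allocator, and the heap starts at the arbitrary address 1000.

```python
WDSIZE = 4          # word size in bytes, as on the slide
HEAP_START = 1000   # hypothetical base address of the simulated heap

class Heap:
    def __init__(self):
        self.memory = {}             # byte address -> word value
        self.next_free = HEAP_START

    def malloc(self, nbytes):
        """Bump allocator: hand out the next nbytes and return their base."""
        base = self.next_free
        self.next_free += nbytes
        return base

def new_array(heap, n):
    """Mirror the pseudo IR code for newarray with n elements."""
    t1 = (n + 1) * WDSIZE            # bytes for n elements plus the length cell
    t2 = heap.malloc(t1)             # t2 points to cell 0
    heap.memory[t2] = n              # store array length
    t3 = t2 + n * WDSIZE             # t3 points to the last cell
    while True:                      # L:
        heap.memory[t3] = 0          # init a cell to 0
        t3 -= WDSIZE                 # move down a cell
        if not t3 > t2:              # loop back while above cell 0
            break
    return t2

heap = Heap()
base = new_array(heap, 3)
print(base, sorted(heap.memory.items()))
```

Because the loop tests at the bottom, a zero-length array writes 0 once into cell 0, which already holds the length 0, so the result is still correct.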

Array Element Reference
E => E1 [ E2 ]
Calculate the address for E: addr( a[i] ) = base_a + (i+1) * wdsize
Bounds check: i >= 0 and i < num-elements. Note that i is a general expression!

  L1,L2: new Label; t1,t2,t3,t4: new Temps;
  E.s := [ E1.s; E2.s;
           t1 := E1.t[ 0 ];               // t1 holds num elements
           if ( E2.t < 0 ) goto L1;       // too low?
           if ( E2.t >= t1 ) goto L1;     // too high?
           t2 := E2.t + 1;                // must be OK
           t3 := t2 * wdsize;             // compute offset
           t4 := E1.t[ t3 ];              // address = start + offset
           goto L2;                       // bypass exception handler
       L1: param E1.t; param E2.t;
           call arrayerror, 2;
       L2: ]                              // t4 holds final address
  E.t := t4;
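The address formula and the bounds check can be tried out on a tiny simulated memory. The dictionary-based memory, the base address 1000, and raising `IndexError` in place of the call to `arrayerror` are assumptions for this sketch.

```python
WDSIZE = 4   # word size in bytes

def element_address(memory, base, i):
    """Return the address of a[i]; the length is stored in cell 0 at base."""
    n = memory[base]                      # t1 := E1.t[0]
    if i < 0 or i >= n:                   # too low or too high?
        raise IndexError(f"index {i} out of bounds for length {n}")
    return base + (i + 1) * WDSIZE        # skip the length cell

# A 3-element array at hypothetical address 1000: length, then 10, 20, 30
memory = {1000: 3, 1004: 10, 1008: 20, 1012: 30}
addr = element_address(memory, 1000, 2)
print(addr, memory[addr])
```

Returning the address keeps the L-value and r-value cases separate: a read loads from the address, an assignment stores to it.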

Statements
Assignment Statement
  S => E1 := E2 ;
  => S.s := [ E1.s; E2.s; E1.t := E2.t; ]

If Statement with Else Clause
  S => if ( E ) then S1 else S2 ;
  => L1, L2, L3: new Labels;
     E.true := L1; E.false := L2;
     S.s := [ E.s; L1: S1.s; goto L3; L2: S2.s; L3: ; ]

Statements, Cont'd
While Statement
  S => while ( E ) S1 ;
  => L1, L2, L3: new labels;         // no explicit jump to L2
     E.true := L2; E.false := L3;
     S.s := [ L1: E.s; L2: S1.s; goto L1; L3: ]

Print Statement with 1 argument
  S => print E ;
  => S.s := [ E.s; param E.t; call print, 1; ]
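The statement rules for assignment, if, while and print can be sketched in one function. To keep the example short, conditions are restricted to a single relational test on simple operands, and the test emits an explicit jump to E.false; the tuple encoding of statements and the names `gen_stmt`, `cond_jump`, and `make_labels` are assumptions.

```python
import itertools

def make_labels():
    """Yield fresh label names L1, L2, ... (stand-in for new Label())."""
    return (f"L{n}" for n in itertools.count(1))

def cond_jump(cond, true, false):
    """Jumping code for a single relational test (op, a, b)."""
    op, a, b = cond
    return [f"if ({a} {op} {b}) goto {true}", f"goto {false}"]

def gen_stmt(s, labels):
    """Return the list of three-address statements S.s for statement s."""
    kind = s[0]
    if kind == "assign":                       # S => E1 := E2 ;
        return [f"{s[1]} := {s[2]}"]
    if kind == "print":                        # S => print E ;
        return [f"param {s[1]}", "call print, 1"]
    if kind == "if":                           # S => if ( E ) then S1 else S2 ;
        _, cond, s1, s2 = s
        l1, l2, l3 = next(labels), next(labels), next(labels)
        return (cond_jump(cond, l1, l2)
                + [f"{l1}:"] + gen_stmt(s1, labels) + [f"goto {l3}"]
                + [f"{l2}:"] + gen_stmt(s2, labels)
                + [f"{l3}:"])
    if kind == "while":                        # S => while ( E ) S1 ;
        _, cond, body = s
        l1, l2, l3 = next(labels), next(labels), next(labels)
        return ([f"{l1}:"] + cond_jump(cond, l2, l3)
                + [f"{l2}:"] + gen_stmt(body, labels)
                + [f"goto {l1}", f"{l3}:"])
    raise ValueError(f"unknown statement kind: {kind}")

loop = gen_stmt(("while", ("<", "i", "10"), ("print", "i")), make_labels())
print("\n".join(loop))
```

Nested statements draw their labels from the same generator, so label names stay unique across the whole program.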