Intermediate Code Generation

Similar documents
Object Code (Machine Code) Dr. D. M. Akbar Hussain Department of Software Engineering & Media Technology. Three Address Code

Compiler construction

Intermediate Code Generation

SOURCE LANGUAGE DESCRIPTION

Intermediate Code Generation

Intermediate Code Generation

Concepts Introduced in Chapter 6

Concepts Introduced in Chapter 6

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

1 Lexical Considerations

Dr. D.M. Akbar Hussain

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

CMPT 379 Compilers. Anoop Sarkar. 11/13/07 1. TAC: Intermediate Representation. Language + Machine Independent TAC

Module 27 Switch-case statements and Run-time storage management

Intermediate Representations

Chapter 6 Intermediate Code Generation

Lexical Considerations

The PCAT Programming Language Reference Manual

CSc 453 Interpreters & Interpretation

Functional Programming. Pure Functional Programming

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Intermediate Representations

Semantic analysis and intermediate representations. Which methods / formalisms are used in the various phases during the analysis?

Lexical Considerations

UNIT IV INTERMEDIATE CODE GENERATION

NARESHKUMAR.R, AP\CSE, MAHALAKSHMI ENGINEERING COLLEGE, TRICHY Page 1

Formal Languages and Compilers Lecture X Intermediate Code Generation

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

A Simple Syntax-Directed Translator

CSE 3302 Notes 5: Memory Management

CSc 453. Compilers and Systems Software. 13 : Intermediate Code I. Department of Computer Science University of Arizona

Intermediate Representa.on

Intermediate Code Generation

CS2210: Compiler Construction. Code Generation

UNIT 3

Write a P-Code Assembler. Program Files: File Description pasm.cpp. Project:

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University

Semantic actions for declarations and expressions

CSc 453 Intermediate Code Generation

intermediate-code Generation

Compiler Principle and Technology. Prof. Dongming LU April 15th, 2019

Virtual Machine Tutorial

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

Exercise 1 Codify in Yacc the generator of the abstract trees of the language defined by the following grammar:

Review of the C Programming Language

CODE GENERATION Monday, May 31, 2010

Intermediate Code Generation

Computer Architecture

NOTE: Answer ANY FOUR of the following 6 sections:

Question Bank. 10CS63:Compiler Design

Compilers. Intermediate representations and code generation. Yannis Smaragdakis, U. Athens (original slides by Sam

CSCI565 Compiler Design

Semantic actions for declarations and expressions

CS558 Programming Languages

The SPL Programming Language Reference Manual

A Fast Review of C Essentials Part I

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

Principles of Compiler Design

Today's Topics. CISC 458 Winter J.R. Cordy

CSCE 531, Spring 2015 Final Exam Answer Key

Semantic actions for declarations and expressions. Monday, September 28, 15

Static Checking and Type Systems

Chapter 4 :: Semantic Analysis

CMPSC 160 Translation of Programming Languages. Three-Address Code

Homework #3: CMPT-379 Distributed on Oct 23; due on Nov 6 Anoop Sarkar

Semantic Processing. Semantic Errors. Semantics - Part 1. Semantics - Part 1

(just another operator) Ambiguous scalars or aggregates go into memory

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Principle of Compilers Lecture VIII: Intermediate Code Generation. Alessandro Artale

CS 320: Concepts of Programming Languages

COMP-520 GoLite Tutorial

Principles of Programming Languages

Semantic Analysis and Intermediate Code Generation

Language Reference Manual simplicity

Parsing Scheme (+ (* 2 3) 1) * 1

Run Time Environments

Acknowledgement. CS Compiler Design. Intermediate representations. Intermediate representations. Semantic Analysis - IR Generation

University of Arizona, Department of Computer Science. CSc 453 Assignment 5 Due 23:59, Dec points. Christian Collberg November 19, 2002

G Programming Languages - Fall 2012

Compiler Construction I

BLM2031 Structured Programming. Zeyneb KURT

Decaf Language Reference Manual

LECTURE 18. Control Flow

UNIT-3. (if we were doing an infix to postfix translator) Figure: conceptual view of syntax directed translation.

CS321 Languages and Compiler Design I Winter 2012 Lecture 13

Talen en Compilers. Johan Jeuring , period 2. December 15, Department of Information and Computing Sciences Utrecht University

Syntax-Directed Translation

Review of the C Programming Language for Principles of Operating Systems

Code Generation: Integrated Instruction Selection and Register Allocation Algorithms

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms

P.G.TRB - COMPUTER SCIENCE. c) data processing language d) none of the above

More On Syntax Directed Translation

Semantic Analysis computes additional information related to the meaning of the program once the syntactic structure is known.

PSD3A Principles of Compiler Design Unit : I-V. PSD3A- Principles of Compiler Design

CS 2210 Programming Project (Part IV)

CSE P 501 Compilers. Static Semantics Hal Perkins Winter /22/ Hal Perkins & UW CSE I-1

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

Compilerconstructie. najaar Rudy van Vliet kamer 124 Snellius, tel rvvliet(at)liacs.

Transcription:

Intermediate Code Generation Front-end intermediate code generation source program Lexical Analyzer Syntax Analyzer Semantic Analyzer Int. Code Generator intermediate representation Symbol Table Advantages of intermediate representation (independent of target): 1. Porting change of back-end only 2. First optimization independent of target 1

Intermediate Representation Intermediate representation º data structure representing source program during translation (exmp: abstract tree + symbol table) Abstract tree adequate representation of source program inadequate for target program (exmp :loops goto) Intermediate code intermediate representation closest to target code Various forms of intermediate code: in general linearization of abstract tree (» oper. semantics) Various abstraction levels high close to abstract tree low close to target machine Popular representations Three-address code P-code ¹ forms 2

Three-Address Code Typical statement: designed to represent evaluation of simple arithmetic expressions x = y op z x, y, z = memory addresses (certainly x, while y and z may be constants) op = either arithmetic operator (typically) or other Example: 2 * a + (b 3) + t3 t1 * t2 2 a b 3 t1 = 2 * a t2 = b 3 t3 = t1 + t2 Notes: Compiler generation of names for temporaries (t1, t2, t3) isomorphic to internal nodes Three-address code = linearization (left to right) of abstract tree Unless $ constraints on evaluation order, possible a different order: (¹ meaning of names) t1 = b 3 t2 = 2 * a t3 = t2 + t1 In general: need for other forms of statements ($ standard), exmp: t2 = t1 3

Three-Address Code (ii) Extended example: computation of factorial x! Notes: read x; /* input of integer */ if x > 0 then fact := 1; repeat fact := fact * x; x := x 1 until x = 0; write fact endif read x t1 = x>0 if_false t1 goto L1 fact = 1 label L2 t2 = fact * x fact = t2 t3 = x 1 x = t3 t4 = x==0 if_false t4 goto L2 write fact label L1 halt read / write: directly translated into statements with one address if_false (two addresses): used to translate if / repeat label (one address): may be necessary in some implementations of three-address code halt (zero addresses!): program termination Assignments within source mapped to copy statements fact := fact * x t2 = fact * x fact = t2 (even if sufficient unique statement optimization) 4

Three-Address Code (iii) Representation: typically " statement record of fields complete code array linked list of records Record (typically): op addr1 addr2 addr3 Possible null value for some addresses read x t1 = x>0 if_false t1 goto L1 fact = 1 label L2 t2 = fact * x fact = t2 t3 = x 1 x = t3 t4 = x==0 if_false t4 goto L2 write fact label L1 halt (rd, x,, ) (gt, x, 0, t1) (if_f, t1, L1, ) (asn, 1, fact, ) (lab, L2,, ) (mul, fact, x, t2) (asn, t2, fact, ) (sub, x, 1, t3) (asn, t3, x, ) (eq, x, 0, t4) (if_f, t4, L2, ) (wri, fact,, ) (lab, L1,, ) (halt,,, ) 5

P-code Historically: designed» 1980 to be the language of a P-machine of which an interpreter was implemented on different real platforms Requirement: portability of Pascal compilers P-code designed to be directly executed We: simplified version of P-machine code memory data memory (for variables with names) stack (for temporaries) registers to manage the stack (stack pointer) support execution (program counter) Example 1: 2 * a + (b 3) ldc 2 ; load constant 2 lod a ; load value of var a mpi ; integer multiplication lod b ; load value of var b ldc 3 ; load constant 3 sbi ; integer substraction adi ; integer addition 6

P-code (ii) Example 2: x := y + 1 lda x ; load address of x lod y ; load value of y ldc 1 ; load constant 1 adi ; add sto ; store top to address below top and pop both Note: different semantics for lda load address lvalue lod load value rvalue 7

P-code (iii) Example 3: read x; /* input of integer */ if x > 0 then fact := 1; repeat fact := fact * x; x := x 1 until x = 0; write fact endif lda x ; load address of x rdi ; read integer, store to address on top and pop lod x ; load value of x ldc 0 ; load constant 0 grt ; compare top two values, pop them, push Boolean result fjp L1 ; pop Boolean value, jump to L1 if false lda fact ; load address of fact ldc 1 ; load constant 1 sto ; pop two values, storing first to address given by second lab L2 ; definition of label L2 lda fact ; load address of fact lod fact ; load value of fact lod x ; load value of x mpi ; multiply sto ; store top to address of second and pop both lda x ; load address of x lod x ; load value of x ldc 1 ; load constant 1 sbi ; substract sto ; store (as before) lod x ; load value of x ldc 0 ; load constant 0 equ ; test for equality fjp L2 ; pop Boolean value, jump to L2 if false lod fact ; load value of fact wri ; write top and, then, pop lab L1 ; definition of label L1 stp ; stop execution 8

Comparison between P-code and Three-Address Code Pros: P-code closer to target code than Three-address code P-code statements require less addresses (our examples: one address at most) (operands implicitly on the stack) Cons: P-code less compact than Three-address code (number of statements) P-code not self-contained: statements implicitly operate on a stack (implicit locations of stack = surrogate of addresses) Advantage in using a stack: contains all values necessary in any point of the code unnecessary for compiler assigning names to values Also: automatic removal of temporaries 9

Intermediate Code as Synthesized Attribute Example: Subset of C expressions (assignment as expression) expr id = expr term term term + factor factor factor ( expr ) num id Assumption: id and num : with lexical attribute lexval = string of characters P-code (A = { pcode }): AG simpler (unnecessary generating names for temporaries) A = { pcode } Necessary a nondestructive store to support assignments as expressions stn Semantics of stn: Stores value at address below it (as sto) Leaves value on top of stack Discards address Production expr 1 id = expr 2 expr term term 1 term 2 + factor term factor factor ( expr ) factor num factor id Semantic rules expr 1.pcode = lda id.lexval ++ expr 2.pcode ++ stn expr. pcode = term. pcode term 1.pcode = term 2.pcode ++ factor.pcode ++ adi term.pcode = factor.pcode factor.pcode = expr.pcode factor.pcode = ldc num. lexval factor.pcode = lod id. lexval = concatenation with space ++ = concatenation with newline 10

Intermediate Code as Synthesized Attribute (ii) Example (P-code): (x = x + 3) + 4 expr term 2 term + factor 3 factor ( 5 expr ) id x 4 = expr term num 4 8 term + factor 9 10 factor id x 0 1 7 6 num 3 Nodes P-code 10, 8 lod x 9 ldc 3 lod x 7, 6 ldc 3 adi lda x lod x 5, 4, 2 ldc 3 adi stn 3 ldc 4 lda x lod x 1, 0 ldc 3 adi stn ldc 4 adi 11

Intermediate Code as Synthesized Attribute (iii) Example (Three-address code): necessary assigning each expression with a name A = { name, qcode } Production expr 1 id = expr 2 expr term term 1 term 2 + factor term factor factor ( expr ) factor num factor id Semantic rules expr 1.name = expr 2.name expr 1.qcode = expr 2. qcode ++ id.lexval "=" expr 2.name expr. name = term. name expr. qcode = term. qcode term 1.name = newtemp() term 1.qcode = term 2.qcode ++ factor.qcode ++ term 1.name "=" term 2.name "+" factor.name term.name = factor.name term.qcode = factor.qcode factor.name = expr.name factor.qcode = expr.qcode factor.name = num. lexval factor.qcode = " " factor.name = id. lexval factor.qcode = " " 12

Intermediate Code as Synthesized Attribute (iv) (x = x + 3) + 4 0 1 expr term 2 term + factor 3 factor ( 5 expr ) id x 4 = expr term num 4 8 term + factor 9 6 7 Nodes qcode name 10,8 x 9 3 3 4 7, 6 t1 = x + 3 t1 5, 4, 2 1, 0 t1 = x + 3 x = t1 t1 = x + 3 x = t1 t2 = t1 + 4 t1 t2 10 factor id x num 3 13

Intermediate Code as Synthesized Attribute (v) Notes (on code generation as computation of a synthesized attribute) Clearly shows relations between code fragments syntax sub-trees Impractical, because: 1. String concatenation many copy operations waste of memory 2. Better generating small fragments of code and writing on file (or on data structures) semantic actions which do not adhere to attribute synthesis in post-order 3. In general: code generation heavily depends on inherited attributes complication in AG Need for more direct (practical) code-generation techniques not based on AG 14

Practical Code Generation Hp: Syntax tree with nodes having at most two children (easily generalizable) procedure gencode(n: Node); begin if n ¹ nil then generate code to prepare for code of left child of n; gencode(n.p1); generate code to prepare for code of right child of n; gencode(n.p2); generate code to implement the action of n endif end. p1 p2 15

Practical Code Generation (ii) (for abstract tree) typedef enum {PLUS, ASSIGN, NUM, ID} Symbol; expr id = expr term term term + factor factor factor ( expr ) num id typedef struct t_node { Symbol symbol; struct t_node *p1, *p2; char *lexeme; /* for id, num */ } Node; symbol p1 p2 lexeme typedef Node *Pnode; (x = x + 3) + 4 + x= 4 + x 3 Note: Assignment identifier stored in the relevant node If LHS complex need for another (first) child 16

Practical Code Generation (iii) void gencode(pnode p) { char code[maxcode]; /* max length of one line of P code */ symbol p1 p2 lexeme } switch(p >symbol) { case PLUS: gencode(p >p1); gencode(p >p2); emit("adi"); break; case ASSIGN: sprintf(code, "lda %s", p >lexeme); emit(code); gencode(p >p1); emit("stn"); break; case NUM: sprintf(code, "ldc %s", p >lexeme); emit(code); break; case ID: sprintf(code, "lod %s", p >lexeme); emit(code); break; } + x= 4 + x 3 17

Code Generation for References to Data Structures Previously: code generation for expr/assignments, where values = constants (LDC) simple var (LOD, LDA) Generation of target code var names replaced by addresses registers absolute addresses relative addresses within AR In general: need to compute addresses by means of intermediate code array indexing record fields pointers need to extend the intermediate notation to express the computation of addresses 18

Extending P-code for Address Computation Introduction of new statements to express ¹ address modes ind <offset> (indirect load) top(stack) = <address> <offset> = integer addr = <address> + <offset> pop() push(*addr) top a ind i top *(a + i) ixa <scale> (indexed address) top(stack) = <offset> subtop(stack) = <base-addr> <scale> = scale factor addr = <base-addr> + (<scale> * <offset>) pop(), pop() push(addr) top a i ixa s top a + (s * i) 19

Code Generation for Array Manipulation Array elements stored sequentially Element address: computed by means of base address of array offset = linear function of index value int a[size], i, j; a[i+1] = a[j*2] + 3; address integer anyway need to first compute address Address computation: 1. Index "normalization" (when index does not start with 0) normalized index 2. Multiplication of normalized index by scale factor = sizeof(elem) scaled index 3. Address = base + scaled index (at point 2.) 20

Code Generation for Array Manipulation (ii) scaled index Example (C): a[i+1] a + ((i+1) * sizeof(int)) base index scale factor General formula: a[t] base(a) + ((t lower_bound(a)) * elem_size(a)) Assumption for the independence from the target machine: 1. Address of array variable º base address lda a push(base address of a) 2. elem_size(a) º size of array element (statically known replaced with a constant by compiler back-end) 21

Procedure of Code Gen. for References and Arrays Notes: expr ixpr = expr term term term + factor factor factor ( expr ) num ixpr ixpr id id [ expr ] LHS of assignment = either identifier or indexing expression Node structured as on pag. 16, but with new operation IDX: typedef enum {PLUS, ASSIGN, IDX, NUM, ID} Symbol; typedef struct t_node { Symbol symbol; struct t_node *p1, *p2; char *lexeme; /* for id, num */ } Node; symbol p1 p2 lexeme typedef Node *Pnode; In general: no longer possible storing assignment var name in relevant node assignment node: with two children (LHS, RHS), like PLUS Indexing: only on identifiers possible storing them in IDX nodes (otherwise: two children for IDX): + (a[i+1] = 2) + a[j] = a[] 2 + i 1 a[] j 22

Procedure of Code Gen. for References and Arrays (ii) void gencode(pnode p, int isaddr) { char code[maxcode]; /* max length of a line of P code */ } switch(p >symbol) { case PLUS: gencode(p >p1, FALSE); gencode(p >p2, FALSE); emit("adi"); break; case ASSIGN: gencode(p >p1, TRUE); gencode(p >p2, FALSE); emit("stn"); break; symbol p1 p2 lexeme case IDX: sprintf(code, "lda %s", p >lexeme); emit(code); gencode(p >p1, FALSE); sprintf(code, "ixa elem_size(%s)", p >lexeme); emit(code); if(!isaddr) emit("ind 0"); break; + case NUM: sprintf(code, "ldc %s", p >lexeme); emit(code); break; = case ID: if(isaddr) sprintf(code, "lda %s", p >lexeme); a[] 2 else sprintf(code, "lod %s", p >lexeme); emit(code); + break; i 1 } a[] j 23

Procedure of Code Gen. for References and Arrays (iii) isaddr = inherited attribute distinguishing IDX and ID between positions LHS RHS isaddr = true returned address false returned value of expressions Application of gencode: (a[i+1] = 2) + a[j] + = a[] a[] 2 j + i 1 lda a lod i ldc 1 adi ixa elem_size(a) ldc 2 stn lda a lod j ixa elem_size(a) ind 0 adi 24

Multidimensional Arrays 0 0 9 int a[5][10]; a[i] 4 Indexing partial array with restricted dimensions: a[i]= one-dimensional array total a[i][j] = integer Computation of address relevant to (partial/total) indexing expression by iterated application of techniques introduced for one-dimensional arrays a a[i][j] = 3; = [] 3 [] j i 10*size(int) lda a lod i ixa elem_size(a) lod j ixa size(int) ldc 3 sto In general: no longer possible storing array name in IDX node IDX node with two children 25

Code Generation for Records and Pointers Computation of address of record field» computation of address for indexing expression in array: 1. Computation of base address of record 2. Computation of offset relevant to field (typically: fixed size static information) 3. Address = base + offset typedef struct rec { int i; char c; int j; } Rec;... Rec r; addresses other memory r.j r.c r.i other memory stack offset(r.j) offset(r.c) base(r) 26

Code Generation for Records and Pointers (ii) Notes: Fields allocated linearly (typically: by increasing addresses) Offset: constants Offset of first field = 0 Offset(field) = f (size of data types in target machine): (no scale factor exists as for array because of inhomogeneity) Code generation independent of target machine: field_offset(r, f) r = record variable f = record field typically: method of symbol table 27

P-code for Records and Pointers typedef struct rec { int i; char c; int j; } Rec; Rec r; r.j r.j = r.i lda r ldc field_offset(r, j) ixa 1 lda r ldc field_offset(r, j) ixa 1 lda r ind field_offset(r, i) sto To get value of r.i without precomputation of relevant address! (directly) int *p, i; *p = i; i = *p; typedef struct tnode { int val; struct tnode *p1, *p2; } Node;... Node *p; lod p lod i sto lda i lod p ind 0 sto p >p1 = p; p = p >p2; lod p ldc field_offset(*p, p1) ixa 1 lod p sto lda p lod p ind field_offset(*p, p2) sto 28

Code Generation for Control Structures if-stat if ( expr ) stat if ( expr ) stat else stat while-stat while ( expr ) stat if ( expr ) stat 1 else stat 2 while ( expr ) stat if ( expr ) stat expr FALSE expr Conditional jump stat TRUE Conditional jump TRUE stat 1 FALSE Unconditional jump expr Conditional jump TRUE stat FALSE Unconditional jump stat 2 29

P-code for Control Structures Sufficient two kinds of jump: unconditional (ujp) conditional to FALSE case (fjp) if ( expr ) stat expr fjp L1 stat lab L1 Notes: if ( expr ) stat 1 else stat 2 expr fjp L1 stat 1 ujp L2 lab L1 stat 2 lab L2 while ( expr ) stat lab L1 expr fjp L2 stat ujp L1 lab L2 All fragments of code end with a label statement exit label of control In many PLs: possible exiting loops from any point within the body exit label = inherited attribute in code-generation functions called in loop Translation mapping of if ( expr ) stat 1 else stat 2 still valid for conditional expressions ( expr? expr 1 : expr 2 ) 30

Code Generation for Logical Expressions If intermediate code has Boolean type logical operators (and, or, ) Computation of boolean expression in natural way If $ Boolean mapping of booleans to arithmetic values true 1 false 0 Short-circuit evaluation need for explicit jumps to load (a and b) b evaluated only if a = true (a or b) b evaluated only if a = false Possible defining short circuit by conditional expression: ( expr 1 and expr 2 ) º (expr 1? expr 2 : false) (expr 1 or expr 2 ) º (expr 1? true : expr 2 ) ( x? y : z) expr 1 and expr 2 expr 1 or expr 2 x fjp L1 y ujp L2 lab L1 z lab L2 expr 1 fjp L1 expr 2 ujp L2 lab L1 ldc FALSE lab L2 expr 1 fjp L1 ldc TRUE ujp L2 lab L1 expr 2 lab L2 31

Logical Expressions Evaluated in Short Circuit expr 1 and expr 2 expr 1 fjp L1 expr 2 ujp L2 lab L1 ldc FALSE lab L2 expr 1 or expr 2 expr 1 fjp L1 ldc TRUE ujp L2 lab L1 expr 2 lab L2 (x!= 0) && (y == x) lod x ldc 0 neq fjp L1 lod y lod x equ ujp L2 lab L1 ldc FALSE lab L2 (x!= 0) (y == x) lod x ldc 0 neq fjp L1 ldc TRUE ujp L2 lab L1 lod y lod x equ lab L2 32

Procedures of Code Gen. for Control Structures stat if-stat while-stat break other if-stat if ( expr ) stat if ( expr ) stat else stat while-stat while ( expr ) stat expr true false Note: Ambiguous G ambiguity resolution by rule balancing the closest then Node of syntax tree: typedef enum {EXPR, IF, WHILE, BREAK, OTHER} Symbol; typedef struct snode { symbol children val Symbol symbol; struct snode *children[3]; int val; } Node; typedef Node *Pnode; IF if(true) while(true) if(false) break else other EXPR(1) EXPR(1) WHILE IF EXPR(0) BREAK OTHER 33

Procedures of Code Gen. for Control Structures (ii) void gencode(pnode p, char *label) { char code[maxcode], *lab1, *lab2; } switch(p >symbol) { case EXPR: sprintf(code, "ldc %s", (p >val == 0? "FALSE" : "TRUE")); emit(code); break; case IF: gencode(p >children[0], label); lab1 = newlab(); sprintf(code, "fjp %s", lab1); emit(code); gencode(p >children[1], label); if(p >children[2]!= NULL) { lab2 = newlab(); sprintf(code, "ujp %s", lab2); emit(code); } sprintf(code, "lab %s", lab1); emit(code); if(p >children[2]!= NULL) { gencode(p >children[2], label); sprintf(code, "lab %s", lab2); emit(code); } break; expr fjp L1 stat lab L1 expr fjp L1 stat 1 ujp L2 lab L1 stat 2 lab L2 EXPR(1) EXPR(1) EXPR(0) case WHILE: lab1 = newlab(); sprintf(code, "lab %s", lab1); emit(code); gencode(p >children[0], label); lab2 = newlab(); sprintf(code, "fjp %s", lab2); emit(code); gencode(p >children[1], lab2); sprintf(code, "ujp %s", lab1); emit(code); sprintf(code, "lab %s", lab2); emit(code); break; case BREAK: sprintf(code, "ujp %s", label); emit(code); break; case OTHER: emit("other"); break; } IF WHILE IF BREAK OTHER lab L1 expr fjp L2 stat ujp L1 lab L2 34

Procedures of Code Gen. for Control Structures (iii) Notes: label = additional parameter of gencode to generate unconditional jump for break label: changed only in recursive call to gencode relevant to body of while loop in order to exit the inner loop Initial call to gencode label = " " (empty string) break generation of jump to empty label error! lab1, lab2: to save label names relevant to jump definitions still dangling "other" º dummy P-code statement to complete translation mapping 35

Procedures of Code Gen. for Control Structures (iv) IF if(true) while(true) if(false) break else other EXPR(1) EXPR(1) WHILE IF EXPR(0) BREAK OTHER ldc TRUE fjp L1 lab L2 ldc TRUE fjp L3 ldc FALSE fjp L4 ujp L3 ujp L5 lab L4 other lab L5 ujp L2 lab L3 lab L1 expr fjp L1 stat lab L1 expr fjp L1 stat 1 ujp L2 lab L1 stat 2 lab L2 lab L1 expr fjp L2 stat ujp L1 lab L2 36

Complications: Code Generation for Subprograms Diversification of call mechanisms in different target machines Calls: heavily depend on organization of runtime environment (which depends on L) difficult defining intermediate code enough general for ¹ Intermediate code for subprograms: target architectures runtime environments Need for two constructs definition call Runtime environment: built partially by of subprogram caller called function procedure calling sequence Definition of subprogram: Entry instruction body Return instruction Call to subprogram: Begin-argument-computation instruction arguments Call instruction 37

P-code for Subprograms int f(int x, int y) { return(x + y + 1); } ent f lod x lod y adi ldc 1 adi ret f(2+3, 4) mst ldc 2 ldc 3 adi ldc 4 cal f Notes: ret: without parameters (return value on top of stack) mst = "mark stack": corresponding target code allocates AR executing first statements of calling sequence Operands computed from left to right (not necessarily) 38

P-code for Subprograms (ii) program decl-list expr decl-list decl-list decl e decl function id ( param-list ) = expr param-list param-list, id id expr expr + expr call num id call id ( arg-list ) arg-list arg-list, expr expr Notes: Program = (possibly empty) sequence of function definitions + program expression $ variables or assignments, but only parameters, functions, and expressions Unique type: integer " function at least one parameter Operations in expressions addition function call Example of program: function f(x) = 2 + x function g(x,y) = f(x) + y g(3,4) g(3,4) = f(3) + 4 = 9 39

P-code for Subprograms (iii) Node of syntax tree: symbol child1 child2 brother name val typedef enum {PROG, FUNC, PARAM, PLUS, CALL, NUM, ID} Symbol; typedef struct snode { Symbol symbol; struct snode *child1, *child2, *brother; char *name; /* with FUNC, PARAM, CALL, ID */ int val; /* with NUM */ } Node; typedef Node *Pnode; Notes on typology of abstract tree: Root = PROG: associates function declarations (list with head child1) program expression (child2) FUNC child1 = head of parameters child2 = function expression CALL: child1 = head of actual parameters 40

P-code for Subprograms (iv) function f(x) = 2 + x function g(x,y) = f(x) + y g(3,4) PROG PARAM x FUNC f PLUS PARAM x FUNC g PARAM y PLUS NUM 3 CALL g NUM 4 NUM 2 ID x CALL f ID y ID x 41

Notes: P-code for Subprograms (v) void gencode(pnode p) { char code[maxcode]; Pnode t; } switch(p >symbol) { case PROG: t = p >child1; while(t!= NULL){genCode(t); t = t >brother;} gencode(p >child2); break; case FUNC: sprintf(code, "ent %s", p >name); emit(code); gencode(p >child2); emit("ret"); break; case NUM: sprintf(code, "ldc %d", p >val); emit(code); break; case PLUS: gencode(p >child1); gencode(p >child2); emit("adi"); break; case ID: sprintf(code, "lod %s", p >name); emit(code); break; case CALL: emit("mst"); t = p >child1; while(t!= NULL){genCode(t); t = t >brother;} sprintf(code, "cal %s", p >name); emit(code); break; } FUNC: "frames" function body with ent ret PARAM: no code generation! (only previous semantic analysis) CALL: mst + code generation for actual parameters + cal 42

P-code for Subprograms (vi) PROG PARAM x FUNC f NUM 2 PLUS ID x PARAM x FUNC g PARAM y function f(x) = 2 + x function g(x,y) = f(x) + y g(3,4) ID x CALL f PLUS ID y NUM 3 CALL g ent f ldc 2 lod x adi ret ent g mst lod x cal f lod y adi ret mst ldc 3 ldc 4 cal g NUM 4 43