Intermediate Code Generation 1
Intermediate Code Generation Translating source program into an intermediate language" Simple CPU Independent, yet, close in spirit to machine language Benefits Retargeting is facilitated Machine independent code optimization can be applied 2
Intermediate Code Generation Intermediate codes are machine independent codes, but they are close to machine instructions The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator Intermediate language can be many different languages, and the designer of the compiler decides this intermediate language Syntax trees can be used as an intermediate language Postfix notation can be used as an intermediate language Three-address code (Quadruples) can be used as an intermediate language we will use quadruples to discuss intermediate code generation quadruples are close to machine instructions, but they are not actual machine instructions Some programming languages have well defined intermediate languages java java virtual machine prolog Warren abstract machine In fact, there are byte-code emulators to execute instructions in these intermediate languages 3
A Tree and A DAG as Intermediate Languages assign assign a + a + b uminus b uminus b uminus Pro: Con: c c easy restructuring of code and/or expressions for intermediate code optimization memory intensive c 4
Postfix Notation a := b -c + b -c a b c uminus b c uminus + assign Pro: Con: Postfix notation represents operations on a stack easy to generate stack operations are more difficult to optimize Bytecode (for example) iload 2 // push b iload 3 // push c ineg // uminus imul // iload 2 // push b iload 3 // push c ineg // uminus imul // iadd // + istore 1 // store a 5
Two Representations of A Syntax Tree id b uminus id c assign id a + id b uminus id c 0 id b 1 id c 2 uminus 1 3 0 2 4 id b 5 id c 6 uminus 5 7 4 6 8 + 3 7 9 id a 10 assign 9 8 11... 6
Three-Address Code assign x:= x:= y y op op z z assign a + Tree Dag a + b uminus b uminus t t 1 :=-c 1 :=-c t t 2 :=bt 2 :=bt 1 1 t t 3 :=-c 3 :=-c t t 4 :=bt 4 :=bt 3 3 t t 1 :=-c 1 :=-c t t 2 :=bt 2 :=bt 1 1 t t 5 :=t 5 :=t 2 +t 2 +t 2 2 a:=t a:=t 5 5 b uminus t t 5 :=t 5 :=t 2 +t 2 +t 4 4 c c a:=t a:=t 5 5 c Three-address code is a linearization of the tree/dag 7
Types of Three-Address Code Instructions Binary operations: x:=y op z Unary operations: x:=op y Copy instructions: x:=y Conditional jumps: if x relop y goto L Procedure calls: param x 1 param x 2 param x n call p,n Index assignments: x:=y[i],x[i]:=y Address and pointer assignments: x:=&y, x:=y, x:=y 8
Syntax-Directed Translation Into 3-address Code First deal with assignments and simple expressions Use attributes E.place: the name that will hold the value of E Identifier will be assumed to already have the place attribute defined E.code: hold the three address code statements that evaluate E Use function newtemp that returns a new temporary variable that we can use Use function gen to generate a single three address statement given the necessary information (variable names and operations) 9
The gen Function gen(e.place := E 1.place + E 2.place) Code generation t3 := t1 + t2 10
Code Generation for Expressions Production S id:=e E E 1 +E 2 E E 1 E 2 E -E 1 E ( E 1 ) E id Semantic Rules S.code:=E.code gen(id.place ':=' E.place) E.place:=newtemp; E.code:=E 1.code E 2.code gen(e.place ':=' E 1.place '+' E 2.place) E.place:=newtemp; E.code:=E 1.code E 2.code gen(e.place ':=' E 1.place '' E 2.place) E.place:=newtemp; E.code:=E 1.code gen(e.place ':=' 'uminus' E 1.place) E.place:=E 1.place; E.code:=E 1.code E.place := id.place E.code:='' 11
Code Generation for while Loops Production S while E do S 1 Semantic Rules S.begin:=newlabel; S.after:=newlabel; S.code:=gen(S.begin ':') E.code gen('if' E.place '=' '0' 'goto' S.after) S 1.code gen('goto' S.begin) gen(s.after ':') S.begin : E.code if E.place = 0 goto S.after S 1.code S.after : goto S.begin 12
Example i := 2 n + k while i do i := i - k t1 t1 := := 2 2 t2 t2 := := t1 t1 n n t3 t3 := := t2 t2 + + k k i i := := t3 t3 L1: L1: if if i i = = 0 0 goto goto L2 L2 t4 t4 := := i i - - k k i i := := t4 t4 goto goto L1 L1 L2: L2: 13
Quadruple Representation of Three-Address Statements (0) op uminus arg1 c arg2 result t 1 t 1 t :=- 1 :=c c (1) (2) (3) (4) uminus + b c b t 2 t 1 t 3 t 4 t 2 t 3 t 4 t 5 t 2 t :=b 2 :=b t 1 t 1 t 3 t :=- 3 :=c c t 4 t :=b 4 :=b t 3 t 3 t 5 t :=t 5 :=t 2 + 2 + t 4 t 4 a:=t a:=t 5 5 (5) := t 5 a Temporary names must be entered into the symbol table as they are created 14
Triple Representation of Three-Address Statements (0) op uminus arg1 c arg2 t 1 t :=- 1 :=c c (1) (2) (3) uminus b c b (0) (2) t 2 t :=b 2 :=b t 1 t 1 t 3 t :=- 3 :=c c t 4 t :=b 4 :=b t 3 t 3 t 5 t :=t 5 :=t 2 + 2 + t 4 t 4 (4) + (1) (3) a:=t a:=t 5 5 (5) assign a (4) Temporary names are not entered into the symbol table 15
Other Types of 3-address Statements Ternary operations like x[i]:=y x:=y[i] require two or more entries: op arg1 arg2 op arg1 arg2 (0) []= x i (0) =[] y i (1) assign (0) y (1) assign x (0) x[i] := y x := y[i] 16
Indirect Triples Representation of Three-Address Statements Instruction op arg1 arg2 (0) (14) (14) uminus c (1) (15) (15) b (14) (2) (16) (16) uminus c (3) (17) (17) b (16) (4) (18) (18) + (15) (17) (5) (19) (19) assign a (18) 17
Declarations P { offset := 0 } D D D ; D D id : T { enter(id.name, T.type, offset); offset := offset + T.width } T char { T.type := char; T.width := 1 } T integer { T.type := integer; T.width := 4 } T real { T.type := real; T.width := 8 } T ^T 1 { T.type := pointer(t 1.type); T.width := 4 } T array [ num ] of T 1 { T.type := array(1..num.val, T 1.type) ; T.width := num.val T 1.width } name a b c type char pointer(real) real offset 0 1 5 18
Nested Procedure Declarations For each procedure we should create a symbol table. mktable(previous) create a new symbol table where previous is the parent symbol table of this new symbol table enter(symtable,name,type,offset) create a new entry for a variable in the given symbol table. enterproc(symtable,name,newsymbtable) create a new entry for the procedure in the symbol table of its parent. addwidth(symtable,width) puts the total width of all entries in the symbol table into the header of that table. We will have two stacks: tblptr to hold the pointers to the symbol tables offset to hold the current offsets in the symbol tables in tblptr stack. 19
Symbol Table for Nested Procedures sort null a x header readarray exchange to readarray's symbol table to exchange's symbol table quicksort readarray exchange quicksort i header header k v partition header i j header partition 20
Nested Procedure Declarations Production P MD M D D 1 ;D 2 D proc id; ND 1 ; S D id: T N Semantic Rules addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) t:=mktable(nil); push(t, tblptr); push(0,offset) t:=top(tblptr); addwidth(t,top(offset)); pop(tblptr); pop(offset); enterproc(top(tblptr), id.name, t) enter(top(tblptr), id.name, T.type, top(offset)); top(offset) := top(offset)+t.width t:=mktable(top(tblptr)); push(t,tblptr); push(0,offset) 21
Code Generation for Expressions Production S id:=e E E 1 +E 2 E E 1 E 2 E - E 1 E ( E 1 ) E id Semantic Rules p:=lookup(id.name); if p<> nil then emit(p ':=' E.place) else error E.place:=newtemp; emit(e.place ':=' E 1.place '+' E 2.place) E.place:=newtemp; emit(e.place ':=' E 1.place '' E 2.place) E.place:=newtemp; emit(e.place ':=' 'uminus' E 1.place) E.place := E 1.place p:= lookup(id.name); if p<> nil then E.place:=p else error 22
Addresing of Array Elements A[low..high] A[i] base + (i - low) w i w + (base - loww) A[low 1..high 1, low 2..high 2 ] A[i 1,i 2 ] n 2 = high 2 - low 2 + 1 base + ( (i 1 - low 1 ) n 2 + i 2 - low 2 ) w ( (i 1 n 2 + i 2 ) w + (base - ( (low 1 n 2 ) + low 2 ) w) 23
Conversion of Types E E 1 + E 2 { E.place := newtemp; if E 1.type = integer and E 2.type = integer then begin emit (E.place ':=' E 1.place 'int+' E 2.place); E.type := integer; end else if E 1.type = real and E 2.type = real then begin emit (E.place ':=' E 1.place 'real+' E 2.place); E.type := real; end else if E 1.type = integer and E 2.type = real then begin u := newtemp; emit (u ':=' 'inttoreal' E 1.place); emit (E.place ':=' u 'real+' E 2.place ); E.type := real; end else if E 1.type = real and E 2.type = integer then begin u := newtemp; emit (u ':=' 'inttoreal' E 2.place); emit (E.place ':=' E 1.place 'real+' u); E.type := real; end else E.type := type_error } x := y + i j t 1 := i int j t 3 := inttoreal t 1 t 2 := y real+ t 3 x := t 2 24
Boolean Expressions E E or E E and E not E ( E ) id relop id true false a or b and not c t 1 := not c t 2 := b and t 1 t 3 := a or t 2 a < b if a < b then 1 else 0 100: if a < b goto 103 101: t := 0 102: goto 104 103: t := 1 104: 25
Boolean Expressions E E 1 or E 2 { E.place := newtemp; emit (E.place ':=' E 1.place 'or' E 2.place) } E E 1 and E 2 { E.place := newtemp; emit (E.place ':=' E 1.place 'and' E 2.place) } E not E 1 { E.place := newtemp; emit (E.place ':=' 'not' E 1.place) } E ( E 1 ) { E.place := E 1.place } E id 1 relop id 2 { E.place := newtemp; emit ('if' id 1.place relop.op id 2.place 'goto' nextstat + 3); emit (E.place ':=' '0') emit ('goto' nextstat + 2) emit (E.place ':=' '1') } E true { E.place := newtemp; emit (E.place ':=' '1') } E false { E.place := newtemp; emit (E.place ':=' '0') } a < b or c < d and e < f 100: if a < b goto 103 101: t 1 := 0 102: goto 104 103: t 1 := 1 104: if c < d goto 107 105: t 2 := 0 106: goto 108 107: t 2 := 1 108: if e < f goto 111 109: t 3 := 0 110: goto 112 111: t 3 := 1 112: t 4 := t 2 and t 3 113: t 5 := t 1 or t 4 26
Control Instructions S if E then S 1 S if E then S 1 else S 2 do E.true E.code do E.false do E.true E.true : E.false : S 1.code E.true : E.code S 1.code do E.false S.begin : S while E do S 1 E.code do E.true do E.false E.false : S.next : goto S.next S 2.code E.true : S 1.code E.false : goto S.begin 27
S if E then S 1 Dyntax-Directed Definition for Flow-of-Control Statements E.true := newlabel; E.false := S.next; S 1.next := S.next; S.code := E.code gen(e.true ':') S 1.code E.true : E.false : E.code S 1.code do E.true do E.false S if E then S 1 else S 2 E.true := newlabel; E.false := newlabel; S 1.next := S.next; S 2.next := S.next; S.code := E.code gen(e.true ':') S 1.code gen('goto' S.next) gen(e.false ':') S 2.code E.true : E.false : E.code S 1.code goto S.next S 2.code do E.true do E.false S while E do S 1 S.begin := newlabel; E.true := newlabel; E.false := S.next S 1.next := S.begin; S.code := gen(s.begin ':') E.code gen(e.true ':') S 1.code gen('goto' S.begin) S.next : S.begin : E.code E.true : S 1.code do E.true do E.false E.false : goto S.begin 28
Boolean Expression - "Short-Circuit" Translation E E 1 or E 2 E E 1 and E 2 E not E 1 E ( E 1 ) E 1.true := E.true; E 1.false := newlabel; E 2.true := E.true; E 2.false := E.false; E.code := E 1.code gen(e 1.false ':') E 2.code E 1.true := newlabel; E 1.false := E.false; E 2.true := E.true; E 2.false := E.false; E.code := E 1.code gen(e 1.true ':') E 2.code E 1.true := E.false; E 1.false := E.true; E.code := E 1.code; E 1.true := E.true; E 1.false := E.false; E.code := E 1.code; E id 1 relop id 2 E.code := gen( 'if' id 1.place relop.op id 2.place 'goto' E.true) gen( 'goto' E.false ) Code for E= a<b: if a < b goto E.true goto E.false E true E.code := gen( 'goto' E.true ) E false E.code := gen( 'goto' E.false ) 29
Boolean Expression - "Short-Circuit" Translation a < b or c < d and e < f if a < b goto Ltrue goto L1 L1: if c < d goto L2 goto Lfalse L2: if e < f goto Ltrue goto Lfalse while a < b do if c < d then x := y + z else x := y - z L1: if a < b goto L2 goto Lnext L2: if c < d goto L3 goto L4 L3: t 1 := y + z x := t 1 goto L 1 L4: t 2 := y - z x := t 2 goto L 1 Lnext: 30
Boolean Expressions - Mixed-Mode (a + b) < c (a < b) + (b < a) E E + E E and E E relop E id E E 1 + E 2 E.type := arith; if E 1.type = arith and E 2.type = arith then begin / arithmetic addition / E.place := newtemp; E.code := E 1.code E 2.code gen(e.place ':=' E 1.place '+' E 2.place) end else if E 1.type = arith and E 2.type = bool then begin E.place := newtemp; E 2.true := newlabel; E 2.false := newlabel; E.code := E 1.code E 2.code gen(e 2.true ':' E.place ':=' E 1.place '+' 1) gen('goto' nextstat + 1) gen(e 2.false ':' E.place ':=' E 1.place) else if... E 2.true : E.place := E 1.place + 1 goto nextstat + 1 E 2.false : E.place := E 1.place 31
Translating Short-Circuit Expressions Using Backpatching For the examples of the previous lectures for implementing syntax-directed definitions, the easiest way is to use two passes. First syntax tree is constructed and is then traversed in depth-first order to compute the translations given in the definition The main problem in generating three address codes in a single pass for Boolean expressions and flow of control statements is that we may not know the labels that control must go to at the time jump statements are generated 32
Translating Short-Circuit Expressions Using Backpatching This problem is solved by generating a series of branch statements with the targets of the jumps temporarily left unspecified. Each such statement will be put on a list of goto statements whose labels will be filled in when the proper label can be determined. This subsequent filling of addresses for the determined labels is called BACKPATCHING. 33
Syntax-Directed Definition for Backpatching E E or M E E and M E not E ( E ) id relop id true false M ε Synthesized attributes: E.code three-address code E.truelist backpatch list for jumps on true E.falselist backpatch list for jumps on false M.quad location of current three-address quad 34
Backpatch Operations with Lists makelist(i) creates a new list containing threeaddress location i, returns a pointer to the list merge(p 1, p 2 ) concatenates lists pointed to by p 1 and p 2, returns a pointer to the concatenated list backpatch(p, i) inserts i as the target label for each of the statements in the list pointed to by p 35
Backpatching with Lists: Example a < b or c < d and e < f 100: if a < b goto _ 101: goto _ 102: if c < d goto _ 103: goto _ 104: if e < f goto _ 105: goto _ backpatch 100: if a < b goto TRUE 101: goto 102 102: if c < d goto 104 103: goto FALSE 104: if e < f goto TRUE 105: goto FALSE 36
Backpatching with Lists: Translation Scheme M ε { M.quad := nextquad } E E 1 or M E 2 { backpatch(e 1.falselist, M.quad); E.truelist := merge(e 1.truelist, E 2.truelist); E.falselist := E 2.falselist } E E 1 and M E 2 { backpatch(e 1.truelist, M.quad); E.truelist := E 2.truelist; E.falselist := merge(e 1.falselist, E 2.falselist); } E not E 1 { E.truelist := E 1.falselist; E.falselist := E 1.truelist } E ( E 1 ) { E.truelist := E 1.truelist; E.falselist := E 1.falselist } E id 1 relop id 2 E true E false { E.truelist := makelist(nextquad); E.falselist := makelist(nextquad + 1); emit( if id 1.place relop.op id 2.place goto _ ); emit( goto _ ) } { E.truelist := makelist(nextquad); E.falselist := nil; emit( goto _ ) } { E.falselist := makelist(nextquad); E.truelist := nil; emit( goto _ ) } 37
Backpatching Example E.t = {100} E.f = {101} a < b E.t = {100,104} E.f = {103,105} or M.q = {102} ε E.t = {102} E.f = {103} E.t = {104} E.f = {103,105} and M.q = {104} ε 100: if a < b goto _ 101: goto _ 102: if c < d goto _ 103: goto _ 104: if e < f goto _ 105: goto _ E.t = {104} E.f = {105} c < d e < f 38
Flow-of-Control Statements and Backpatching: Grammar S if E then S if E then S else S while E do S begin L end A L L ; S S Synthesized attributes: S.nextlist backpatch list for jumps to the next statement after S (or nil) L.nextlist backpatch list for jumps to the next statement after L (or nil) Jumps out of S 1 100: Code for S1 S 1 ; S 2 ; S 3 ; S 4 ; S 4 backpatch(s 1.nextlist, 200) 200: Code for S2 300: Code for S3 backpatch(s 2.nextlist, 300) 400: Code for S4 backpatch(s 3.nextlist, 400) 500: Code for S5 backpatch(s 4.nextlist, 500) 39
Flow-of-Control Statements and Backpatching S A { S.nextlist := nil } S begin L end { S.nextlist := L.nextlist } S if E then M S 1 { backpatch(e.truelist, M.quad); S.nextlist := merge(e.falselist, S 1.nextlist) } L L 1 ; M S { backpatch(l 1.nextlist, M.quad); L.nextlist := S.nextlist; } L S { L.nextlist := S.nextlist; } M ε { M.quad := nextquad } S if E then M 1 S 1 N else M 2 S 2 { backpatch(e.truelist, M 1.quad); backpatch(e.falselist, M 2.quad); S.nextlist := merge(s 1.nextlist, merge(n.nextlist, S 2.nextlist)) } S while M 1 E do M 2 S 1 { backpatch(s 1,nextlist, M 1.quad); backpatch(e.truelist, M 2.quad); S.nextlist := E.falselist; emit( goto M 1.quad ) } N ε { N.nextlist := makelist(nextquad); emit( goto _ ) } 40