CMPSC 160 Translation of Programming Languages Lectures 16: Code Generation: Three- Address Code Three-Address Code Each instruction can have at most three operands Each operand corresponds to a memory address or a temporary We have to break large statements into little operations that use temporary variables x=(2+3)+4 turns into to t1=2+3; x=t1+4; Temporary variables store the results at the internal nodes in the AST 1
Three-Address Code Assignments x := y x := y op z op: binary arithmetic or logical operators x := op y op: unary operators (unary minus, negation, integer to float conversion) Branch goto L Execute the statement with labeled L next Conditional Branch if x relop y goto L relop: <, >, =, <=, >=, ==,!= if the condition holds we execute statement labeled L next if the condition does not hold we execute the statement following this statement next Three-Address Code if (x < y) x = 5*y + 5*y/3; else y = 5; x = x + y; Temporaries: temporaries correspond to the internal nodes of the syntax tree Variables can be represented with their locations in the symbol table if x < y goto L1 goto L2 L1: t1 := 5 * y t2 := 5 * y t3 := t2 / 3 x := t1 + t3 goto L3 L2: y := 5 L3: x := x + y Three-address code instructions can be represented as an array of quadruples: operation, argument1, argument2, result triples: operation, argument1, argument2 (each triple implicitly corresponds to a temporary) 2
Three-Address Code vs. Stack-Based Code Three-Address Code: Good: Compact representation Good: Statement is self contained in that it has the inputs, outputs, and operation all in one instruction Bad: Requires lots of temporary variables Bad: Temporary variables have to be handled explicitly Stack-Based Code: Good: No temporaries, everything is kept on the stack Good: It is easy to generate code for this Bad: Requires more instructions to do the same thing Three-Address Code Attributes: E.place: location that holds the value of expression E (this is the temporary variable that will hold the value of the expression) E.code: sequence of instructions that are generated for E Procedures: newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table denotes concatenation Productions S id := E E E 1 + E 2 E E 1 * E 2 E ( E 1 ) E - E 1 E id E num 3
Three-Address Code Attributes: E.place: location that holds the value of expression E (this is the temporary variable that will hold the value of the expression) E.code: sequence of instructions that are generated for E Procedures: newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table Productions Semantic Rules S id := E id.place lookup(id.name); S.code E.code gen(id.place := E.place); E E 1 + E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place + E 2.place); E E 1 * E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place * E 2.place); E ( E 1 ) E.code E 1.code; E.place E 1.place; E - E 1 E.place newtemp(); E.code E 1.code gen(e.place := uminus E 1.place); E id E.place lookup(id.name); E.code (empty string) E num E.place newtemp(); E.code gen(e.place := num.value); Example x := ( y + z ) * a S.code = t1 := y + z t2 := t1 * a x := t2 S E.place = t2 E.code = t1 := y + z t2 := t1 * a id := E E.place = t1 E.code = t1 := y + z E * E E.place = a E.code = E.place = t1 E.code = t1 := y + z ( E ) id E.place = y E.code = id E + E id E.place = z E.code = 4
Code Generation for Boolean Expressions Two approaches Numerical representation Implicit representation Numerical representation Use 1 to represent true, use 0 to represent false For three-address code store this result in a temporary For stack machine code store this result in the stack Implicit representation For the boolean expressions which are used in flow-of-control statements (such as if-statements, while-statements etc.) boolean expressions do not have to explicitly compute a value, they just need to branch to the right instruction Generate code for boolean expressions which branch to the appropriate instruction based on the result of the boolean expression Boolean Expressions: Numerical Representation, Three-Address Code Attributes : E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E relop.op: operator can be ==,!=, <=, >=, >, < Global variable: nextstat: Returns the location of the next instruction to be generated (each call to gen() increments nextstat by 1) Productions Semantic Rules E E 1 relop E 2 E.place newtemp(); E.code E 1.code E 2.code gen( if E 1.place relop.op E 2.place goto nextstat+3); gen(e.place := 0 ) gen( goto nextstat+2) gen(e.place := 1 ); E E 1 and E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place and E 2.place); E E 1 or E 2 E.place newtemp(); E.code E 1.code E 2.code gen(e.place := E 1.place or E 2.place); 5
Boolean Expressions: Numerical Representation, Three-Address Code (continued) Attributes : E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E Global variable: nextstat: Returns the location of the next instruction to be generated (each call to gen() increments nextstat by 1) Productions Semantic Rules boolean negation E not E 1 E true E false E.place newtemp(); E.code E 1.code gen(e.place := negate E 1.place); E.place newtemp(); E.code gen(e.place := 1 ); E.place newtemp(); E.code gen(e.place := 0 ); Numerical Representation of Boolean Expressions Input boolean expression: x < y and a == b Three address code: Instructions 100-103 are for x < y Instructions 104-107 are for a == b 100 if x < y goto 103 101 t1 := 0 102 goto 104 103 t1 := 1 104 if a = b goto 107 105 t2 := 0 106 goto 108 107 t2 := 1 108 t3 := t1 and t2 These are the locations of the instructions, they are not labels. We could generate code using labels too Stack machine code: Instructions 100-105 are for x < y Instructions 106-111 are for a == b 100 load x 101 load y 102 if_cmplt 105 103 push 0 104 goto 106 105 push 1 106 load a 107 load b 108 if_cmpeq 111 109 push 0 110 goto 112 111 push 1 112 and 6
Boolean Expressions: Implicit Representation, Three-Address Code Attributes : E.code: sequence of instructions that are generated for E E.false: instruction to branch to if E evaluates to false E.true: instruction to branch to if E evaluates to true (E.code is synthesized whereas E.true and E.false are inherited) id.place: location for id Productions Semantic Rules E E 1 relop E 2 E.code E 1.code E 2.code gen( if E 1.place relop.op E 2.place goto E.true) gen( goto E.false); E E 1 and E 2 E 1.true newlabel(); E 1.false E. false; (short-circuiting) This generated label will be inserted to the place for E 1.true in the code generated for E 1 E 2.true E. true; E 2.false E. false; E.code E 1.code gen(e 1.true : ) E 2.code ; These places will be filled with labels later on when they become available Boolean Expressions: Implicit Representation, Three-Address Code (continued) Attributes : E.code: sequence of instructions that are generated for E E.false: instruction to branch to if E evaluates to false E.true: instruction to branch to if E evaluates to true id.place: location for id Productions Semantic Rules E E 1 or E 2 E 1.true E.true; (short-circuiting) E 1.false newlabel(); E 2.true E. true; E 2.false E. false; E.code E 1.code gen(e 1.false : ) E 2.code ; E not E 1 E 1.true E.false; E 1.false E. true; E.code E 1.code; E true gen(`goto E.true); E false gen(`goto E.false); 7
Three-Address Code Example These are the locations of three-address code instructions, they are not labels Input boolean expression: x < y and a == b These labels will be generated later on, and will be inserted to the corresponding places Numerical representation: 100 if x < y goto 103 101 t1 := 0 102 goto 104 103 t1 := 1 104 if a = b goto 107 105 t2 := 0 106 goto 108 107 t2 := 1 108 t3 := t1 and t2 Implicit representation: if x < y goto L1 goto LFalse L1: if a = b goto LTrue goto LFalse... LTrue: LFalse: Three-Address Code, Implicit Representation false attribute will be inserted here Input: x < y and a == b true = false= E code = if x < y goto L1 goto L1:if a = b goto goto true = L1 false= and code = if x < y goto L1 goto E true = false= true attribute will be inserted here code = if a = b goto goto E E relop code = op = < E code = relop E code = op = == E code = place = x id name = x place = y id name = y place = a id name = a place = b id name = b 8
Flow-of-Control Statements If-then-else Branch based on the result of boolean test expression test expression Y N then-block else-block Next block Flow-of-Control Statements: Code Structure We have to decide on the code layout for the code for flow-of-control S if E then S 1 else S 2 E.code to E.true to E.false Labels E.true: S 1.code if E evaluates to true goto S.next E.false: S 2.code if E evaluates to false S.next: 9
Flow-of-Control Statements, Three-Address Code, Assuming Implicit Representation for Boolean Expressions Attributes : S.code: sequence of instructions that are generated for S S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute) Productions Semantic Rules ASSUMPTION: the code generated S if E then S 1 else S 2 E.true newlabel(); for boolean expression E will branch E.false newlabel(); to E.true or E.false label based on the S 1.next S. next; result of the expression S 2.next S. next; S.code E.code gen(e.true : ) S 1.code gen( goto S.next) gen(e.false : ) S 2.code ; Flow-of-Control Statements Loops Evaluate condition before loop (if needed) Evaluate condition after loop Branch back to the top if condition holds Pre-test The layout shown here merges loop test with last block of loop body Loop body While, for, do, and until all fit this basic model You can use different layouts for example, put the test before the loop body and branch back to it at the end of the loop body Post-test Next block 10
Flow-of-Control Statements: Code Structure Two different layouts for while statements: The algorithms I give in the following slides use this layout: S while E do S 1 This layout places E.code after S 1.code : S while E do S 1 S.begin: E.true: E.false: E.code S 1.code goto S.begin to E.true to E.false E.true: E.begin: E.false: goto E.begin S 1.code E.code to E.true to E.false Flow-of-Control Statements, Three-Address Code, Assuming Implicit Representation for Boolean Expressions Attributes : S.code: sequence of instructions that are generated for S S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute) Productions Semantic Rules S while E do S 1 S.begin newlabel(); E.true newlabel(); E.false S. next; S 1.next S. begin; S.code gen(s.begin : ) E.code gen(e.true : ) S 1.code gen( goto S.begin); S S 1 ; S 2 S 1.next newlabel(); S 2.next S.next; S.code S 1.code gen(s 1.next : ) S 2.code 11
Example Input code fragment: while (a < b) { if (c < d) x = y + z; else x = y z } L1: if a < b goto L2 goto LNext L2: if c < d goto L3 goto L4 L3: t1 := y + z x := t1 goto L1 L4: t2 := y z x := t2 goto L1 LNext:... E.true for a < b E.false for a < b E.true for c < d E.false for c < d Backpatching E.true, E.false, S.next may not be computed in a single pass (they are inherited attributes) Backpatching is a technique for generating labels for E.true, E.false, S.next and inserting them to the appropriate locations Basic idea Keep lists E.truelist, E.falselist, S.nextlist E.truelist (E.falselist): the list of instructions where the label for E.true (E.false) have to be inserted when it becomes available S.nextlist: the list of instructions where the label for S.next have to be inserted when it becomes available When labels E.true, E.false, S.next are computed these labels are inserted to the instructions in these lists 12
x86 C with Static Linking Compile time Run time Executable x86 Assembly x86 Assembly x86 x86 Assembler x86 Object File (.o) x86 Assembler x86 Object File (.o) Libraries At compile time, all of the code is transformed into x86 object files and then the object files are linked together with the libraries to create a statically linked executable. This executable then runs directly on the x86 hardware. Linker x86 Executable One drawback, is if you wanted to run the same code on a Power-PC (even with the same OS) you have to recompile everything. x86 C with Dynamic Linking x86 Assembly x86 Assembler x86 Assembly x86 Assembler Compile time Run time Not-fully Linked x86 Executable Loader Final Executable loaded in Memory x86 Dynamic Libraries x86 Object File (.o) The linker just links together the.o files, and does not link calls to dynamically loaded libraries (DLLs in Windows or Shared Libraries in Unix) x86 Object File (.o) Linker Not-fully Linked x86 Executable This is similar to before, but now the final linking occurs when the program is loaded (or even during program execution) Here, libraries can be shared and they can be updated across the whole system without re-linking every single executable 13
Java Compile time Java Bytecode (class) Run time Java Bytecode (class) Java Classes Java Assembly Java Assembler Java Bytecode (class) Java Assembly Java Assembler Java Bytecode (class) Here the compiler targets java bytecode and the bytecode is then run on top of the Java Virtual Machine (JVM). The JVM really just interprets (simulates) the bytecode like any scripting language. Because of this, any java program compiled to bytecode is portable to any machine that someone has already ported the JVM too. No need to recompile. Java Virtual Machine (bytecode interpreter) x86 Java with Just-in-Time (JIT) Compilation Compile time Java Bytecode (class) Run time Java Bytecode (class) Java Classes Java Assembly Java Assembly Java Assembler Java Assembler Java Bytecode (class) Java Bytecode (class) JIT Here the compiler targets java bytecode (which is what we do in this class) and the bytecode is then run on top of the Java Virtual Machine (JVM). The JVM really just interprets (simulates) the bytecode like any scripting language. Because of this, any java program compiled to bytecode is portable to any machine that someone has already ported the JVM too. No need to recompile. Java Virtual Machine (bytecode interpreter) x86 On Call to Compiled Method On Call or Return back to uncompiled code X86 native code compiled from Java class 14