Intermediate code generation

source program -> lexical analyzer -> tokens -> parser -> parse tree -> intermediate code generation -> intermediate language

The intermediate language can be one of the following:
1. postfix notation,
2. graphical representation (such as a syntax tree or a dag),
3. three-address code.

Example: consider the following assignment statement:

a := b * -c + b * -c

This assignment statement can have any one of the following intermediate representations:

1. the postfix notation (here - is the unary minus):

a b c - * b c - * + :=

2. the graphical representation:
(a) syntax tree:

        :=
       /  \
      a    +
         /   \
        *     *
       / \   / \
      b   - b   -
          |     |
          c     c
(b) dag:

        :=
       /  \
      a    +
          ( )     both operands of + are the same * node
           *
          / \
         b   -
             |
             c

3. the three-address code:
(a) the three-address code for the above syntax tree is:

t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5

(b) the three-address code for the above dag is:

t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
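The postfix form in item 1 of the example can be obtained by a post-order walk of the syntax tree: visit the operands first, then emit the operator. The following is a minimal Python sketch of that idea. The tuple encoding of the tree and the function name to_postfix are assumptions made for illustration; they do not come from these notes.

```python
# Hypothetical tuple encoding of a syntax tree (an assumption, not from the notes):
#   a leaf is a string such as "a";
#   an interior node is (op, child) for a unary operator
#   or (op, left, right) for a binary operator.

def to_postfix(node):
    """Return the postfix form of a syntax tree as a list of symbols."""
    if isinstance(node, str):          # leaf: an identifier
        return [node]
    op, *children = node
    symbols = []
    for child in children:             # operands first, left to right
        symbols.extend(to_postfix(child))
    symbols.append(op)                 # the operator follows its operands
    return symbols

# a := b * -c + b * -c   (a "-" node with a single child is the unary minus)
product = ("*", "b", ("-", "c"))
tree = (":=", "a", ("+", product, product))

print(" ".join(to_postfix(tree)))      # a b c - * b c - * + :=
```

The printed string matches the postfix notation given in the example.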
Dag: a dag gives the same information as the syntax tree, but in a more compact way, because common subexpressions are identified and shared.

Three-address code

Three-address code is a sequence of statements of the general form:

x := y op z

where x, y, and z are variables, constants, or compiler-generated temporary variables; op stands for any operator.

Example: a source language expression like x + y * z might be translated into the sequence

t1 := y * z
t2 := x + t1

because only one operator is allowed on the right side of a statement. Here t1 and t2 are compiler-generated temporary variables.

Three-address code vs. graphical representation

Three-address code is a linearized representation of a syntax tree or a dag in which explicit names correspond to the interior nodes of the graph.
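The correspondence between interior nodes and temporaries can be sketched in Python as follows. Each interior node gets one temporary and one statement. With share=True, a repeated subexpression reuses its temporary, which approximates generating code from a dag rather than a tree. The tuple encoding and the name gen_three_address are illustrative assumptions, and the temporaries are numbered consecutively, so the shared version ends in t3 where the notes use t5.

```python
# Hypothetical tree encoding (an assumption for this sketch):
#   leaf = string; interior node = (op, child) or (op, left, right);
#   an assignment is (":=", target, expression).

def gen_three_address(tree, share=False):
    """Emit three-address statements for an assignment tree.

    share -- if True, reuse the temporary of a repeated subexpression,
             which mimics working from a dag instead of a syntax tree.
    """
    code = []
    seen = {}                       # subexpression -> temporary holding it

    def walk(node):
        if isinstance(node, str):   # leaf: use the name directly
            return node
        if share and node in seen:  # common subexpression already computed
            return seen[node]
        op, *kids = node
        args = [walk(k) for k in kids]
        temp = f"t{len(code) + 1}"  # one new temporary per interior node
        if len(args) == 1:
            code.append(f"{temp} := {op} {args[0]}")
        else:
            code.append(f"{temp} := {args[0]} {op} {args[1]}")
        seen[node] = temp
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

product = ("*", "b", ("-", "c"))
tree = (":=", "a", ("+", product, product))

print("\n".join(gen_three_address(tree)))              # 6 statements (tree)
print("\n".join(gen_three_address(tree, share=True)))  # 4 statements (dag)
```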
Code generation

Code generation is the last phase of the compiler. It takes (optimized) intermediate code as its input and produces the equivalent target code. The following figure shows this process:

intermediate code generator -> code optimizer -> optimized intermediate code -> code generator -> target program

The code generator depends on
1. the target language and
2. the operating system.

We also need to consider issues such as
1. memory management
2. instruction selection
3. register allocation
4. evaluation order
1. Memory management

Memory management is the process of mapping names in the source program to addresses of data objects in run-time memory. This process is done cooperatively by the front end and the code generator.

2. Instruction selection

The selected set of instructions depends on the nature of the target machine. If we don't care about the efficiency of the target program, instruction selection is straightforward.

Example: a three-address statement of the form x := y + z can be translated into the sequence

MOV y, R0    /* load y into register R0 */
ADD z, R0    /* add z to R0 */
MOV R0, x    /* store R0 into x */

Unfortunately, this kind of statement-by-statement code generation often produces poor code. For example, the sequence of statements

a := b + c
d := a + e

would be translated into

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d

Here the 4th statement is redundant, because R0 already holds the value of a. The 3rd statement is also redundant if a is not used subsequently.
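The statement-by-statement scheme and the redundant load it produces can be sketched in a few lines of Python. The function names naive_codegen and drop_redundant_loads and the (x, y, op, z) tuple format are assumptions made for this sketch. The cleanup pass removes only a MOV that copies back the value the immediately preceding MOV just transferred, and it assumes straight-line code with no jumps into the sequence. Removing the 3rd statement would need information about later uses of a, which this sketch does not attempt.

```python
def naive_codegen(statements):
    """Translate each three-address statement x := y op z on its own."""
    opcode = {"+": "ADD", "-": "SUB"}        # only the op-codes needed here
    out = []
    for x, y, op, z in statements:
        out.append(f"MOV {y}, R0")           # load y into R0
        out.append(f"{opcode[op]} {z}, R0")  # R0 := R0 op z
        out.append(f"MOV R0, {x}")           # store R0 into x
    return out

def drop_redundant_loads(code):
    """Drop a MOV that undoes nothing: it moves back the value that the
    immediately preceding MOV just copied (e.g. MOV R0, a; MOV a, R0)."""
    result = []
    for instr in code:
        if result and instr.startswith("MOV ") and result[-1].startswith("MOV "):
            src, dst = instr[4:].split(", ")
            prev_src, prev_dst = result[-1][4:].split(", ")
            if src == prev_dst and dst == prev_src:
                continue                     # destination already holds the value
        result.append(instr)
    return result

# a := b + c ; d := a + e
statements = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
naive = naive_codegen(statements)
better = drop_redundant_loads(naive)

print("\n".join(naive))    # six instructions, including MOV a, R0
print("\n".join(better))   # five instructions, MOV a, R0 removed
```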
3. Register allocation

Instructions that use registers are shorter and faster than instructions that use memory locations. The use of registers is divided into:
1. register allocation: selecting the variables that will reside in registers
2. register assignment: picking the specific register in which each variable will reside

4. Evaluation order

The order in which computations are performed can affect the efficiency of the target code.

The target machine

Our target machine is described as follows:
1. it is a byte-addressable machine with 4 bytes to a word and n general-purpose registers, R0, R1, ..., Rn-1.
2. it has two-address instructions of the form

op source, destination

where op is an operator (op-code), and source and destination are data fields. Examples of op-codes are:

MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
MUL (multiply destination by source)
DIV (divide destination by source)
Storing values

Examples of moving values with the different address modes:
1. MOV R0, M stores the contents of register R0 into memory location M
2. MOV 4(R0), M stores the value contents(4 + contents(R0)) into memory location M
3. MOV *4(R0), M stores the value contents(contents(4 + contents(R0))) into memory location M
4. MOV #n, R0 loads the constant number n into register R0

Example: consider the three-address instruction a := b + c, where b and c are simple variables denoting distinct memory locations. The corresponding generated code could be:

MOV b, R0
ADD c, R0
MOV R0, a
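To check how the address modes listed above behave, the following Python sketch interprets MOV, ADD and SUB on a toy model of the machine. It is an illustration under simplifying assumptions, not a definition of the target machine: memory is a dictionary keyed by variable names or integer addresses, word size is ignored, and the names run and operand are hypothetical.

```python
import re

def run(program, memory, registers):
    """Execute MOV/ADD/SUB instructions, updating memory and registers in place."""

    def operand(text):
        """Classify an operand as ("lit", n), ("reg", name) or ("mem", address)."""
        if text.startswith("#"):                  # #n: the constant n
            return ("lit", int(text[1:]))
        m = re.fullmatch(r"(\*?)(-?\d+)\((R\d+)\)", text)
        if m:                                     # c(R) or *c(R)
            star, c, reg = m.groups()
            addr = int(c) + registers[reg]        # c + contents(R)
            if star:
                addr = memory[addr]               # one more level of indirection
            return ("mem", addr)
        if re.fullmatch(r"R\d+", text):           # a register
            return ("reg", text)
        return ("mem", text)                      # a named memory location

    def read(kind, where):
        if kind == "lit":
            return where
        return registers[where] if kind == "reg" else memory[where]

    def write(kind, where, value):
        if kind == "reg":
            registers[where] = value
        elif kind == "mem":
            memory[where] = value
        else:
            raise ValueError("cannot store into a constant")

    for line in program:
        op, rest = line.split(None, 1)
        src, dst = [part.strip() for part in rest.split(",")]
        value = read(*operand(src))
        target = operand(dst)
        if op == "MOV":
            write(*target, value)
        elif op == "ADD":
            write(*target, read(*target) + value)
        elif op == "SUB":
            write(*target, read(*target) - value)
        else:
            raise ValueError(f"unsupported op-code: {op}")

# a := b + c
memory = {"a": 0, "b": 3, "c": 4}
registers = {"R0": 0}
run(["MOV b, R0", "ADD c, R0", "MOV R0, a"], memory, registers)
print(memory["a"])        # 7

# Indexed and indirect indexed modes with R0 = 100
mem2 = {104: 200, 200: 9, "M": 0, "N": 0}
regs2 = {"R0": 100}
run(["MOV 4(R0), M", "MOV *4(R0), N", "MOV #5, R0"], mem2, regs2)
print(mem2["M"], mem2["N"], regs2["R0"])   # 200 9 5
```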