Advanced Compilers: Code Generation
Fall 2017, Chungnam National Univ.
Eun-Sun Cho
Backend of Compilers
- Machine-independent Optimization
- Virtual to physical Mapping / Machine-dependent Optimization
  - Instruction Selection
  - Instruction Scheduling
  - Register Allocation
- Machine Code Emission/Optimization
- Backend = Code generation + Optimization
Code Generation
- Storage Management
- Exception Handling
- Instruction Selection
- Register Allocation
Storage Management
Management of Storage
- In compiler-generated machine code, memory management code plays a critical role.

    #include <stdio.h>
    void main() {
        int i;
        printf("Hello, CSE!\n");
    }

  compiles to:

        .file   "s09.c"
        .section .rodata
    .LC0:
        .string "Hello, CSE!"
        .text
        .globl  main
        .type   main, @function
    main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp        # allocate stack memory for int i and the string parameter
        movl    $.LC0, (%esp)
        call    puts
        addl    $20, %esp
        popl    %ebp
        ret
        .size   main, .-main
2 Classes of Storage in Process
- Registers
  - fast access
  - invisible to users (programmers) in most cases
  - NO indirect access is allowed
- Memory
  - (relatively) slow access; indirect accesses are allowed
  - candidates: globals/statics, composite types (structs, arrays, ...), variables accessed via the & operator
- Whether a variable is translated into a register or a memory variable should be determined in the middle of the HIR to LIR translation
4 Categories of Memory
- Code space: an area of memory for the instruction sequence; read-only, if possible
- Static (or Global): an area of memory for the set of variables with the same lifetime as the program
- Stack: an area of memory for the set of local variables (with block lifetime)
- Heap: an area of memory allocated dynamically by system calls (via malloc, new, etc.)
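Memory Organization

    Stack
    ...
    Heap
    Static Data
    Code

- stack, heap: variable sizes at runtime
  - they grow toward each other; in the x86/ELF layout shown later, the stack grows toward lower addresses and the heap toward higher addresses
  - the relative positions of stack and heap might be switched
- code, static data: fixed sizes (determined by the compiler)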
Executable Formats
- Windows PE (Portable Executable)
- ELF (Executable and Linkable Format)
Run-time stack
- A stack made of frames
  - one frame (or activation record) for each function call
  - activation record: the execution environment for one execution of the corresponding function
  - each call has its own frame, even for recursive calls
  - contents: local variables, arguments, return values, other temporary storage, ...
Heap allocation
- a contiguous portion of the global area, obtained from the OS
- operations for requesting and returning memory during program execution are necessary; otherwise, garbage collection should be supported by the programming language
- available memory is kept categorized into a free section and an in-use section (see an OS textbook!)
Initial Stack Frame (startup state)
- Command line arguments: argc, argv
- Environment variables (env)

    <Initial stack layout for ELF binaries>

    (top, lowest address)
    argc                        argument counter (integer)
    argv[0]                     program name (pointer)
    argv[1] ... argv[argc-1]    program args (pointers)
    NULL                        end of args (integer)
    env[0] env[1] ... env[n]    environment variables (pointers)
    NULL                        end of environment (integer)
    (bottom, highest address; addresses decrease toward the top)

  Figure from http://asm.sourceforge.net/articles/startup.html#st, used with some modification
Runtime Layout (ELF)

    higher addr.
    stack           system env / argv / argc
                    stack frame for main()          <- ebp of main()
                                                    <- esp (stack pointer)
                    (available for stack growth)
    shared library  malloc.o (lib*.so)              library functions (dynamically linked)
                    printf.o (lib*.so)
                    (available for heap)
    heap            malloc(), calloc(), new
    data            int x;        (global var)
                    int y = 100;  (global var)
    text            xx.o (lib*.a), xxx.o (lib*.a)   library functions (statically linked)
                    file.o
                    main.o        func(72,73);
                    crt0.o        (startup routine)
    lower addr.

- data and text come from a.out: what existed before loading (.text, .data, ...)
- What if main() calls function func(72, 73)?
Runtime Layout (ELF); while executing func()

    higher addr.
    stack           system env / argv / argc
                    stack frame for main()          <- ebp of main()
                    stack frame for func()          <- ebp of func()
                                                    <- esp (stack pointer)
                    (available for stack growth)
    shared library  malloc.o (lib*.so)              library functions (dynamically linked)
                    printf.o (lib*.so)
                    (available for heap)
    heap            malloc(), calloc(), new
    data            int x;        (global var)
                    int y = 100;  (global var)
    text            xx.o (lib*.a), xxx.o (lib*.a)   library functions (statically linked)
                    file.o
                    main.o        func(72,73);
                    crt0.o        (startup routine)
    lower addr.

- data and text come from a.out: what existed before loading (.text, .data, ...)
- What if main() calls function func(72, 73)? A new frame for func() is pushed below the frame for main().
Functions and Run-time stacks
- Call/return of a function and run-time stack operations
  - when f is called, push f's frame onto the RT stack
  - when f returns, pop f's frame from the RT stack
- Top frame = frame of the function currently being executed
- How to access the top frame?
  - Stack pointer (esp): top position of the frame
  - Base pointer (ebp): base position of the frame
  - a local variable is accessed via its offset from FP (or SP)
- Role of the compiler: generate code that enforces the discipline above
When main() calls func(72, 73)

    func(int x, int y) {
        int a;
        int b[3];
        ...
    }

    offset   content   meaning
             system env / argv / argc
             main()'s local variables            frame for main() (base: mpf)
    +12      73        y
    +8       72        x
    +4       ra        return address
     0       mpf       caller's frame pointer    <- ebp   frame for func() starts here
    -4       garbage   a
    -8       garbage   b[2]
    -12      garbage   b[1]
    -16      garbage   b[0]                      <- esp
             (available for stack growth)
Variables and Arguments: func(72, 73)

    func(int x, int y) {
        int a;
        int b[3];
        ...
    }

    offset   content   meaning
             system env / argv / argc
             main()'s local variables            frame for main() (base: mpf)
    +12      73        y
    +8       72        x
    +4       ra        return address
     0       mpf       caller's frame pointer    <- ebp   frame for func() starts here
    -4       garbage   a
    -8       garbage   b[2]
    -12      garbage   b[1]
    -16      garbage   b[0]                      <- esp
             (available for stack growth)

- [ebp+4]  : return address
- [ebp+8]  : 72, that is, x
- [ebp+12] : 73, that is, y
- [ebp]    : main()'s ebp
- a        : [ebp-4]
- b[1]     : [ebp-12]
Variables and Arguments: func(72, 73)

    Caller (main) code:
        push 73      ; y
        push 72      ; x
        call func    ; pushes the return address

    offset   content   meaning
             system env / argv / argc
             main()'s local variables            frame for main() (base: mpf)
    +12      73        y
    +8       72        x
    +4       ra        return address
     0       mpf       caller's frame pointer    <- ebp   frame for func() starts here
    -4       garbage   a
    -8       garbage   b[2]
    -12      garbage   b[1]
    -16      garbage   b[0]                      <- esp
             (available for stack growth)
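The calling sequence above can be traced with a small simulation. This is only a sketch under stated assumptions, not real machine code: a word array stands in for memory, and the names `Machine`, `push`, `load`, `call_func_72_73`, the starting addresses 200 and 240, and the return address 0x1234 are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 64

/* Toy model of a downward-growing x86 stack; addresses are byte offsets. */
typedef struct {
    int32_t mem[WORDS];   /* mem[addr / 4] holds the word at byte address addr */
    int32_t esp;          /* stack pointer (byte address) */
    int32_t ebp;          /* frame (base) pointer (byte address) */
} Machine;

static void push(Machine *m, int32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;                      /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

static int32_t load(const Machine *m, int32_t addr) {
    assert(addr >= 0 && addr < WORDS * 4 && addr % 4 == 0);
    return m->mem[addr / 4];
}

/* Hypothetical starting state: main() is running with esp = 200, ebp = 240. */
static Machine new_machine(void) {
    Machine m = {{0}, 200, 240};
    return m;
}

/* Simulates: push 73; push 72; call func; followed by func's prologue. */
static void call_func_72_73(Machine *m) {
    push(m, 73);          /* argument y */
    push(m, 72);          /* argument x */
    push(m, 0x1234);      /* "call" pushes a (made-up) return address */
    push(m, m->ebp);      /* prologue: push ebp (caller's frame pointer) */
    m->ebp = m->esp;      /* prologue: mov ebp, esp */
    m->esp -= 16;         /* prologue: sub esp, 16 (room for int a; int b[3];) */
}
```

After `call_func_72_73`, reading `load(&m, m.ebp + 8)` and `load(&m, m.ebp + 12)` yields the two arguments, which mirrors the [ebp+8] and [ebp+12] offsets in the frame layout.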
Alignment
- In most cases, a variable is aligned based on its size
  - e.g. C/C++: char is byte aligned, short is halfword aligned, int is word aligned
- e.g.

    char w;
    int x[3];
    char y;
    short z;

  - char w: 1 byte
  - int x[3]: 12 bytes, starting at a word-aligned address (3 empty bytes between w and x)
  - char y: 1 byte, starting at any address
  - short z: 2 bytes, starting at a halfword-aligned address (1 empty byte between y and z)
  - Total size = 20 bytes!
Alignment of Structures

    struct {
        char w;
        int x[3];
        char y;
        short z;
    }

- fields in a struct: the struct aligns to the largest field alignment
  - e.g. the largest field type is int (4 bytes)
- size of the struct: a multiple of 4
- starting address of the struct: also a multiple of 4 (word aligned)
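The layout rule can be checked with a short calculation. The sketch below assumes the sizes and alignments named on the slides (char 1/1, int 4/4, short 2/2); the names `Field`, `align_up`, and `layout` are hypothetical, and a real compiler's ABI may differ.

```c
#include <assert.h>
#include <stddef.h>

/* One field: its size in bytes and its required alignment (a power of two). */
typedef struct {
    size_t size;
    size_t align;
} Field;

/* Rounds offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Lays out the fields in order, stores each field's offset in offsets[],
 * and returns the total size padded to the largest field alignment. */
static size_t layout(const Field *fields, size_t n, size_t *offsets) {
    size_t offset = 0;
    size_t max_align = 1;
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, fields[i].align);   /* insert padding if needed */
        offsets[i] = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align) {
            max_align = fields[i].align;
        }
    }
    return align_up(offset, max_align);
}
```

For the fields `char w; int x[3]; char y; short z;`, described as {1,1}, {12,4}, {1,1}, {2,2}, this gives offsets 0, 4, 16, 18 and a total of 20 bytes, matching the slide.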
Example: GCC-x86 4.7.x
- 20 bytes are reserved: 16 for i and 4 for the string pointer
- gcc-x86 reserves 16 bytes for i because of its own alignment policy

    #include <stdio.h>
    void main() {
        int i;
        printf("Hello, CSE!\n");
    }

  compiles to:

        .file   "s09.c"
        .section .rodata
    .LC0:
        .string "Hello, CSE!"
        .text
        .globl  main
        .type   main, @function
    main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $20, %esp        # allocate stack memory for int i and the string parameter
        movl    $.LC0, (%esp)
        call    puts
        addl    $20, %esp
        popl    %ebp
        ret
        .size   main, .-main
Exception Handling Code
Exceptions
- Exceptions are for error handling
  - invalid input
  - invalid resource state: file does not exist, network error, ...
  - erroneous execution condition: divide-by-zero, ...
- In real production code, error-handling code may be a large part (30%-50% or more)
C++

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main () {
      ifstream file;
      // Set the state flags for which a failure exception is thrown.
      file.exceptions ( ifstream::failbit | ifstream::badbit );
      try {
        file.open ("test.txt");
        while (!file.eof()) file.get();
      }
      catch (const ifstream::failure& e) {
        cout << "Exception opening/reading file";
      }
      file.close();
      return 0;
    }

    class ios_base::failure : public exception {
      // the exceptions thrown by the elements of
      // the standard input/output library
    public:
      explicit failure (const string& msg);
      virtual ~failure();
      virtual const char* what() const noexcept;
    };

- flag values of std::ios_base::iostate: eofbit, failbit, badbit, goodbit
Java

    InputStream input = null;
    try {
        input = new FileInputStream("c:\\data\\input-text.txt");
        int data = input.read();
        while (data != -1) {
            // do something with data...
            doSomethingWithData(data);
            data = input.read();
        }
    } catch (IOException e) {
        // do something with e... log, perhaps rethrow etc.
    } finally {
        if (input != null) input.close();
    }

- Note: C++ does not support 'finally' blocks.
Throw

    #include <iostream>
    using namespace std;

    int main () {
      try {
        throw 20;
      }
      catch (int e) {
        cout << "An exception occurred. Exception Nr. " << e << endl;
      }
      return 0;
    }

  Output:

    An exception occurred. Exception Nr. 20
Chaining

    InputStream input = null;
    try {
        input = new FileInputStream("c:\\data\\input-text.txt");
        int data = input.read();
        while (data != -1) {
            // do something with data...
            doSomethingWithData(data);
            data = input.read();
        }
    } catch (IOException e) {
        throw new MyException();
    }
What Should Be Done When an Exception Occurs

    try {
        f(1);        // when an exception occurs here: goto handler A
        Object x;
        g(2);        // when an exception occurs here: destroy x + goto handler A
    } catch (Exc) {
        // handler A: catches type Exc
    }

- Note: try can be nested, so the handlers are organized in a stack
Basic Exception Handling Mechanism
1. Setjmp/longjmp-based
   - global goto
   - C's primitive exception mechanism
2. Table-driven method
   - more complex and uses more space, but faster
1. Setjmp/longjmp

    #include <setjmp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        jmp_buf env;
        int i;
        i = setjmp(env);
        printf("i = %d\n", i);
        if (i != 0) exit(0);
        longjmp(env, 2);
        printf("get printed?\n");
    }

- setjmp(): saves the contents of the registers
- longjmp(): restores them later; it "returns" to the state of the program when setjmp() was called

    $ sj1
    i = 0
    i = 2
    $ _

- First, we call setjmp(), and it returns 0. Then we call longjmp() with a value of 2, which causes the code to return from setjmp() with a value of 2. That value is printed out, and the code exits. ("get printed?" will not be printed.)
Setjmp/longjmp Approach

    struct context {
        int ebx; int edi; int esi;
        int ebp; int esp; int eip;
    };
    typedef struct context buffer[1];

    buffer buf;

    void f() {
        if (0 == setjmp(buf))
            g();
    }
    void g() { h(); }
    void h() { longjmp(buf, 1); }
Setjmp/longjmp Approach (Cont'd)

    buffer buf;

    void f() {
        if (0 == setjmp(buf))
            g();
        else
            k();
    }
    void g() { h(); }
    void h() { longjmp(buf, 1); }

- try..catch (f)
  - save the context before the try block
  - the saved context is also used to reach the handler
  - handle the exception with k()
- throws (h)
  - fetch the handler, restore the machine state, and jump to the handler's code
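The same pattern can be written against the standard `<setjmp.h>` interface. This is a minimal sketch, not how any particular compiler lowers try/catch; `throw_error`, `thrown_code`, and the parameter `fail` are hypothetical names, and only a single, non-nested handler is supported.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler_ctx;   /* context saved at the "try" */
static int thrown_code;       /* value carried from the "throw" to the "catch" */

/* "throw": record the code, then jump back to the saved context. */
static void throw_error(int code) {
    assert(code != 0);
    thrown_code = code;
    longjmp(handler_ctx, 1);
}

static void h(int fail) {
    if (fail) {
        throw_error(42);
    }
}

static void g(int fail) {
    h(fail);
}

/* Returns 0 if the "try" body completed, otherwise the thrown code. */
static int f(int fail) {
    if (setjmp(handler_ctx) == 0) {
        g(fail);              /* "try" body: setjmp returned directly */
        return 0;
    } else {
        return thrown_code;   /* "catch": reached through longjmp */
    }
}
```

Calling `f(0)` runs the body to completion, while `f(1)` unwinds from `h` through `g` straight into the else branch. Supporting nested try blocks would require a stack of `jmp_buf` contexts, as the note on nesting suggests.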
2. Table Driven Approach
- Table 1: maps each throw point, that is, the program counter (PC) at the point where the exception is thrown, to an action table
- Table 2: the action table, which performs the various operations required for exception processing, e.g.
  - invoking destructors
  - adjusting the stack
  - matching the exception type to the address of an exception handler
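The two tables can be pictured as plain data plus a lookup. The sketch below is a simplified illustration only: the struct layouts, field names, and `find_handler` are hypothetical, and real table formats are far more compact and detailed.

```c
#include <assert.h>
#include <stddef.h>

/* Table 2 entry: what to do for an exception thrown inside one region. */
typedef struct {
    int destructors_to_run;   /* number of live objects to destroy */
    int stack_adjust;         /* bytes to pop before entering the handler */
    int exception_type;       /* type id that the handler catches */
    int handler_pc;           /* address of the handler code */
} ActionEntry;

/* Table 1 entry: maps the PC range [start_pc, end_pc) to its actions. */
typedef struct {
    int start_pc;
    int end_pc;
    const ActionEntry *actions;
} RangeEntry;

/* Returns the handler PC for an exception of the given type thrown at pc,
 * or -1 when this function has no matching handler. */
static int find_handler(const RangeEntry *table, size_t n, int pc, int type) {
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pc >= table[i].start_pc && pc < table[i].end_pc &&
            table[i].actions->exception_type == type) {
            return table[i].actions->handler_pc;
        }
    }
    return -1;   /* no handler here: unwinding would continue in the caller */
}
```

Nothing in this scheme executes when no exception is thrown, which is where the speed advantage over setjmp/longjmp comes from; the cost is the space taken by the tables.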
Discussions
- All variables that are declared outside the try block have to be restored to their initial values

    Lecture s = new Lecture();   // s.lecturer is assumed to be initially null
    try {
        s.lecturer = new ThatMan();
        FileInputStream();       // exception!
        // s.lecturer (in memory) should be restored...
    } catch (IOException e) {...}
Discussions: Setjmp/longjmp approach
- setjmp must be called at the beginning of every try block, even if no exception is ever thrown
- a list of buf must be maintained
- a list of the objects on the stack must be maintained (in C++)
Discussions (Cont'd): Table driven approach
- Mostly used
- Significantly more efficient than the setjmp/longjmp approach
- The tables themselves have to encode a lot of possible actions
  - space problem
  - reorganizing the code implies reorganizing the tables accordingly
  - vulnerable to attack
- Some compiler optimizations are not allowed

    void f() {
        int x = 0;   // dead code, but cannot be optimized out
        try { x = f1(x); }
        catch (...) { cout << ...; }
    }
Exception Handling in GIMPLE
- throw is NOT directly supported; it is implemented by function calls
Exception Handling in GIMPLE (Cont'd)
- [Figure: code for invoking destructors and adjusting the stack]
- [Figure: code for throwing an exception]
Instruction Selection
Low-level, Tree-based Intermediate Representation
- Tree-based IR (cf. RTL)
  - with abstract machine instructions
  - used in machine code generation
- e.g. (from the Tiger book) the tree for a memory access at address e + c:

    MEM(BINOP(PLUS, e, CONST c))
Tree-based Intermediate Representation (from the Tiger book)
- MEM(e): the contents of one word of memory starting at address e. When it is used as the left-hand side of MOVE, it means a store; otherwise it means a fetch
- TEMP(t): register t
- SEQ(s1, s2): statement s1 is evaluated, and then statement s2 is evaluated
- ESEQ(s, e): statement s is evaluated for its side effects, and then e is evaluated for a result
- BINOP(o, e1, e2): o is a binary operator like PLUS or MINUS. The result is the evaluation of o with e1 and e2 as operands. This result is saved in memory and the address is returned
- CONST(i): integer constant i
Simple Equivalence Relationships
- We can choose any one among sub-trees with the same semantics

    ESEQ(s1, ESEQ(s2, e))   =   ESEQ(SEQ(s1, s2), e)
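Simple Equivalence Relationships (Cont'd)

    BINOP(op, e1, ESEQ(s, e2))   =   ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2)))

    BINOP(op, ESEQ(s, e1), e2)   =   ESEQ(s, BINOP(op, e1, e2))

    BINOP(op, e1, ESEQ(s, e2))   =   ESEQ(s, BINOP(op, e1, e2))     (only when s and e1 commute)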
More Instruction Selection (Option 1)

    MOVE(
      MEM(BINOP(PLUS,
                MEM(BINOP(PLUS, TEMP fp, CONST a)),
                BINOP(MULT, TEMP i, CONST 4))),
      MEM(BINOP(PLUS, TEMP fp, CONST x)))

  [Figure: first way of covering this tree with instruction tiles]
More Instruction Selection (Option 2)

    MOVE(
      MEM(BINOP(PLUS,
                MEM(BINOP(PLUS, TEMP fp, CONST a)),
                BINOP(MULT, TEMP i, CONST 4))),
      MEM(BINOP(PLUS, TEMP fp, CONST x)))

  [Figure: second way of covering the same tree with instruction tiles]
Equivalence of the Machine Code

    Option 1                        Option 2
    LOAD  r1 <- M[fp+a]             LOAD  r1 <- M[fp+a]
    ADDI  r2 <- r0 + 4              ADDI  r2 <- r0 + 4
    MUL   r2 <- ri * r2             MUL   r2 <- ri * r2
    ADD   r1 <- r1 + r2             ADD   r1 <- r1 + r2
    LOAD  r2 <- M[fp+x]             LOAD  r2 <- fp + x
    STORE M[r1+0] <- r2             STORE M[r1] <- M[r2]
Register Allocation
Operands in Low Level IR (Review)
- Virtual registers
  - we assume infinitely many virtual registers
- Special registers
  - stack pointer, pc, ...
- Literals
  - we assume there is no limit on the values of literals
- Symbolic names
  - in most cases, labels
Register Allocation
- Motivation
  - Virtual register (VR): we assume infinitely many virtual registers
  - but the number of actual registers is finite, and varies from machine to machine
- Register allocation
  - put as many VRs as possible into physical registers, and allocate the remaining VRs to memory
  - optimization for the best performance: put frequently used VRs into physical registers
  - spilling: allocating virtual registers to memory when it cannot be avoided
Interference
- Interference: two different definitions interfere if their live ranges have an operation in common
  - live range: obtained from liveness analysis and reaching definition analysis
- Interference graph
  - nodes of the graph = variables
  - edges: two nodes are linked if they interfere with each other

    1: a = 0
    2: b = a
    3: b*b
    4: c = 2
    5: a*c+3

  - for def 1, a = {1,2,3,4,5}
  - for def 2, b = {2,3}
  - for def 4, c = {4,5}
  - interference graph: b - a - c
- Examples and materials from Princeton Univ.
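The interference test on these live ranges is a set intersection. A minimal sketch, assuming each live range fits in a 32-bit mask indexed by line number; `LiveRange`, `range_of`, and `interferes` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A live range as a bitmask: bit i is set when the variable is live at line i. */
typedef uint32_t LiveRange;

/* Builds a live range from an array of line numbers (each in 0..31). */
static LiveRange range_of(const int *lines, int n) {
    LiveRange r = 0;
    for (int i = 0; i < n; i++) {
        assert(lines[i] >= 0 && lines[i] < 32);
        r |= (LiveRange)1 << lines[i];
    }
    return r;
}

/* Two variables interfere when their live ranges share at least one line. */
static bool interferes(LiveRange x, LiveRange y) {
    return (x & y) != 0;
}
```

With a = {1,2,3,4,5}, b = {2,3}, and c = {4,5}, `interferes` holds for (a, b) and (a, c) but not for (b, c), which gives exactly the edges b - a and a - c.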
Graph Coloring
- Used to allocate virtual registers (that is, variables) to physical registers
- Linked nodes must be painted in different colors
- Simple example: two registers, so 2-coloring (two colors)
  - colors (registers): eax, ebx

    1: a = 0
    2: b = a
    3: b*b
    4: c = 2
    5: a*c+3

  - interference graph: b - a - c
  - a needs one register; b and c do not interfere, so they can share the other
K-Graph Coloring Algorithm
- Kempe's algorithm [1879]: an old problem
- Step 1 (simplify)
  - find a node with fewer than k edges, and remove that node together with the edges linked to it
  - save these on a stack
- Step 2 (color)
  - once the remaining (simplified) graph can be k-colored,
  - pop a node (and all the related edges pushed with it) from the stack, and give the node a color different from those of all its neighbor nodes
- Step 3 (spill), optional
  - used if the algorithm above fails
  - in practice, Steps 1 and 2 alone do not work in many cases
  - graph coloring is an NP-complete problem
  - solution: select several (victim) variables and allocate them to memory
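Steps 1 and 2 can be sketched as follows. This is an illustrative version under simplifying assumptions: the graph is a small symmetric adjacency matrix with no self-edges, there is no spilling (the function just reports failure), and `kempe_color` and `MAX_NODES` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Tries to k-color the graph. On success writes color[v] in 0..k-1 for every
 * node and returns true; returns false when simplify gets stuck. */
static bool kempe_color(int n, bool adj[MAX_NODES][MAX_NODES], int k,
                        int color[MAX_NODES]) {
    bool removed[MAX_NODES] = {false};
    int stack[MAX_NODES];
    int top = 0;

    assert(n >= 0 && n <= MAX_NODES && k > 0 && k <= MAX_NODES);

    /* Step 1 (simplify): repeatedly remove a node with fewer than k neighbors. */
    while (top < n) {
        int pick = -1;
        for (int v = 0; v < n && pick < 0; v++) {
            if (removed[v]) continue;
            int degree = 0;
            for (int u = 0; u < n; u++) {
                if (!removed[u] && u != v && adj[v][u]) degree++;
            }
            if (degree < k) pick = v;
        }
        if (pick < 0) return false;   /* every remaining node has >= k neighbors */
        removed[pick] = true;
        stack[top++] = pick;
    }

    /* Step 2 (color): pop nodes and give each one the lowest color that is not
     * used by its neighbors that are already back in the graph. */
    while (top > 0) {
        int v = stack[--top];
        bool used[MAX_NODES] = {false};
        for (int u = 0; u < n; u++) {
            if (!removed[u] && adj[v][u]) used[color[u]] = true;
        }
        int c = 0;
        while (used[c]) c++;          /* c < k: v had fewer than k neighbors */
        color[v] = c;
        removed[v] = false;
    }
    return true;
}
```

On the graph b - a - c with k = 2 this succeeds and gives b and c the same color; on a triangle with k = 2 it returns false, which is the situation where Step 3 would pick a spill victim.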
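Step 1 (simplify): example
- [Figure: a graph with nodes a, b, c, d, e is simplified one node at a time; each panel shows the remaining graph and the stack, ending with stack: a e c and then stack: b a e c]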
Step 2 (color): example
- [Figure: starting from stack: b a e c, nodes are popped one at a time (stack: a e c, and so on), put back into the graph with nodes a, b, c, d, e, and colored]
Case of Step 3 (1)
- Some lucky cases!
- all nodes have 2 neighbours!
- [Figure: graph coloring example with colors (registers) eax and ebx; stack: d]
Case of Step 3 (2)
- But there exist graphs where coloring with only k colors is impossible: spilling!
- [Figure: graph with nodes a, b, c, d, e; no colors left for e or a!]
Spilling code
- Code rewriting: introduce a new temporary, and rewrite the code
- e.g. assuming that t2 is to be spilled, "add t1, t2" is rewritten as follows:
  - define a memory area bound to the to-be-spilled variable (here, t2), e.g. [ebp-24] in the runtime stack
  - introduce a new temporary variable (here, t35)

    mov t35, [ebp-24]
    add t1, t35

- note: t35's live range is very short (one or two instructions), so the possibility of interference is very low (much lower than for t2)
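The rewrite of a single use can be sketched as a small function. This is an assumption-laden illustration: it only handles a spilled source operand of an `add` (a spilled destination would also need a load before and a store after), and `Add`, `Load`, `Rewritten`, and `spill_use` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int dst; int src; } Add;             /* add t<dst>, t<src> */
typedef struct { int dst; int frame_offset; } Load;   /* mov t<dst>, [ebp + frame_offset] */

typedef struct {
    bool needs_load;   /* true when a load must be emitted before the add */
    Load load;
    Add add;           /* the (possibly rewritten) add instruction */
} Rewritten;

/* If instr reads the spilled temporary, replaces that use with a fresh
 * temporary that is loaded from the spill slot just before the add. */
static Rewritten spill_use(Add instr, int spilled, int frame_offset,
                           int *next_temp) {
    Rewritten out = { false, {0, 0}, instr };
    assert(next_temp != NULL);
    if (instr.src == spilled) {
        int fresh = (*next_temp)++;       /* short-lived temporary, e.g. t35 */
        out.needs_load = true;
        out.load.dst = fresh;
        out.load.frame_offset = frame_offset;
        out.add.src = fresh;
    }
    return out;
}
```

For `add t1, t2` with t2 spilled to offset -24 and 35 as the next free temporary number, the result describes `mov t35, [ebp-24]` followed by `add t1, t35`.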