LLVM IR Code Generation Inside YACC
Li-Wei Kuo
LLVM IR
LLVM code representation:
  In-memory compiler IR (Intermediate Representation)
  On-disk bitcode representation (*.bc)
  Human-readable assembly language LLVM IR (*.ll), which is our target
LLVM IR is in SSA form (static single assignment form):
  Each variable is assigned exactly once
  Use-def chains are explicit and each contains a single element
LLVM commands
Generate the *.bc (bitcode), then disassemble it:
  $ clang -c -emit-llvm a.c -o a.bc
  $ llvm-dis a.bc -o a.ll
Generate the *.ll (human-readable) directly:
  $ clang -S -emit-llvm a.c -o a.ll
Use the interpreter to run bitcode or textual IR:
  $ lli test.bc
  $ lli test.ll
LLVM IR example
[Figure: test1.c compiled by clang into test1.ll; labeled parts: Header, Global, Function, Local, Body]
LLVM Module
In LLVM, a module represents a single unit of code that is to be processed together.
A module contains things like global variables, function declarations, and implementations.
Format: ; ModuleID = 'file name'
Target data layout & triple
Data layout: a module may specify a target-specific data layout string that specifies how data is to be laid out in memory.
Triple: a module may specify a target triple string that describes the target host, typically in the form architecture-vendor-operating system. The name comes from autoconf configuration names, which used to contain exactly three fields.
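The module header described above can be written by one small routine. This is a minimal sketch, not the routine from the slides: the name `code_gen_module_header` and its parameters are hypothetical, and the layout and triple strings are passed in because they depend on the target.

```c
#include <stdio.h>

/* Write the header lines of a textual LLVM module.
 * Hypothetical helper: the data layout and triple strings are supplied
 * by the caller because they depend on the target machine. */
void code_gen_module_header(FILE *f_llvm, const char *file_name,
                            const char *data_layout, const char *triple)
{
    fprintf(f_llvm, "; ModuleID = '%s'\n", file_name);
    fprintf(f_llvm, "target datalayout = \"%s\"\n", data_layout);
    fprintf(f_llvm, "target triple = \"%s\"\n", triple);
}
```

A practical way to pick the two strings is to copy them from the *.ll file that clang produces for the same machine, for example a triple such as `i386-pc-linux-gnu`.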
Overview of routines
extdef:
    TYPESPEC notype_declarator ';'
        { if (TRACEON) printf("7 ");
          set_global_vars($2); }
  | notype_declaratory
        { if (TRACEON) printf("10 ");
          cur_scope++;
          set_scope_and_offset_of_param($1);
          code_gen_func_header($1); }
    '{' xdecls
        { if (TRACEON) printf("10.5 ");
          set_local_vars($1); }
    stmts
        { if (TRACEON) printf("11 ");
          pop_up_symbol(cur_scope);
          cur_scope--;
          code_gen_at_end_of_function_body($1); }
Overview of routines
extdef:
    TYPESPEC notype_declarator ';'
        { if (TRACEON) printf("7 ");
          set_global_vars($2); }
  | TYPESPEC notype_declaratory
        { if (TRACEON) printf("10 ");
          cur_scope++;
          set_scope_and_offset_of_param($2);
          code_gen_func_header($2); }
    '{' xdecls
        { if (TRACEON) printf("10.5 ");
          set_local_vars($2); }
    stmts
        { if (TRACEON) printf("11 ");
          pop_up_symbol(cur_scope);
          cur_scope--;
          code_gen_at_end_of_function_body($2); }
Code generation with header
File pointer: f_llvm
K&R-style function definition:
  void f(a) float a; { /*... */ }
Both C89/90 and C99 still officially support K&R-style declarations.
Global variable: int, float, double
Form emitted by clang:
  @variable_name = linkage_type global variable_type value, alignment
32-bit x86 alignment:
  A char will be 1-byte aligned.
  A short will be 2-byte aligned.
  An int will be 4-byte aligned.
  A long will be 4-byte aligned.
  A float will be 4-byte aligned.
  A double will be 8-byte aligned.
Code generation with global vars
Only the integer type without an initial value is implemented.
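For the integer-only case, emitting a global comes down to one formatted line. The sketch below is an illustration under assumptions: the helper name `code_gen_global_int` is hypothetical, and the choice of `common` linkage with a zero initializer is one valid way to write an uninitialized C global in LLVM IR, not necessarily what the slides' `set_global_vars` prints.

```c
#include <stdio.h>

/* Emit one uninitialized global int, e.g. for "int a;" at file scope.
 * Assumptions: "common" linkage, zero initializer, 4-byte alignment. */
void code_gen_global_int(FILE *f_llvm, const char *name)
{
    fprintf(f_llvm, "@%s = common global i32 0, align 4\n", name);
}
```

Calling `code_gen_global_int(f_llvm, "a")` writes `@a = common global i32 0, align 4`.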
Local variable: int, float, double
Form emitted by clang:
  %variable_name = alloca variable_type, alignment
Setup local variables
Only the integer type without an initial value is implemented.
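A local int needs one `alloca` in the function body. This is a minimal sketch with a hypothetical helper name, `code_gen_local_int`. Note the `%%` in the format string, which prints the single `%` that prefixes local names in LLVM IR.

```c
#include <stdio.h>

/* Emit stack storage for one local int, e.g. for "int x;" inside a function.
 * "%%" prints a literal '%', the prefix of local names in LLVM IR. */
void code_gen_local_int(FILE *f_llvm, const char *name)
{
    fprintf(f_llvm, "  %%%s = alloca i32, align 4\n", name);
}
```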
Function
Form emitted by clang:
  define return_type @function_name (parm_type %parm_name) function_attributes {
  entry:
    %parm_name.addr = alloca parm_type, alignment
    store parm_type %parm_name, parm_type* %parm_name.addr, alignment
    ret return_type value
  }
Code generation function header
Only an integer return type with no parameters is implemented.
Code generation function end
Only an integer return type is implemented.
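The two routines for the function header and the function end could look like the sketch below. The routine names follow the grammar actions shown earlier, but the parameter lists are assumptions (the slides pass a parser value instead), and returning the constant 0 is a placeholder because return statements are not covered here.

```c
#include <stdio.h>

/* Open a function that returns int and takes no parameters. */
void code_gen_func_header(FILE *f_llvm, const char *name)
{
    fprintf(f_llvm, "define i32 @%s() {\n", name);
    fprintf(f_llvm, "entry:\n");
}

/* Close the function body. The returned constant 0 is an assumption. */
void code_gen_at_end_of_function_body(FILE *f_llvm)
{
    fprintf(f_llvm, "  ret i32 0\n");
    fprintf(f_llvm, "}\n");
}
```

Every LLVM basic block must end in a terminator instruction, so the `ret` has to be written before the closing brace even for an empty body.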
Arithmetic operation: Add
Add with 2 operands, form emitted by clang:
  %SSA_form_temp_var = load variable_type* var, alignment
  %SSA_form_temp_var = add nuw nsw variable_type op1, op2
  store variable_type value, result_type* result, alignment
Here var and result are written %name for a local variable and @name for a global variable.
nuw and nsw stand for No Unsigned Wrap and No Signed Wrap.
Arithmetic operation: Add
Add with 3 operands: a = b + c + d; (clang output)
Add with 4 operands: a = b + c + d + e; (clang output)
Grammar
expr_no_commas:
    primary { }
  | expr_no_commas '+' expr_no_commas { }
  | expr_no_commas '=' expr_no_commas { }
  | expr_no_commas '*' expr_no_commas { }
  ;
primary:
    IDENTIFIER { }
  | CONSTANT { }
  | STRING { }
  | primary PLUSPLUS { }
  ;
Grammar
expr_no_commas:
    primary { }
  | expr_no_commas '+' after_expr_no_commas { }
  | expr_no_commas '=' expr_no_commas { }
  | expr_no_commas '*' after_expr_no_commas { }
  ;
after_expr_no_commas:
    primary { }
  ;
primary:
    IDENTIFIER { }
  | CONSTANT { }
  | STRING { }
  | primary PLUSPLUS
  ;
Annotations on the actions: Load operand 1; Store result; Handle int; Handle string; Handle operand 2; Handle variable; Load operand 2
Type conflict
Solution: change the grammar or use a variable to record the value.
primary:
    IDENTIFIER { }
  | CONSTANT { }
  | STRING { }
  | primary PLUSPLUS { }
  ;
Handle SSA
Use a global counter to store the SSA value.
Implement the SSA temporary variables of the +, -, *, / instructions.
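A global counter is enough to hand out a fresh name for every SSA temporary. The sketch below is one possible shape, with hypothetical names `new_ssa_temp` and `reset_ssa_counter`. It prefixes the number with `t` (`%t1`, `%t2`, ...) as a deliberate simplification: LLVM's assembler places ordering constraints on purely numeric names such as `%1`, while named values have no such constraint.

```c
#include <stdio.h>

static int ssa_counter = 0;   /* number of temporaries handed out so far */

/* Write the next fresh temporary name ("%t1", "%t2", ...) into buf
 * and return buf so the call can be used inline. */
const char *new_ssa_temp(char *buf, size_t size)
{
    snprintf(buf, size, "%%t%d", ++ssa_counter);
    return buf;
}

/* Restart the numbering, e.g. at the beginning of each function. */
void reset_ssa_counter(void)
{
    ssa_counter = 0;
}
```

Because each call returns a name that has never been used before, every temporary is assigned exactly once, which is the property SSA form requires.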
Load variable: int
Global, local
Only the integer type is implemented.
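Loading a global and loading a local differ only in the prefix of the variable name, `@` versus `%`. The sketch below shows this with a hypothetical helper, `code_gen_load_int`. It uses the typed-pointer `load i32* ...` form from these slides; this is an older syntax, and the form accepted by a given LLVM release should be checked against its Language Reference.

```c
#include <stdio.h>

/* Emit a load of an int variable into the temporary named by ssa.
 * is_global selects the '@' (global) or '%' (local) name prefix. */
void code_gen_load_int(FILE *f_llvm, const char *ssa,
                       const char *var, int is_global)
{
    fprintf(f_llvm, "  %s = load i32* %c%s, align 4\n",
            ssa, is_global ? '@' : '%', var);
}
```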
Add operation: int
Use a global variable to store the operand value.
Handle the SSA temporary variables of the add instruction.
Add operation: int (cont.)
Implement instruction
Use a node to store each operand and SSA variable, e.g.:
  fprintf(f_llvm, "%s = load %s* %s, align %d\n", SSA, type, var, align);
  fprintf(f_llvm, "%s = add nsw %s %s, %s\n", addssa, type, op1, op2);
  fprintf(f_llvm, "store %s %s, %s* %s, align %d\n", type, var, type, result, align);
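The node idea can be sketched as follows: each node carries the text to print for an operand, and the add routine returns a new node naming its result so that additions can be chained. The struct layout and the names `struct node`, `temp_count`, and `code_gen_add_int` are assumptions for illustration, not the slides' actual data structure.

```c
#include <stdio.h>

/* One operand or result: the text printed for it, e.g. "%t1" or "42". */
struct node {
    char text[32];
};

static int temp_count = 0;   /* numbers the temporaries %t1, %t2, ... */

/* Emit "result = add nsw i32 op1, op2" and return the result node,
 * which can serve as an operand of the next instruction. */
struct node code_gen_add_int(FILE *f_llvm, struct node op1, struct node op2)
{
    struct node result;
    snprintf(result.text, sizeof result.text, "%%t%d", ++temp_count);
    fprintf(f_llvm, "  %s = add nsw i32 %s, %s\n",
            result.text, op1.text, op2.text);
    return result;
}
```

For `b + c + d`, the first call adds the nodes for `b` and `c`, and its returned node becomes the first operand of the second call. This is how a three-operand expression turns into two two-operand `add` instructions.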
Store result: int
Global, local
Only the integer-type add operation is implemented.
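Storing the result back into the assigned variable is the last step of an assignment. A minimal sketch, with the hypothetical helper `code_gen_store_int`: the value is the temporary (or constant) produced by the expression, and the destination prefix again depends on whether the variable is global or local.

```c
#include <stdio.h>

/* Emit a store of an int value (an SSA name or a constant) into a variable.
 * is_global selects the '@' (global) or '%' (local) name prefix. */
void code_gen_store_int(FILE *f_llvm, const char *value,
                        const char *var, int is_global)
{
    fprintf(f_llvm, "  store i32 %s, i32* %c%s, align 4\n",
            value, is_global ? '@' : '%', var);
}
```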
Optimized IR
[Figure: clang output compared with the self-made output]
Optimized IR (cont.)
a = a + 1 + 2 + 3 + 4; (clang output)
a = 1 + 2 + 3 + 4 + a; (clang output)
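One simple optimization that separates these two statements is constant folding: when both operands of an addition are known at compile time, the sum is computed in the code generator and no instruction is emitted. The sketch below illustrates the idea under assumptions. The `struct operand` layout and the name `code_gen_add_folded` are hypothetical, and it is not claimed to reproduce clang's output.

```c
#include <stdio.h>

/* An operand is either a compile-time constant or a named SSA value. */
struct operand {
    int  is_const;   /* 1: use value, 0: use name */
    int  value;
    char name[32];
};

static int fold_temp_count = 0;   /* numbers the temporaries %t1, %t2, ... */

static void print_operand(FILE *f_llvm, struct operand op)
{
    if (op.is_const)
        fprintf(f_llvm, "%d", op.value);
    else
        fprintf(f_llvm, "%s", op.name);
}

/* Add two operands; emit an instruction only if one of them is not constant.
 * Overflow of the folded constant is not checked in this sketch. */
struct operand code_gen_add_folded(FILE *f_llvm,
                                   struct operand lhs, struct operand rhs)
{
    struct operand result = {0};

    if (lhs.is_const && rhs.is_const) {
        result.is_const = 1;
        result.value = lhs.value + rhs.value;   /* folded, nothing emitted */
        return result;
    }
    snprintf(result.name, sizeof result.name, "%%t%d", ++fold_temp_count);
    fprintf(f_llvm, "  %s = add nsw i32 ", result.name);
    print_operand(f_llvm, lhs);
    fprintf(f_llvm, ", ");
    print_operand(f_llvm, rhs);
    fprintf(f_llvm, "\n");
    return result;
}
```

With left-to-right evaluation, `1 + 2 + 3 + 4 + a` folds the constants to 10 and emits a single `add`, while `a + 1 + 2 + 3 + 4` still emits four, because every partial sum already contains `a`.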
Unimplemented parts
  Declaration with initialization
  Precedence: a + (b + c)
  Different types: char, string, float, double
  Function call
  Signed, unsigned
  If then else
  printf
Char, string clang
Type conversion clang
printf clang
Reference
LLVM Language Reference Manual, http://llvm.org/docs/langref.html
lex & yacc, 2nd Edition, by John R. Levine, Tony Mason & Doug Brown, O'Reilly, ISBN: 1-56592-000-7
First compiler?
Bootstrapping: http://en.wikipedia.org/wiki/bootstrapping_%28compilers%29
History of compiler construction: http://en.wikipedia.org/wiki/history_of_compiler_writing
A compiler is software
[Figure: compiler diagram; labels: Machine code, Assembler]