Lecture 16-18: Compiler Middle-End
Jianwen Zhu
Electrical and Computer Engineering, University of Toronto
Jianwen Zhu 2009 - P. 1
What We Have Done
  A lot!
  Compiler Frontend
    Defining language
    Generating scanner and parser
    Generating parse tree (AST)
    Performing semantic analysis
      Type analysis
Compiler Middle End
  Intermediate Representation (IR)
    Language-independent data structure to capture program
  Code generation
    From AST to IR
  Data flow analysis
    Infer program info from IR
  Optimization
    Code improvement: IR -> IR
    Covered by another course
Intermediate Representation
A Problem
  A modern compiler handles multiple languages
    gcc: formerly "GNU C Compiler", now "GNU Compiler Collection"
  Assuming M languages + N processors
    Engineering effort for each lang/proc pair
    M * N effort
A Solution
  M + N effort!
  [Diagram: Lang 1, Lang 2, Lang 3 are parsed into Tree 1, Tree 2, Tree 3; all trees are translated into a single Intermediate Representation (IR); the IR is then translated to each target: X86, MIPS, ARM, AVR]
TinyC
  Program state is defined by a set of variables
  Actions are defined by statements
  Simplifying assumptions
    All statements are in a single procedure, and there are no procedure calls
    Only the int and bool primitive types are supported
    Only one-dimensional arrays are supported
    No pointers are supported
TinyC Example: Dot Product
    int A[100], B[100];
    int sum, i;
    sum = i = 0;
    while( i < 100 ) {
        sum = sum + A[i] * B[i];
        i = i + 1;
    }
TinyC Syntax
  Defined by a set of production rules in Backus-Naur Form (BNF)
  Key constructs
    Declarations: define scalar or array variables
    Statements: assignment or control flow statements
    Expressions: transformations of scalar values (primitive constants or program variable values)
TinyC in One Page
  program:
    declaration* statement*
  statement:
    variable '=' expression ';',
    'if' '(' expression ')' statement ( 'else' statement )*,
    'while' '(' expression ')' statement,
    'break' ';',
    '{' declaration* statement* '}'
  declaration:
    type identifier ['=' expression] ';',
    type identifier '[' expression ']' ';'
  type:
    'int', 'bool'
  expression:
    '-' expression,
    '!' expression,
    expression '+' expression,
    expression '-' expression,
    expression '*' expression,
    expression '/' expression,
    expression '^' expression,
    expression '>>' expression,
    expression '<<' expression,
    expression '&' expression,
    expression '|' expression,
    expression '=' expression,
    expression '!=' expression,
    expression '<' expression,
    expression '<=' expression,
    expression '>' expression,
    expression '>=' expression,
    '(' expression ')',
    integer,
    identifier,
    'TRUE',
    'FALSE',
    identifier '[' expression ']'
Notations
  A data type T corresponds to a set T; in particular, the integer type Z corresponds to the set Z
  A linked list or array whose elements are of type T corresponds to the power set of T (the set of all subsets of T), denoted as T[]
  A record with fields a of type A and b of type B corresponds to a set of named tuples, denoted as <a: A, b: B>
  A graph R whose nodes are of type A corresponds to a relation R: A x A
  A hash table or dictionary F that maps a value of type A to a value of type B corresponds to a function F: A --> B
TinyIR
  Why IR?
    Decouple optimization algorithms from input languages and target architectures
  Definition: A TinyIR is a tuple <O, S, V, B> with the following elements:
    A set O = {lds, sts, lda, sta, ba, br, cnst, +, -, *, /, <<, >>, ...} of operation codes, which corresponds to the set of all virtual instruction types
    A set S of symbols, which corresponds to the scalar and array variables
    A set V: <opcode: O, src1: V, src2: V, symb: $S \cup B \cup Z$> of virtual instructions, which corresponds to the expressions and control transfers in the program
    A set B: V[] of basic blocks, each containing a sequence of virtual instructions
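The TinyIR tuple can be mirrored almost directly as data structures. The following is a minimal sketch in Python, not the course's implementation. The class names `Symbol`, `Instr`, and `Block` are hypothetical, and the sketch assumes that the `symb` field may hold a symbol, a basic block (a branch target), or an integer constant.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Opcodes named in the definition of O. The list on the slide is cut off,
# so this set is known to be partial.
OPCODES = {"lds", "sts", "lda", "sta", "ba", "br", "cnst",
           "+", "-", "*", "/", "<<", ">>"}


@dataclass
class Symbol:
    """An element of S: a scalar or array variable."""
    name: str
    size: Optional[int] = None  # None for a scalar, element count for an array


@dataclass
class Instr:
    """An element of V: <opcode, src1, src2, symb>."""
    opcode: str
    src1: Optional["Instr"] = None  # first operand: another virtual instruction
    src2: Optional["Instr"] = None  # second operand: another virtual instruction
    # Assumption: a symbol, a branch-target block, or an integer constant.
    symb: Union[Symbol, "Block", int, None] = None


@dataclass
class Block:
    """An element of B: a sequence of virtual instructions."""
    label: str
    instrs: List[Instr] = field(default_factory=list)


# Usage: the two instructions for "sum = 0".
sum_sym = Symbol("sum")
zero = Instr("cnst", symb=0)
store = Instr("sts", src1=zero, symb=sum_sym)
b1 = Block("B1", [zero, store])
```

Operands point at other instructions rather than at named temporaries, which matches the way the definition gives src1 and src2 the type V.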
From TinyC to TinyIR
  Constructs in TinyC have equivalent representations in TinyIR
    Declarations correspond to symbols
    Statements and expressions correspond to virtual instructions
    Virtual instructions are grouped within different basic blocks
Dot product in TinyIR
    scalar sum;
    scalar i;
    array A[100];
    array B[100];
    B1:
      (0)  cnst 0
      (1)  sts (0), sum
      (2)  sts (0), i
    B2:
      (3)  lds i
      (4)  lda (3), A
      (5)  lda (3), B
      (6)  * (4) (5)
      (7)  lds sum
      (8)  + (6) (7)
      (9)  sts (8), sum
      (10) cnst 1
      (11) + (3) (10)
      (12) sts (11), i
      (13) cnst 100
      (14) < (11) (13)
      (15) bt (14), B2
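One way to check a TinyIR listing is to execute it. The sketch below is a small interpreter written for illustration only. The tuple encoding `(opcode, a, b)` is an assumption, as is the reading of `bt` as "branch if the operand is true"; `bt` and `<` appear in the dot product listing, so they are handled here.

```python
import operator

# Binary opcodes used by the dot product listing.
BINOPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}

# Each instruction is (opcode, a, b); integers in operand position are the
# (n) instruction numbers from the listing.
DOT_PRODUCT = [
    ("B1", [
        ("cnst", 0, None),     # (0)
        ("sts", 0, "sum"),     # (1)
        ("sts", 0, "i"),       # (2)
    ]),
    ("B2", [
        ("lds", "i", None),    # (3)
        ("lda", 3, "A"),       # (4)
        ("lda", 3, "B"),       # (5)
        ("*", 4, 5),           # (6)
        ("lds", "sum", None),  # (7)
        ("+", 6, 7),           # (8)
        ("sts", 8, "sum"),     # (9)
        ("cnst", 1, None),     # (10)
        ("+", 3, 10),          # (11)
        ("sts", 11, "i"),      # (12)
        ("cnst", 100, None),   # (13)
        ("<", 11, 13),         # (14)
        ("bt", 14, "B2"),      # (15)
    ]),
]


def run(blocks, mem):
    """Execute the blocks in order, starting at the first; return mem."""
    code, start = [], {}
    for label, instrs in blocks:
        start[label] = len(code)   # a label names its block's first instruction
        code.extend(instrs)
    val = {}                       # latest result of each instruction
    pc = 0
    while pc < len(code):
        op, a, b = code[pc]
        nxt = pc + 1
        if op == "cnst":
            val[pc] = a
        elif op == "lds":          # load scalar
            val[pc] = mem[a]
        elif op == "lda":          # load array element: a = index, b = array
            val[pc] = mem[b][val[a]]
        elif op == "sts":          # store scalar: a = value, b = symbol
            mem[b] = val[a]
        elif op == "bt":           # branch to block b if val[a] is true
            if val[a]:
                nxt = start[b]
        else:
            val[pc] = BINOPS[op](val[a], val[b])
        pc = nxt
    return mem


A = list(range(100))
B = [2] * 100
result = run(DOT_PRODUCT, {"A": A, "B": B, "sum": 0, "i": 0})
```

Note that the listing tests the condition at the bottom of B2, so the body runs once before the first test. That is safe here because i starts at 0.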
Code Generation
Assignment Statement
  Source:
    sum = 0
  Generated code:
    (0) cnst 0          RHS expression
    (1) sts (0), sum    Store instruction; sum is the symbol for the scalar variable
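The assignment rule generalizes to arbitrary right-hand sides: generate the expression first, then store its result. The following is a minimal sketch, assuming a hypothetical AST encoding in which an integer is a constant, a string is a scalar variable, and a tuple `(op, lhs, rhs)` is a binary operation.

```python
def gen_expr(expr, code):
    """Append instructions for expr to code; return the result's index."""
    if isinstance(expr, int):
        code.append(("cnst", expr))        # constant
    elif isinstance(expr, str):
        code.append(("lds", expr))         # load scalar variable
    else:
        op, lhs, rhs = expr
        left = gen_expr(lhs, code)         # operands first (post-order walk)
        right = gen_expr(rhs, code)
        code.append((op, left, right))
    return len(code) - 1


def gen_assign(name, expr, code):
    """Generate 'name = expr': the RHS expression, then a scalar store."""
    value = gen_expr(expr, code)
    code.append(("sts", value, name))
    return code


simple = gen_assign("sum", 0, [])
increment = gen_assign("i", ("+", "i", 1), [])
```

Here `simple` holds the two instructions shown for `sum = 0`, and `increment` holds a load, a constant, an add, and a store for `i = i + 1`.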
If Statement
  Source:
    if( c ) { stmt1; } else { stmt2; }
  Generated code:
        (10) c          Condition evaluation
        (11) bt L1      Branch instruction
        (12) stmt2      Fall-through else branch
        (20) ba L2      Jump to merge point
    L1: (30) stmt1      Then branch
    L2:                 Merge point
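The if-statement layout can be written as a small emitter. This is only a sketch: instructions are plain strings, the condition and branches are passed in as already-generated instruction lists, and the `label_stream` helper and its L1, L2 naming are assumptions made for the example.

```python
import itertools


def label_stream():
    """Yield fresh labels L1, L2, ... (hypothetical naming scheme)."""
    return (f"L{n}" for n in itertools.count(1))


def gen_if(cond, then_code, else_code, labels):
    """Lay out: condition, branch, else (fall-through), jump, then, merge."""
    then_label, merge_label = next(labels), next(labels)
    return (
        cond
        + [f"bt {then_label}"]        # branch to the then part if c is true
        + else_code                   # fall-through else branch
        + [f"ba {merge_label}"]       # jump over the then part
        + [f"{then_label}:"]
        + then_code
        + [f"{merge_label}:"]         # merge point
    )


if_code = gen_if(["c"], ["stmt1"], ["stmt2"], label_stream())
```

Placing the else branch on the fall-through path means the condition does not have to be negated before the `bt`.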
While Statement (Layout 1)
  Source:
    while( c ) { stmt; }
  Generated code:
    L1: (10) !c         Loop entry; condition evaluation
        (11) bt L2
        (12) stmt       Loop body
        (20) ba L1      Loop back
    L2:                 Loop exit
While Statement (Layout 2)
  Source:
    while( c ) { stmt; }
  Generated code:
        (1)  ba L3      Loop entry
    L1: (10) stmt       Loop body
    L3: (20) c          Condition evaluation
        (21) bt L1      Loop back
    L2:                 Loop exit
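Both while layouts can be sketched as emitters over string instructions. The helper names are hypothetical, and `"!"` stands in for an instruction that negates the preceding condition result. Labels are drawn in the order that reproduces the names on the slides.

```python
import itertools


def label_stream():
    """Yield fresh labels L1, L2, ... (hypothetical naming scheme)."""
    return (f"L{n}" for n in itertools.count(1))


def gen_while_layout1(cond, body, labels):
    """Test at the top: negate c, exit if true, run body, jump back."""
    entry, exit_ = next(labels), next(labels)
    return ([f"{entry}:"] + cond + ["!", f"bt {exit_}"]
            + body + [f"ba {entry}", f"{exit_}:"])


def gen_while_layout2(cond, body, labels):
    """Test at the bottom: jump to the test once, then loop with one bt."""
    top, exit_, test = next(labels), next(labels), next(labels)
    return ([f"ba {test}", f"{top}:"] + body
            + [f"{test}:"] + cond + [f"bt {top}", f"{exit_}:"])


layout1 = gen_while_layout1(["c"], ["stmt"], label_stream())
layout2 = gen_while_layout2(["c"], ["stmt"], label_stream())
```

Layout 1 executes two branch instructions per iteration (`bt` and `ba`), while Layout 2 executes one (`bt`) after a single initial `ba`, and it needs no negation of the condition.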
Data Flow Analysis
Data Flow Analysis
  A framework for proving facts about programs
  Reasons about lots of little facts
    Little or no interaction between facts
    Works best on properties about how a program computes
  Based on all paths through the program
    Including infeasible paths
Available Expressions
  An expression e is available at program point p if
    e is computed on every path to p, and
    the value of e has not changed since the last time e was computed on that path
  Optimization
    If an expression is available, it need not be recomputed
    (At least, if it's still in a register somewhere)
Data Flow Facts
  Is expression e available?
  Facts:
    a + b is available
    a * b is available
    a + 1 is available
Gen and Kill
  What is the effect of each statement on the set of facts?

  Stmt          Gen      Kill
  x := a + b    a + b
  y := a * b    a * b
Computing Available Expressions
  [Figure: control flow graph annotated with the set of available expressions at each program point; the sets shown are {a + b}, {a + b, a * b}, and Ø]
Terminology
  A join point is a program point where two branches meet
  Available expressions is a forward must problem
    Forward = data flows from in to out
    Must = at a join point, the property must hold on all paths that are joined
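A forward must analysis can be computed by iterating to a fixed point. The sketch below assumes the standard textbook equations for available expressions: In(s) is the intersection of Out(p) over all predecessors p, and Out(s) = Gen(s) ∪ (In(s) − Kill(s)). The five-statement program, its numbering, and its Gen/Kill sets are an assumed example built around the facts a + b, a * b, and a + 1; they are not taken from the slides.

```python
def available_expressions(stmts, preds, gen, kill, universe):
    """Forward must analysis; stmts[0] is assumed to be the entry."""
    entry = stmts[0]
    avail_in = {s: set() for s in stmts}
    # Must analysis: start optimistically from "everything is available".
    avail_out = {s: set(universe) for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in stmts:
            if s == entry:
                new_in = set()       # nothing is available on entry
            else:
                # Join = intersection: the fact must hold on all paths.
                new_in = set.intersection(*(avail_out[p] for p in preds[s]))
            new_out = gen[s] | (new_in - kill[s])
            if new_in != avail_in[s] or new_out != avail_out[s]:
                avail_in[s], avail_out[s] = new_in, new_out
                changed = True
    return avail_in, avail_out


# Assumed example: 1: x := a + b   2: y := a * b   3: while y > a
#                  4: a := a + 1   5: x := a + b   (5 loops back to 3)
E = {"a + b", "a * b", "a + 1"}
STMTS = [1, 2, 3, 4, 5]
PREDS = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4]}
GEN = {1: {"a + b"}, 2: {"a * b"}, 3: set(), 4: set(), 5: {"a + b"}}
KILL = {1: set(), 2: set(), 3: set(), 4: set(E), 5: set()}  # 4 redefines a

avail_in, avail_out = available_expressions(STMTS, PREDS, GEN, KILL, E)
```

Statement 3 is the join point of this example: a * b is available after statement 2 but not after statement 5, so the intersection leaves only a + b available at the loop test.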
Data Flow Equations
  Let s be a statement
    succ(s) = { immediate successor statements of s }
    pred(s) = { immediate predecessor statements of s }
    In(s) = program point just before executing s
    Out(s) = program point just after executing s
Liveness Analysis
  A variable v is live at program point p if
    v will be used on some execution path originating from p...
    before v is overwritten
  Optimization
    If a variable is not live, there is no need to keep it in a register
    If a variable is dead at an assignment, the assignment can be eliminated
Data Flow Equations
  Available expressions is a forward must analysis
    Data flow propagates in the same direction as CFG edges
    An expression is available only if it is available on all paths
  Liveness is a backward may problem
    To know if a variable is live, need to look at future uses
    A variable is live if it is used on some path
  Liveness equations:
    $\mathrm{Out}(s) = \bigcup_{s' \in \mathrm{succ}(s)} \mathrm{In}(s')$
    $\mathrm{In}(s) = \mathrm{Gen}(s) \cup (\mathrm{Out}(s) - \mathrm{Kill}(s))$
Gen and Kill
  What is the effect of each statement on the set of facts?

  Stmt          Gen      Kill
  x := a + b    a, b     x
  y := a * b    a, b     y
  y > a         a, y
Computing Live Variables
  [Figure: control flow graph annotated with the set of live variables at each program point; the sets shown include {a, b}, {x, a, b}, {y, a, b}, {x}, {x, y, a}, and {x, y, a, b}]
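Liveness can be computed by iterating the backward equations, Out(s) as the union of In(s') over the successors s' of s and In(s) = Gen(s) ∪ (Out(s) − Kill(s)), until nothing changes. The sketch below uses an assumed six-statement program. Its numbering, the loop shape, and the final statement `z := x` are hypothetical and were chosen only so that x is live at loop exit.

```python
def live_variables(stmts, succ, gen, kill):
    """Backward may analysis: iterate the liveness equations to a fixed point."""
    live_in = {s: set() for s in stmts}    # may analysis: start from empty sets
    live_out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in reversed(stmts):          # visiting backward converges faster
            # Join = union: live if used on some path.
            new_out = set().union(*(live_in[t] for t in succ[s]))
            new_in = gen[s] | (new_out - kill[s])
            if new_in != live_in[s] or new_out != live_out[s]:
                live_in[s], live_out[s] = new_in, new_out
                changed = True
    return live_in, live_out


# Assumed example: 1: x := a + b   2: y := a * b   3: while y > a
#                  4: a := a + 1   5: x := a + b   (5 loops back to 3)
#                  6: z := x       (after the loop)
STMTS = [1, 2, 3, 4, 5, 6]
SUCC = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [3], 6: []}
GEN = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "y"},
       4: {"a"}, 5: {"a", "b"}, 6: {"x"}}
KILL = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"a"}, 5: {"x"}, 6: {"z"}}

live_in, live_out = live_variables(STMTS, SUCC, GEN, KILL)
```

In this example x is not live just before statement 5, because statement 5 overwrites x without reading it; an assignment to x immediately before it would therefore be dead.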
Very Busy Expressions
  An expression e is very busy at point p if
    On every path from p, expression e is evaluated before the value of e is changed
  Optimization
    Can hoist very busy expression computation
  What kind of problem?
    Forward or backward? Backward
    May or must? Must
Reaching Definitions
  A definition of a variable v is an assignment to v
  A definition of variable v reaches point p if
    There is a path from the definition to p with no intervening assignment to v
  Also called def-use information
  What kind of problem?
    Forward or backward? Forward
    May or must? May
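Reaching definitions fits the same iterative scheme as a forward may analysis. The sketch below assumes the standard textbook equations: In(s) is the union of Out(p) over all predecessors p, and Out(s) = Gen(s) ∪ (In(s) − Kill(s)), where Gen(s) is the definition made by s and Kill(s) is every other definition of the same variable. Definitions are identified by statement number, and the five-statement program is an assumed example.

```python
def reaching_definitions(stmts, preds, defines):
    """Forward may analysis; defines maps a statement to the variable it assigns."""
    defs_of = {}                               # variable -> all its definitions
    for s, var in defines.items():
        defs_of.setdefault(var, set()).add(s)
    rd_in = {s: set() for s in stmts}
    rd_out = {s: set() for s in stmts}
    changed = True
    while changed:
        changed = False
        for s in stmts:
            # Join = union: a definition reaches p if it does so on some path.
            new_in = set().union(*(rd_out[p] for p in preds[s]))
            if s in defines:
                # Gen = {s}; Kill = the other definitions of the same variable.
                new_out = {s} | (new_in - defs_of[defines[s]])
            else:
                new_out = set(new_in)
            if new_in != rd_in[s] or new_out != rd_out[s]:
                rd_in[s], rd_out[s] = new_in, new_out
                changed = True
    return rd_in, rd_out


# Assumed example: 1: x := a + b   2: y := a * b   3: while y > a
#                  4: a := a + 1   5: x := a + b   (5 loops back to 3)
STMTS = [1, 2, 3, 4, 5]
PREDS = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4]}
DEFINES = {1: "x", 2: "y", 4: "a", 5: "x"}

rd_in, rd_out = reaching_definitions(STMTS, PREDS, DEFINES)
```

At the loop test (statement 3) both definitions of x reach, one from before the loop and one from the loop body, which is the kind of def-use information the analysis is meant to expose.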
Space of Data Flow Analyses

            Forward                  Backward
  May       Reaching definitions     Live variables
  Must      Available expressions    Very busy expressions

  Most data flow analyses can be classified this way
    A few don't fit: bidirectional analysis
  Lots of literature on data flow analysis