Generalizing Data-flow Analysis Announcements Project 2 writeup is available Read Stephenson paper Last Time Control-flow analysis Today C-Breeze Introduction Other types of data-flow analysis Reaching definitions, available expressions, reaching constants Abstracting data-flow analysis What s common among the different analyses? Introduce lattice-theoretic frameworks CS553 Lecture Generalizing Data-flow Analysis 2 The C-Breeze Compiler Characteristics A C source-to-source translator Implemented in C++ Makes heavy use of templates and the Standard Template Library Organized as a set of phases Parse an ANSI C program, producing an AST Dismantle the C program into a LIR form Emit C code as output Can perform various analyses and transformations Can create new phases easily Uses common design patterns: walkers and changers CS553 Lecture Generalizing Data-flow Analysis 3 1
C-Breeze Structure ANSI C cbz myphase.o libc-breeze.a ANSI C, call graph, assembly, statistics, etc. Output is determined by the phases that are invoked CS553 Lecture Generalizing Data-flow Analysis 4 C-Breeze Terminology Phase An organizational unit of work (analysis, transformation, or both) Translation unit A file of a source program Walker Uses visitor pattern to traversing the AST Changer Uses visitor pattern for traversing and changing nodes in the AST CS553 Lecture Generalizing Data-flow Analysis 5 2
Walker Example (src/main/print_walker.*) void print_walker::at_basicblock(basicblocknode * the_bb, Order ord) { if (ord == Preorder) { indent(the_bb); _out << "basicblock: "; if (the_bb->decls().empty()) _out << "(no decls) "; if (the_bb->stmts().empty()) _out << "(no stmts) "; _out << endl; in(); } } if (ord == Postorder) out(); print_walker pw; unit_list_p u; // for each translation unit in the program... for (u=cbz::program.begin(); u!= CBZ::Program.end(); u++) //... walk the AST using the CountAssg walker (*u)->walk (pw); CS553 Lecture Generalizing Data-flow Analysis 6 Other C-Breeze Information No Symbol Table Alias Analysis int main() { int a[10],b,c,d; } a[3] = 4; b = 5+3; c = c+b + a[3]; return 0;... a[3] = 4; T2 = 8; b = 8; T3 = c + 8; T4 = T3 + a[3]; c = T4;... CS553 Lecture Generalizing Data-flow Analysis 7 3
Evaluating Compiler Infrastructures Efficiency Execution time for parsing and transformation Memory requirements Speed of resulting code Front-ends and back-ends Which ones are available and how robust? Intermediate Representation How well defined is the IR-API? Pass Construction How difficult is it to write analyses and transformations? What basic analyses are available? Debugging assistance? CS553 Lecture Generalizing Data-flow Analysis 8 Reaching Definitions Definition A definition (statement) d of a variable v reaches node n if there is a path from d to n such that v is not redefined along that path d v :=... def[v] Uses of reaching definitions Build use/def chains Constant propagation Loop invariant code motion 1 a =...; 2 b =...; 3 for (...) { 4 x = a + b; 5... 6 } d n x := 5 f(x) Reaching definitions of a and b To determine whether it s legal to move statement 4 out of the loop, we need to ensure that there are no reaching definitions of a or b inside the loop CS553 Lecture Generalizing Data-flow Analysis 9 n Does this def of x reach n? can we replace n with f(5)? 4
Computing Reaching Definitions Assumption At most one definition per node We can refer to definitions by their node number Gen[n]: Definitions that are generated by node n (at most one) Kill[n]: Definitions that are killed by node n Defining Gen and Kill for various statement types statement Gen[s] Kill[s] s: t = b op c {s} def[t] {s} s: t = M[b] {s} def[t] {s} s: M[a] = b {} {} s: if a op b goto L {} {} statement Gen[s] Kill[s] s: goto L {} {} s: L: {} {} s: f(a, ) {} {} s: t=f(a, ) {s} def[t] {s} CS553 Lecture Generalizing Data-flow Analysis 10 A Better Formulation of Reaching Definitions Problem Reaching definitions gives you a set of definitions (nodes) Doesn t tell you what variable is defined Expensive to find definitions of variable v Solution Reformulate to include variable e.g., Use a set of (var, def) pairs a x= b y= n in[n] = {(x,a),(y,b),...} CS553 Lecture Generalizing Data-flow Analysis 11 5
Recall Liveness Analysis Definition A variable is live at a particular point in the program if its value at that point will be used in the future (dead, otherwise). Uses of Liveness Register allocation Dead-code elimination d n...:= v def[v] 1 a =...; 2 b =...; 3... 4 x = f(b); If a is not live out of statement 1 then statement 1 is dead code. CS553 Lecture Generalizing Data-flow Analysis 12 Available Expressions Definition An expression, x+y, is available at node n if every path from the entry node to n evaluates x+y, and there are no definitions of x or y after the last evaluation entry...x+y......x+y... x and y not defined along blue edges n...x+y... CS553 Lecture Generalizing Data-flow Analysis 13 6
Available Expressions for CSE How is this information useful? Common Subexpression Elimination (CSE) If an expression is available at a point where it is evaluated, it need not be recomputed Example 2 i := i + 1 b := 4 * i 1 i := j a := 4 * i 3 c := 4 * i 2 i := i + 1 t := 4 * i b := t 1 i := j t := 4 * i a := t 3 c := t CS553 Lecture Generalizing Data-flow Analysis 14 Aspects of Data-flow Analysis Must or may Information guaranteed or possible Direction forward or backward Flow values variables, definitions,... Initial guess universal or empty set Kill due to semantics of stmt what is removed from set Gen due to semantics of stmt what is added to set Merge how sets from two control paths compose CS553 Lecture Generalizing Data-flow Analysis 15 7
Must vs. May Information Must information Implies a guarantee May information Identifies possibilities Liveness? Available expressions? May Must safe overly large set overly small set desired information small set large set Gen add everything that add only facts that are might be true guaranteed to be true Kill remove only facts that remove everything that are guaranteed to be true might be false merge union intersection initial guess empty set universal set CS553 Lecture Generalizing Data-flow Analysis 16 Reaching Definitions: Must or May Analysis? Consider constant propagation d x = 4 n d x = 5 f(x) We need to know if d might reach node n CS553 Lecture Generalizing Data-flow Analysis 17 8
Defining Available Expressions Analysis Must or may Information? Direction? Flow values? Initial guess? Kill? Gen? Merge? Must Forward Sets of expressions Universal set Set of expressions killed by statement s Set of expressions evaluated by s Intersection CS553 Lecture Generalizing Data-flow Analysis 18 Reaching Constants Goal Compute value of each variable at each program point (if possible) Flow values Set of (variable,constant) pairs Merge function Intersection Data-flow equations Effect of node n x = c kill[n] = {(x,d) d} gen[n] = {(x,c)} Effect of node n x = y + z kill[n] = {(x,c) c} gen[n] = {(x,c) c=valy+valz, (y, valy) in[n], (z, valz) in[n]} CS553 Lecture Generalizing Data-flow Analysis 19 9
Reality Check! Some definitions and uses are ambiguous We can t tell whether or what variable is involved e.g., *p = x; /* what variable are we assigning?! */ Unambiguous assignments are called strong updates Ambiguous assignments are called weak updates Solutions Be conservative Sometimes we assume that it could be everything e.g., Defining *p (generating reaching definitions) Sometimes we assume that it is nothing e.g., Defining *p (killing reaching definitions) Try to figure it out: alias/pointer analysis (more later) CS553 Lecture Generalizing Data-flow Analysis 20 Lattices Define lattice L = (V, ) V is a set of elements of the lattice is a binary relation over the elements of V (meet or greatest lower bound) 001 011 000 010 101 100 110 111 Properties of x,y V x y V (closure) x,y V x y = y x (commutativity) x,y,z V (x y) z = x (y z) (associativity) CS553 Lecture Generalizing Data-flow Analysis 21 10
Lattices (cont) Under ( ) Imposes a partial order on V x y x y = x 001 = 000 010 100 Top ( ) A unique greatest element of V (if it exists) x V { }, x 011 101 = 111 110 Bottom ( ) A unique least element of V (if it exists) x V { }, x Height of lattice L The longest path through the partial order from greatest to least element (top to bottom) CS553 Lecture Generalizing Data-flow Analysis 22 Data-Flow Analysis via Lattices Relationship Elements of the lattice (V) represent flow values (in[] and out[] sets) e.g., Sets of live variables for liveness represents best-case information (initial flow value) e.g., Empty set represents worst-case information e.g., Universal set (meet) merges flow values e.g., Set union If x y, then x is a conservative approximation of y e.g., Superset {i} {i,j} {} {j} {i,k} {i,j,k} {k} {j,k} CS553 Lecture Generalizing Data-flow Analysis 23 11
Data-Flow Analysis via Lattices (cont) Remember what these flow values represent At each program point a lattice element represents an in[] set or an out[] set {} Initially print(x) x = y print(y) {x} {x,y} {y} Finally x = y { y } { x,y } { x } print(x) print(y) { y } CS553 Lecture Generalizing Data-flow Analysis 24 Data-Flow Analysis Frameworks Data-flow analysis framework A set of flow values (V) A binary meet operator ( ) A set of flow functions (F) (also known as transfer functions) Flow Functions F = {f: V V} f describes how each node in CFG affects the flow values Flow functions map program behavior onto lattices CS553 Lecture Generalizing Data-flow Analysis 25 12
Visualizing DFA Frameworks as Lattices Example: Liveness analysis with 3 variables S = {v1, v2, v3} = T V: 2 S = {{v1,v2,v3}, {v1,v2},{v1,v3},{v2,v3}, {v1},{v2},{v3}, } Meet ( ): { v1 } { v2 } { v3 } : Top(T): { v1,v2 } { v1,v3 } { v2,v3 } Bottom ( ): V F: {f n (X) = Gen n (X Kill n ), n} { v1,v2,v3 } = Inferior solutions are lower on the lattice More conservative solutions are lower on the lattice CS553 Lecture Generalizing Data-flow Analysis 26 Concepts Data-flow analyses are distinguished by Flow values (initial guess, type) May/must Direction Gen Kill Merge Complication Ambiguous references (strong/weak updates) Lattices Conservative approximation Optimistic (initial guess) Data-flow analysis frameworks Tuples of lattices CS553 Lecture Generalizing Data-flow Analysis 27 13