The attack model: Static Program Analysis
How do we perform SPA? DFA: Data Flow Analysis; CFA: Control Flow Analysis; proving invariants: theorem proving; checking models: model checking. Giaco & Ranzato
DFA: the people. Gary Kildall, Ken Kennedy, Jeffrey D. Ullman.
The source: Flemming Nielson, Hanne Riis Nielson, Chris Hankin: Principles of Program Analysis. Springer (corrected 2nd printing), 452 pages, 2005. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman: Compilers: Principles, Techniques, and Tools (2nd edition). Addison-Wesley, 2006.
DFA
Data Flow Analysis in history. The compiler pipeline: scanner → parser → semantic analysis → optimizer → code generator, where the optimizer performs CFA, then DFA, then code improvement. We start from a program representation, the CFG (control flow graph). The semantics is given by recursive equations specifying the input/output behavior at each program point.
CFG (figure: the example control flow graph used in the following slides)
What is DFA? Wikipedia: "Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program." A better definition? Data-flow analysis is a technique for gathering information about how data flows at run time at various points in a computer program.
Example: Live Variable Analysis. It is essential for register allocation: two variables that are live at the same time cannot be stored in the same register! x and y cannot be stored in the same location if they are both in use. It is also useful for software watermarking (the QP algorithm).
Example: $a$ and $b$ are never in use at the same time, so both can be replaced by a single variable $x$.
Live variables. $x$ is live at the exit of a command $C$ if $x$ holds a value that will be used later along some path (i.e., read as a right-hand side). $x$ is not live after $C$ if, before any future use, it is reassigned (x := exp with $x$ not occurring in exp). If $x$ is not live, it is dead! Dead-code elimination: if $x$ is dead after x := exp, then we can erase x := exp. Deciding exactly which code is dead is undecidable!
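A minimal sketch of dead-code elimination on straight-line code (a single basic block), assuming side-effect-free expressions and a hypothetical statement format (target, used variables) invented for illustration. It walks the block backwards, keeping the set of variables live after each statement, and erases x := exp when $x$ is dead at that point.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical statement format: (target, used_vars); target is None for
# commands that assign nothing (e.g. output).
Stmt = Tuple[Optional[str], Set[str]]


def eliminate_dead(block: List[Stmt], live_out: Set[str]) -> List[Stmt]:
    """Remove assignments x := exp whose target x is dead right after them.

    Walks the block backwards, maintaining the variables live after the
    current statement. Assumes expressions have no side effects.
    """
    live = set(live_out)
    kept: List[Stmt] = []
    for target, uses in reversed(block):
        if target is not None and target not in live:
            continue  # dead assignment: erase it (its uses are not counted)
        if target is not None:
            live.discard(target)  # live-in = use ∪ (live-out − def)
        live |= uses
        kept.append((target, uses))
    kept.reverse()
    return kept


# a := x; b := a; a := y; output a   (nothing is live at the end)
block = [("a", {"x"}), ("b", {"a"}), ("a", {"y"}), (None, {"a"})]
print(eliminate_dead(block, set()))  # [('a', {'y'}), (None, {'a'})]
```

Because erased assignments contribute no uses, a single backward pass also removes chains such as a := x; b := a when b is dead.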
Live variables. The last use of $b$ as an r-value is in 4: $b$ is used in 4, so it is live on the edge $3 \to 4$. There is no assignment to $b$ in 3, so it is live on $2 \to 3$. $b$ is assigned in 2, so no value of $b$ arriving at 2 is ever used after 2. Live range of $b$: $\{2 \to 3,\ 3 \to 4\}$.
Live variables. $a$ is live on $4 \to 5$ and $5 \to 2$, and on $1 \to 2$. $a$ is not live on $2 \to 3$ and $3 \to 4$: even though $a$ holds a value at 3, this value is never used, since $a$ is assigned a new value in 4.
Live variables. $c$ is live on all edges. Since $c$ is live on entry to node 1, liveness tells us that, if $c$ is a local variable, it is used without being initialized (warning!).
Live variables. It is enough to have 2 registers (one for $c$, one shared by $a$ and $b$): $a$ and $b$ are never live at the same time!
Live variables. $a$ and $b$ are never live along the same edges, so we can optimize $P$ by storing both in a new register $ab$.
Basic notation. A CFG has out-edges and in-edges; $pre[n]$ and $post[n]$ denote the predecessor and successor nodes of $n$. Example: $post[5] = \{2,6\}$ because $5 \to 6$ and $5 \to 2$; $pre[2] = \{1,5\}$ because $5 \to 2$ and $1 \to 2$.
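A small sketch of how $pre[n]$ and $post[n]$ can be derived from an edge list. The edges are our reconstruction of the example CFG (1 → 2 → 3 → 4 → 5, with 5 → 2 and 5 → 6), chosen to agree with the pre/post facts on this slide; the function name `pre_post` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def pre_post(edges: Iterable[Edge]) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Build pre[n] (predecessors) and post[n] (successors) from CFG edges."""
    pre: Dict[int, Set[int]] = defaultdict(set)
    post: Dict[int, Set[int]] = defaultdict(set)
    for src, dst in edges:
        post[src].add(dst)
        pre[dst].add(src)
    return pre, post


# Assumed edges of the example CFG: 1->2->3->4->5, 5->2 (loop back), 5->6.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)]
pre, post = pre_post(EDGES)
print(post[5], pre[2])  # {2, 6} {1, 5}
```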
Notation. A variable is defined when it is the L-value of an assignment x := ...; a variable is used when it is an R-value in an expression ... := ..x... $def[n]$ is the set of variables defined in $n$, and $use[n]$ is the set of variables used in $n$. Example: $def[3] = \{c\}$, $def[5] = \emptyset$, $use[3] = \{b,c\}$, $use[5] = \{a\}$.
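A sketch computing $def[n]$ and $use[n]$ from textual node contents. The node contents are our assumed reconstruction of the example program (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N, 6: return c), consistent with the def/use sets on this slide; N is treated as a constant. The regex-based `def_use` helper is illustrative only, not a real parser.

```python
import re
from typing import Set, Tuple

IDENT = re.compile(r"[A-Za-z_]\w*")


def def_use(stmt: str, variables: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (def[n], use[n]) for a node holding `x := exp` or a test/command.

    Only identifiers in `variables` count; others (e.g. the constant N or the
    keyword `return`) are ignored.
    """
    if ":=" in stmt:
        lhs, rhs = stmt.split(":=", 1)
        return {lhs.strip()}, set(IDENT.findall(rhs)) & variables
    return set(), set(IDENT.findall(stmt)) & variables


# Assumed node contents of the example CFG (N is a constant).
PROGRAM = {1: "a := 0", 2: "b := a + 1", 3: "c := c + b",
           4: "a := b * 2", 5: "a < N", 6: "return c"}
VARS = {"a", "b", "c"}
for n in (3, 5):
    print(n, def_use(PROGRAM[n], VARS))
```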
Formalizing liveness. Definition: $x$ is live on the edge $e \to f$ if there exists an execution path $C$ from $e$ to a node $n$ such that: (1) $e \to f$ is the first edge in $C$; (2) $x \in use[n]$; (3) for every node $n'$ in $C$ with $n' \neq e$ and $n' \neq n$, $x \notin def[n']$. $x$ is live-in at a node $n$ if $x$ is live on all in-edges of $n$. $x$ is live-out (or simply live) at a node $n$ if it is live on at least one of the out-edges of $n$. Example: $a$ is live on $1 \to 2$, $4 \to 5$ and $5 \to 2$; $b$ is live on $2 \to 3$ and $3 \to 4$; $c$ is live on all edges; $a$ is live-in at 2, BUT it is not live-out at 2; $a$ is live-out at 5.
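The path definition can be checked directly by a graph search. This sketch assumes our reconstruction of the example CFG (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N looping back to 2, 6: return c); the names POST, DEF, USE and `live_on_edge` are hypothetical. It explores paths starting with $e \to f$ and cuts a path as soon as it meets a node that defines $x$ without using it.

```python
from typing import Set, Tuple

# Assumed example CFG (reconstructed from the slides' pre/post and def/use facts).
POST = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
DEF = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
USE = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}


def live_on_edge(x: str, edge: Tuple[int, int]) -> bool:
    """Is x live on edge e->f according to the path definition?

    Searches from f for a node n with x in use[n], reachable without passing
    through a node (after e, before n) that defines x.
    """
    _, f = edge
    seen: Set[int] = set()
    stack = [f]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if x in USE[n]:
            return True   # reached a use of x
        if x in DEF[n]:
            continue      # x redefined before any use: this path is cut
        stack.extend(POST[n])
    return False


print(live_on_edge("a", (1, 2)), live_on_edge("a", (2, 3)))  # True False
```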
Computing liveness. Liveness information (i.e., live-in and live-out for every node) can be over-approximated by the following rules: 1. If $x \in use[n]$, then $x$ is live-in at $n$: if a node $n$ uses $x$ as an R-value, then $x$ is live on every edge entering $n$. 2. If $x$ is live-out at $n$ and $x \notin def[n]$, then $x$ is also live-in at $n$: if $x$ is live on some edge leaving $n$ and $n$ does not define $x$, then $x$ is live on all edges entering $n$. 3. If $x$ is live-in at $m$, then $x$ is live-out at every node $n \in pre[m]$. Correctness: if $x$ is truly live-in (live-out) at $n$, then the static analysis finds that $x$ is live-in (live-out) at $n$.
Approximating liveness. Liveness analysis is approximate: it assumes that all paths in the CFG are possible! In this slide's example, the analysis determines that $a$ is live-in at 5, and therefore that $a$ is live-out at 3. BUT there is no real execution path from 3 to 5, so $a$ is not concretely live at the exit of 3!
Data-flow equations. Define $in[n]$, the set of variables classified as live-in at node $n$, and $out[n]$, the set of variables classified as live-out at node $n$. This can be expressed by 2 equations per node (a system of equations): 1. $in[n] = use[n] \cup (out[n] \setminus def[n])$ 2. $out[n] = \bigcup \{ in[m] \mid m \in post[n] \}$
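A minimal round-robin solver for these two equations, assuming our reconstruction of the example CFG (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N looping back to 2, 6: return c). Starting from empty sets and re-evaluating every equation until nothing changes yields the least solution; the printed out sets can be compared with the results slide.

```python
from typing import Dict, Set, Tuple

# Assumed example CFG (reconstructed from the slides' pre/post and def/use facts).
NODES = [1, 2, 3, 4, 5, 6]
POST = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
DEF = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
USE = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}


def liveness() -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Solve in/out by round-robin iteration, starting from empty sets."""
    live_in: Dict[int, Set[str]] = {n: set() for n in NODES}
    live_out: Dict[int, Set[str]] = {n: set() for n in NODES}
    changed = True
    while changed:
        changed = False
        for n in reversed(NODES):  # reverse order suits a backward analysis
            out_n = set().union(*(live_in[m] for m in POST[n]))  # out[n] = U in[m]
            in_n = USE[n] | (out_n - DEF[n])                     # in[n] = use ∪ (out − def)
            if in_n != live_in[n] or out_n != live_out[n]:
                live_in[n], live_out[n] = in_n, out_n
                changed = True
    return live_in, live_out


live_in, live_out = liveness()
print({n: sorted(live_out[n]) for n in NODES})
# {1: ['a', 'c'], 2: ['b', 'c'], 3: ['b', 'c'], 4: ['a', 'c'], 5: ['a', 'c'], 6: []}
```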
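Least fixpoint. We take the least fixpoint of the system of equations: $\forall n \in nodes(cfg(P)):\ in[n] = use[n] \cup (out[n] \setminus def[n]),\ out[n] = \bigcup \{ in[m] \mid m \in post[n] \}$. Formally: let $|Vars(P)| < \omega$ and $|nodes(cfg(P))| = N$; then $live : (2^{Vars(P)} \times 2^{Vars(P)})^N \to (2^{Vars(P)} \times 2^{Vars(P)})^N$, where $(2^{Vars(P)} \times 2^{Vars(P)})^N$ is a finite complete lattice! $live$ is a monotone function such that $live(\langle in_1, out_1, \ldots, in_N, out_N \rangle) = \langle use[1] \cup (out_1 \setminus def[1]),\ \bigcup\{ in_m \mid m \in post[1] \},\ \ldots,\ use[N] \cup (out_N \setminus def[N]),\ \bigcup\{ in_m \mid m \in post[N] \} \rangle$, so its least fixpoint exists and is reached by iterating $live$ from $\langle \emptyset, \ldots, \emptyset \rangle$.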
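The least fixpoint can also be computed literally as in the formal definition: one function `live` updates all $2N$ components at once, and Kleene iteration applies it starting from $\langle \emptyset, \ldots, \emptyset \rangle$. A sketch on our reconstruction of the example CFG (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N looping back to 2, 6: return c); `lfp` and `live` are illustrative names, and the walrus operator requires Python 3.8+.

```python
from typing import Callable, Dict, FrozenSet, Tuple, TypeVar

T = TypeVar("T")
Vec = Dict[int, Tuple[FrozenSet[str], FrozenSet[str]]]  # node -> (in_n, out_n)

# Assumed example CFG (reconstructed from the slides' pre/post and def/use facts).
NODES = [1, 2, 3, 4, 5, 6]
POST = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
DEF = {1: frozenset({"a"}), 2: frozenset({"b"}), 3: frozenset({"c"}),
       4: frozenset({"a"}), 5: frozenset(), 6: frozenset()}
USE = {1: frozenset(), 2: frozenset({"a"}), 3: frozenset({"b", "c"}),
       4: frozenset({"b"}), 5: frozenset({"a"}), 6: frozenset({"c"})}


def lfp(f: Callable[[T], T], bottom: T) -> T:
    """Kleene iteration bottom, f(bottom), f(f(bottom)), ... until stable;
    terminates for a monotone f on a lattice of finite height."""
    x = bottom
    while (y := f(x)) != x:
        x = y
    return x


def live(v: Vec) -> Vec:
    """One simultaneous application of all 2N equations."""
    return {n: (USE[n] | (v[n][1] - DEF[n]),                      # in_n
                frozenset().union(*(v[m][0] for m in POST[n])))   # out_n
            for n in NODES}


bottom: Vec = {n: (frozenset(), frozenset()) for n in NODES}
solution = lfp(live, bottom)
print(sorted(solution[1][0]), sorted(solution[1][1]))  # ['c'] ['a', 'c']
```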
Correctness. Theorem: $\forall n \in nodes(P)$: $live\text{-}in[n] \subseteq in[n]$ and $live\text{-}out[n] \subseteq out[n]$. Proof idea: both $in[n]$ and $out[n]$ are computed statically over the CFG, i.e., also following paths that may correspond to no real execution!
Approximation soundness. How should we read the answer of a static analysis? If $x$ is live at $n$ along some program execution path, then $x \in out[n]$. If $x$ is not live at $n$ in any program computation, it may still happen that $x \in out[n]$. For liveness, a sound approximation means that we may erroneously derive that $x$ is live, BUT we can NEVER erroneously derive that a variable is dead! If $x \in out[n]$, then $x$ may be live at program point $n$; if $x \notin out[n]$, then $x$ is definitely dead at program point $n$.
For this example the approximation is complete (no loss of precision)! $out[1] = \{a,c\}$, $out[2] = \{b,c\}$, $out[3] = \{b,c\}$, $out[4] = \{a,c\}$, $out[5] = \{a,c\}$.
Backward analysis. Live variable analysis is indeed backward: information propagates backward, from out to in. We can compute $in[n]$ once we know $out[n]$, and $out[n]$ once we know $in[m]$ for all successors $m$ of $n$.
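Backward propagation suggests a worklist algorithm: when $in[n]$ changes, only the predecessors of $n$ must recompute their $out$ sets. A sketch of this standard technique on our reconstruction of the example CFG (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N looping back to 2, 6: return c); names are illustrative. It reaches the same least solution as plain round-robin iteration, usually with fewer evaluations.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Assumed example CFG (reconstructed from the slides' pre/post and def/use facts).
POST = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
DEF = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
USE = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}


def liveness_worklist() -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Backward worklist: when in[n] changes, re-queue the predecessors of n."""
    pre: Dict[int, Set[int]] = {n: set() for n in POST}
    for n, succs in POST.items():
        for m in succs:
            pre[m].add(n)
    live_in: Dict[int, Set[str]] = {n: set() for n in POST}
    live_out: Dict[int, Set[str]] = {n: set() for n in POST}
    work = deque(sorted(POST, reverse=True))  # here the exit node 6 comes first
    queued = set(work)
    while work:
        n = work.popleft()
        queued.discard(n)
        live_out[n] = set().union(*(live_in[m] for m in POST[n]))
        new_in = USE[n] | (live_out[n] - DEF[n])
        if new_in != live_in[n]:
            live_in[n] = new_in
            for p in pre[n]:  # information flows backward to predecessors
                if p not in queued:
                    work.append(p)
                    queued.add(p)
    return live_in, live_out


live_in, live_out = liveness_worklist()
print(sorted(live_out[5]))  # ['a', 'c']
```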
Reaching definitions. Given a program point $n$, which definitions (assignments) are available, and not overwritten, when program execution reaches this point along some path? And which definitions are available after $n$? A program point $n$ may kill a definition: if the command at $n$ is an assignment x := exp, it kills the definitions of x that are available on entry to $n$. Assignments also generate new definitions. We are interested in the entry and exit reaching definitions of every program point in the CFG. It is one of the simplest data-flow analyses in compilers!
Forward analysis
Formal definition. Definitions are (variable, program point) pairs $(x,p)$ with $x \in Vars$ and $p \in Points$, so a set of definitions is an element of $2^{Vars \times Points}$; $(x,p)$ means that $x$ is assigned at point $p$. The analysis computes the set of reaching definitions for each program point (definition chains): if $(x,p)$ is computed at point $q$, then the assignment to $x$ at point $p$ is available at $q$. $?$ is a special symbol in $Points$ used for uninitialized variables: the value $\iota = \{(x,?) \mid x \in Vars\}$ denotes that all variables are uninitialized.
Formal definition. The analysis is given by the following system of fixpoint equations, for every program point $p$ in the CFG: $RD_{entry}(p) = \iota$ if $p$ is a program entry point, and $RD_{entry}(p) = \bigcup \{ RD_{exit}(q) \mid q \in pre[p] \}$ otherwise; $RD_{exit}(p) = (RD_{entry}(p) \setminus kill_{RD}[p]) \cup gen_{RD}[p]$. RD is a possible analysis: if the assignment x := a at program point $q$ really reaches the entry of point $p$, then $(x,q) \in RD_{entry}(p)$ (the converse may not hold).
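Formal definition. $kill_{RD}[p] = \{(x,?)\} \cup \{(x,q) \mid q \in Points,\ x \in def[q]\}$ if $x \in def[p]$, and $kill_{RD}[p] = \emptyset$ if $def[p] = \emptyset$; $gen_{RD}[p] = \{(x,p)\}$ if $x \in def[p]$, and $gen_{RD}[p] = \emptyset$ if $def[p] = \emptyset$. As usual, $def[p] = \{x\}$ if the instruction at program point $p$ is x := exp; otherwise $def[p] = \emptyset$. The analysis is forward, with least fixpoint.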
Example: 1: input n; 2: m := 1; 3: n > 1 (loop test: true → 4, false → 6); 4: m := m*n; 5: n := n-1 (back to 3); 6: output m.
$RD_{entry}(1) = \{(n,?),(m,?)\}$, $RD_{exit}(1) = \{(n,?),(m,?)\}$
$RD_{entry}(2) = \{(n,?),(m,?)\}$, $RD_{exit}(2) = \{(n,?),(m,2)\}$
$RD_{entry}(3) = RD_{exit}(2) \cup RD_{exit}(5) = \{(n,?),(n,5),(m,2),(m,4)\}$, $RD_{exit}(3) = \{(n,?),(n,5),(m,2),(m,4)\}$
$RD_{entry}(4) = \{(n,?),(n,5),(m,2),(m,4)\}$, $RD_{exit}(4) = \{(n,?),(n,5),(m,4)\}$
$RD_{entry}(5) = \{(n,?),(n,5),(m,4)\}$, $RD_{exit}(5) = \{(n,5),(m,4)\}$
$RD_{entry}(6) = \{(n,?),(n,5),(m,2),(m,4)\}$, $RD_{exit}(6) = \{(n,?),(n,5),(m,2),(m,4)\}$
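A sketch that solves the RD equations by round-robin iteration on this factorial example. Assumptions: definitions are (variable, point) tuples with the string "?" for the uninitialized point, and, matching the table, `input n` at point 1 is not treated as a definition of n. All names are illustrative.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Definition = Tuple[str, object]  # (variable, point); point "?" = uninitialized
Facts = FrozenSet[Definition]

# Factorial example: 1 input n; 2 m := 1; 3 n > 1; 4 m := m*n; 5 n := n-1; 6 output m
NODES = [1, 2, 3, 4, 5, 6]
PRE = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4], 6: [3]}
ASSIGNS: Dict[int, Optional[str]] = {1: None, 2: "m", 3: None, 4: "m", 5: "n", 6: None}
VARS = ["n", "m"]


def reaching_definitions() -> Tuple[Dict[int, Facts], Dict[int, Facts]]:
    iota: Facts = frozenset((x, "?") for x in VARS)
    rd_entry: Dict[int, Facts] = {p: frozenset() for p in NODES}
    rd_exit: Dict[int, Facts] = {p: frozenset() for p in NODES}
    changed = True
    while changed:
        changed = False
        for p in NODES:
            if not PRE[p]:      # program entry point
                new_entry = iota
            else:               # union of RD_exit over the predecessors
                new_entry = frozenset().union(*(rd_exit[q] for q in PRE[p]))
            x = ASSIGNS[p]
            if x is None:       # def[p] = ∅: kill = gen = ∅
                new_exit = new_entry
            else:               # kill every definition of x, generate (x, p)
                kill = {(x, "?")} | {(x, q) for q in NODES if ASSIGNS[q] == x}
                new_exit = (new_entry - kill) | {(x, p)}
            if new_entry != rd_entry[p] or new_exit != rd_exit[p]:
                rd_entry[p], rd_exit[p] = new_entry, new_exit
                changed = True
    return rd_entry, rd_exit


rd_entry, rd_exit = reaching_definitions()
print(sorted(rd_exit[5]))  # [('m', 4), ('n', 5)]
```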
DFA training. Online analyzer: http://pag.cs.uni-sb.de/, which implements standard DFAs with an intuitive interface!
DFA framework. Is there a common structure in DFA? Having a framework allows the design of a common algorithm and specification (correctness proofs, complexity evaluation, etc.).
A common structure? Forward analyses: $in[n] \to out[n]$, information flows from $pre$ to $post$. Backward analyses: $out[n] \to in[n]$, information flows from $post$ to $pre$. Possible (may) analyses: reaching definitions (forward), live variables (backward). Definite (must) analyses: available expressions (forward), very busy expressions (backward).
A common pattern. $GA_\circ(p) = \iota$ if $p \in E$, and $GA_\circ(p) = \bigsqcup \{ GA_\bullet(q) \mid (q,p) \in F \}$ otherwise; $GA_\bullet(p) = f_p(GA_\circ(p))$, where: $E$ is the set of initial/final points of the CFG; $\iota$ is the initial/final information; $F$ is the set of edges or reversed edges of the CFG; $\bigsqcup$ is either $\bigcup$ or $\bigcap$; $f_p$ is the transfer function associated with node $p$.
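The common pattern can be turned into one generic solver, parameterized by $E$, $\iota$, $F$, the combination operator and the transfer functions. The sketch below (hypothetical names) instantiates it for live variables as a backward may analysis on our reconstruction of the example CFG (1: a := 0, 2: b := a + 1, 3: c := c + b, 4: a := b * 2, 5: a < N looping back to 2, 6: return c): $E = \{6\}$, $\iota = \emptyset$, $F$ is the reversed edge set, and the combination is union. For a must analysis one would pass intersection and initialize with the full set of facts instead of $\emptyset$.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Fact = FrozenSet[str]


def solve(nodes: List[int], flow: List[Tuple[int, int]], extremal: Iterable[int],
          iota: Fact, transfer: Dict[int, Callable[[Fact], Fact]],
          combine: Callable[[List[Fact]], Fact], init: Fact):
    """Round-robin solver for GA_o(p) = iota if p in E, else combine{GA_b(q) | (q, p) in F},
    and GA_b(p) = f_p(GA_o(p)). `flow` plays the role of F; `init` should be the
    empty set for may analyses and the universe for must analyses."""
    E = set(extremal)
    sources = {p: [q for (q, r) in flow if r == p] for p in nodes}
    circ = {p: init for p in nodes}
    bullet = {p: init for p in nodes}
    changed = True
    while changed:
        changed = False
        for p in nodes:
            c = iota if p in E else combine([bullet[q] for q in sources[p]])
            b = transfer[p](c)
            if c != circ[p] or b != bullet[p]:
                circ[p], bullet[p] = c, b
                changed = True
    return circ, bullet


# Instance: live variables on the assumed example CFG.
CFG = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)]
USE = {1: frozenset(), 2: frozenset({"a"}), 3: frozenset({"b", "c"}),
       4: frozenset({"b"}), 5: frozenset({"a"}), 6: frozenset({"c"})}
DEF = {1: frozenset({"a"}), 2: frozenset({"b"}), 3: frozenset({"c"}),
       4: frozenset({"a"}), 5: frozenset(), 6: frozenset()}


def union(facts: List[Fact]) -> Fact:
    return frozenset().union(*facts)


transfer = {n: (lambda s, n=n: USE[n] | (s - DEF[n])) for n in USE}
# Backward: E = {6}, F = {(q, p) | p -> q}; GA_o is GA_exit, GA_b is GA_entry.
live_out, live_in = solve([1, 2, 3, 4, 5, 6], [(q, p) for (p, q) in CFG], [6],
                          frozenset(), transfer, union, frozenset())
print(sorted(live_out[4]))  # ['a', 'c']
```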
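Forward vs backward. Recall $GA_\circ(p) = \iota$ if $p \in E$, $GA_\circ(p) = \bigsqcup \{ GA_\bullet(q) \mid (q,p) \in F \}$ otherwise, and $GA_\bullet(p) = f_p(GA_\circ(p))$. In a forward analysis, $E$ is the set of initial points, $F = \{(q,p) \mid q \to p\}$, $GA_\circ$ is $GA_{entry}$ and $GA_\bullet$ is $GA_{exit}$. In a backward analysis, $E$ is the set of final points, $F = \{(q,p) \mid p \to q\}$, $GA_\circ$ is $GA_{exit}$ and $GA_\bullet$ is $GA_{entry}$.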
Possible vs definite. Consider the combination operator $\bigsqcup$ in $GA_\circ(p) = \bigsqcup \{ GA_\bullet(q) \mid (q,p) \in F \}$. When $\bigsqcup = \bigcap$, we look for the largest sets satisfying the equations, i.e., properties holding on all possible computation paths entering (exiting) a node: this is a definite (or must) analysis! When $\bigsqcup = \bigcup$, we look for the least sets satisfying the equations, i.e., properties holding on at least one possible computation path entering (exiting) a node: this is a possible (or may) analysis!
Distributive data-flow analysis. Assume the transfer functions are monotone, and let $\sqcup$ be the join of the property lattice $C$. A data-flow analysis problem is distributive if all its transfer functions are additive, namely for every $f$ and all $x, y \in C$: $f(x \sqcup y) = f(x) \sqcup f(y)$. Note that by monotonicity of $f$ we always have $f(x) \sqcup f(y) \sqsubseteq f(x \sqcup y)$.
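Distributivity can be tested by brute force on a small powerset lattice. This sketch (illustrative names, join = union) checks that a gen/kill function, the shape of the transfer functions of reaching definitions and live variables, is additive, while a hypothetical monotone function `at_least_two` is not.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List, Set

Fact = FrozenSet[str]


def powerset(universe: Set[str]) -> List[Fact]:
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_monotone(f: Callable[[Fact], Fact], universe: Set[str]) -> bool:
    subsets = powerset(universe)
    return all(f(x) <= f(y) for x in subsets for y in subsets if x <= y)


def is_distributive(f: Callable[[Fact], Fact], universe: Set[str]) -> bool:
    """Check f(x ∪ y) == f(x) ∪ f(y) for all x, y (join = union)."""
    subsets = powerset(universe)
    return all(f(x | y) == f(x) | f(y) for x in subsets for y in subsets)


def gen_kill(s: Fact) -> Fact:
    return (s - {"a"}) | {"b"}  # kill = {a}, gen = {b}


def at_least_two(s: Fact) -> Fact:
    # Hypothetical monotone function: "z" appears only once two facts hold.
    return frozenset({"z"}) if len(s) >= 2 else frozenset()


U = {"a", "b", "c"}
print(is_monotone(gen_kill, U), is_distributive(gen_kill, U))          # True True
print(is_monotone(at_least_two, U), is_distributive(at_least_two, U))  # True False
```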
A distributive transfer function
A non-distributive transfer function
An example: nodes $f$ and $g$ both flow into $h$, which flows into $k$. Then $k(h(f(0) \cup g(0))) = k(h(f(0)) \cup h(g(0))) = k(h(f(0))) \cup k(h(g(0)))$: the analysis is equivalent to combining the results of the analysis along all the separate paths.
DFA of a distributive problem. If a problem is distributive, then the minimal solution of its system of equations equals the combination of the separate analyses of all program execution paths (including infinite ones): combining information at join points does not cause any loss of precision!
What problems are distributive? Distributive problems are easy. Data-flow analyses concerning the structure of the code are typically distributive! Example: live variables, available expressions, reaching definitions and very busy expressions are all distributive problems. These are properties concerning HOW the program executes.
Non-distributive problems. Typical non-distributive problems concern WHAT programs compute. Example: the output is a constant, is a positive value, belongs to an interval, is bounded, etc. Example: Constant Propagation Analysis: for every program point $p$, determine whether a variable always has the same constant value whenever execution reaches $p$.
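Constant Propagation Analysis. The domain of properties is $(Var \to \mathbb{Z}^\top)_\bot$, where $Var$ is the set of variables in $P$ and $\mathbb{Z}^\top$ is the dual of the flat CPO $\mathbb{Z}_\bot$: the integers $\ldots, -2, -1, 0, 1, 2, \ldots$ are pairwise incomparable and all lie below $\top$.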
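Constant Propagation Analysis. $Var \to \mathbb{Z}^\top$ is the set of states mapping variables to $\mathbb{Z}^\top$, where $\top$ means "don't know". $(Var \to \mathbb{Z}^\top)_\bot$ is a CPO under the usual pointwise order: for $\sigma, \sigma' \in Var \to \mathbb{Z}^\top$, $\sigma \sqsubseteq \sigma'$ iff $\forall x \in Var.\ \sigma(x) \sqsubseteq_{\mathbb{Z}^\top} \sigma'(x)$. $\bot$ is the bottom state (totally undefined function) in $(Var \to \mathbb{Z}^\top)_\bot$.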
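A sketch of these lattice operations, assuming states are Python dicts over one fixed set of variables, `None` stands for the bottom state $\bot$, and the string "TOP" stands for $\top$ (both encodings are our choice, not the slides').

```python
from typing import Dict, Optional, Union

TOP = "TOP"                         # stands for ⊤ ("don't know")
Value = Union[int, str]             # an integer or TOP
State = Optional[Dict[str, Value]]  # None stands for the bottom state ⊥


def leq_value(a: Value, b: Value) -> bool:
    """Order of Z^T: integers are pairwise incomparable, all below TOP."""
    return a == b or b == TOP


def join_value(a: Value, b: Value) -> Value:
    return a if a == b else TOP


def leq_state(s1: State, s2: State) -> bool:
    """Pointwise order, with ⊥ (None) below every state."""
    if s1 is None:
        return True
    if s2 is None:
        return False
    return s1.keys() == s2.keys() and all(leq_value(s1[x], s2[x]) for x in s1)


def join_state(s1: State, s2: State) -> State:
    """Pointwise join; assumes both states are defined on the same variables."""
    if s1 is None:
        return s2
    if s2 is None:
        return s1
    return {x: join_value(s1[x], s2[x]) for x in s1}


print(join_state({"x": 1, "y": 4}, {"x": 1, "y": 2}))  # {'x': 1, 'y': 'TOP'}
```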
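Figure: part of the lattice $(\{x,y\} \to \mathbb{Z}^\top)_\bot$, with top element $\top = \{(x,\top),(y,\top)\}$ and elements such as $\{(x,\top),(y,4)\}$, $\{(x,1),(y,4)\}$, $\{(x,1),(y,2)\}$, $\{(y,4)\}$, $\{(x,1)\}$, $\{(y,7)\}$.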
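Analyzing expressions. In order to specify the transfer functions we need to evaluate (integer) expressions in $Aexp$ in a state $\sigma \in (Var \to \mathbb{Z}^\top)_\bot$: $\mathcal{A} : (Aexp \times (Var \to \mathbb{Z}^\top)_\bot) \to \mathbb{Z}^\top_\bot$, with $\mathcal{A}[\![x]\!]\sigma = \bot$ if $\sigma = \bot$ or $\sigma(x)$ is undefined, and $\sigma(x)$ otherwise; $\mathcal{A}[\![n]\!]\sigma = \bot$ if $\sigma = \bot$, and $n$ otherwise; $\mathcal{A}[\![a_1\ op\ a_2]\!]\sigma = \mathcal{A}[\![a_1]\!]\sigma\ \widehat{op}\ \mathcal{A}[\![a_2]\!]\sigma$, where $\widehat{op}$ is the interpretation of $op$ on $\mathbb{Z}^\top_\bot$, defined as follows. Let $op_{\mathbb{Z}} : \mathbb{Z}^2 \to \mathbb{Z}$ be an arithmetic operation on $\mathbb{Z}$: (A) if $z_1, z_2 \in \mathbb{Z}$, then $z_1\ \widehat{op}\ z_2 = z_1\ op_{\mathbb{Z}}\ z_2$; (B) $\bot\ \widehat{op}\ z = z\ \widehat{op}\ \bot = \bot$; (C) $z_1\ \widehat{op}\ z_2 = \top$ otherwise.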
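A direct transcription of $\mathcal{A}$ and $\widehat{op}$, as a sketch. Assumptions: expressions are tuples ("num", n), ("var", x) or (op, a1, a2); states are dicts in which a missing key means $\sigma(x)$ is undefined; `None` encodes the bottom state; the strings "TOP" and "BOT" encode $\top$ and $\bot$; only +, - and * are supported.

```python
import operator
from typing import Dict, Optional, Union

TOP, BOT = "TOP", "BOT"              # encodings of ⊤ and ⊥ in Z^T_⊥
Value = Union[int, str]
State = Optional[Dict[str, Value]]   # None is the bottom state; missing key = undefined
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def op_hat(op: str, z1: Value, z2: Value) -> Value:
    if isinstance(z1, int) and isinstance(z2, int):
        return OPS[op](z1, z2)       # (A) both integers
    if BOT in (z1, z2):
        return BOT                   # (B) ⊥ is absorbing
    return TOP                       # (C) otherwise


def A(expr, sigma: State) -> Value:
    """Abstract evaluation of ("num", n), ("var", x) or (op, a1, a2)."""
    if expr[0] == "num":
        return BOT if sigma is None else expr[1]
    if expr[0] == "var":
        if sigma is None or expr[1] not in sigma:
            return BOT
        return sigma[expr[1]]
    op, a1, a2 = expr
    return op_hat(op, A(a1, sigma), A(a2, sigma))


e = ("+", ("var", "x"), ("num", 10))
print(A(e, {"x": 10}), A(e, {"x": TOP}), A(e, None))  # 20 TOP BOT
```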
The transfer functions. The transfer functions for constant propagation are $f_p : (Var \to \mathbb{Z}^\top)_\bot \to (Var \to \mathbb{Z}^\top)_\bot$, defined as follows: if $p$ is a node containing an assignment $[x := a]^p$, then $f_p(\sigma) = \sigma[x \mapsto \mathcal{A}[\![a]\!]\sigma]$; if $p$ is a node containing a non-assignment command, then $f_p(\sigma) = \sigma$. This is a possible/forward analysis.
Example. Consider the program $[x := 10]^1;\ [y := x + 10]^2;\ \mathbf{while}\ [x < y]^3\ \mathbf{do}\ [y := y - 1]^4;\ [z := x - 1]^5$. The minimal solution of Constant Propagation Analysis is: $CP_{entry}(1) = \emptyset$, $CP_{exit}(1) = \{(x \mapsto 10)\}$; $CP_{entry}(2) = \{(x \mapsto 10)\}$, $CP_{exit}(2) = \{(x \mapsto 10), (y \mapsto 20)\}$; $CP_{entry}(3) = CP_{exit}(3) = CP_{entry}(4) = CP_{exit}(4) = \{(x \mapsto 10), (y \mapsto \top)\}$; $CP_{entry}(5) = \{(x \mapsto 10), (y \mapsto \top)\}$, $CP_{exit}(5) = \{(x \mapsto 10), (y \mapsto \top), (z \mapsto 9)\}$.
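A sketch that reproduces this solution with a round-robin solver. Assumptions: states are partial maps (dicts), `None` is $\bot$ with $f_p(\bot) = \bot$, the entry state of point 1 is the empty map, the flow edges are 1 → 2 → 3, 3 → 4 → 3 and 3 → 5, and expressions are int literals, variable names or (op, a1, a2) tuples. All names are illustrative.

```python
import operator
from typing import Dict, Optional, Union

TOP, BOT = "TOP", "BOT"              # encodings of ⊤ and ⊥
Value = Union[int, str]
State = Optional[Dict[str, Value]]   # None is the bottom state ⊥
OPS = {"+": operator.add, "-": operator.sub}

# [x:=10]^1; [y:=x+10]^2; while [x<y]^3 do [y:=y-1]^4; [z:=x-1]^5
ASSIGN = {1: ("x", 10), 2: ("y", ("+", "x", 10)),
          4: ("y", ("-", "y", 1)), 5: ("z", ("-", "x", 1))}  # 3 is the test x<y
NODES = [1, 2, 3, 4, 5]
PRE = {1: [], 2: [1], 3: [2, 4], 4: [3], 5: [3]}


def evaluate(expr, s: Dict[str, Value]) -> Value:
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return s.get(expr, BOT)      # undefined variable -> ⊥
    op, a1, a2 = expr
    z1, z2 = evaluate(a1, s), evaluate(a2, s)
    if isinstance(z1, int) and isinstance(z2, int):
        return OPS[op](z1, z2)
    return BOT if BOT in (z1, z2) else TOP


def join(s1: State, s2: State) -> State:
    if s1 is None:
        return s2
    if s2 is None:
        return s1
    out = dict(s1)
    for x, v in s2.items():
        out[x] = v if x not in out or out[x] == v else TOP
    return out


def transfer(p: int, s: State) -> State:
    if s is None or p not in ASSIGN:
        return s                     # f_p(⊥) = ⊥; non-assignments leave σ unchanged
    x, a = ASSIGN[p]
    return {**s, x: evaluate(a, s)}  # σ[x ↦ A[a]σ]


def constant_propagation():
    entry: Dict[int, State] = {p: None for p in NODES}
    exit_: Dict[int, State] = {p: None for p in NODES}
    changed = True
    while changed:
        changed = False
        for p in NODES:
            new_in: State = {} if p == 1 else None
            for q in PRE[p]:
                new_in = join(new_in, exit_[q])
            new_out = transfer(p, new_in)
            if new_in != entry[p] or new_out != exit_[p]:
                entry[p], exit_[p] = new_in, new_out
                changed = True
    return entry, exit_


entry, exit_ = constant_propagation()
print(exit_[5])  # {'x': 10, 'y': 'TOP', 'z': 9}
```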
Non-distributivity. Constant Propagation Analysis is not distributive: consider the transfer function for the command $[y := x * x]^p$ and two states $\sigma_1$, $\sigma_2$ such that $\sigma_1(x) = 1$ and $\sigma_2(x) = -1$. Then $(\sigma_1 \sqcup \sigma_2)(x) = \top$, and therefore $f_p(\sigma_1 \sqcup \sigma_2)(y) = \top$, while $f_p(\sigma_1)(y) = 1 = f_p(\sigma_2)(y)$, so $(f_p(\sigma_1) \sqcup f_p(\sigma_2))(y) = 1 \neq \top$.
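The counterexample can be replayed in a few lines. A sketch with a hypothetical encoding: states are dicts (only x is bound before the assignment), the string "TOP" stands for $\top$, and `f_p` is the transfer function of $[y := x * x]^p$.

```python
from typing import Dict, Union

TOP = "TOP"  # stands for ⊤
State = Dict[str, Union[int, str]]


def join(s1: State, s2: State) -> State:
    """Pointwise join of two states defined on the same variables."""
    return {x: s1[x] if s1[x] == s2[x] else TOP for x in s1}


def f_p(s: State) -> State:
    """Transfer function of [y := x * x]^p; x * x is TOP when x is TOP."""
    x = s["x"]
    return {**s, "y": x * x if isinstance(x, int) else TOP}


s1, s2 = {"x": 1}, {"x": -1}
print(f_p(join(s1, s2))["y"])       # TOP
print(join(f_p(s1), f_p(s2))["y"])  # 1
```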
Abstract Interpretation
The people: Patrick Cousot, Radhia Cousot. Made in France.
Applications. Developed in 1977 to generalize DFA. A successful model for DFA, model checking, types, program transformation, etc. Used successfully in real analysis systems since 2000: ~2M lines of safety-critical C code analyzed with no false alarms!