Thursday, December 23, The attack model: Static Program Analysis


The attack model: Static Program Analysis

How do we do SPA?
DFA - Data Flow Analysis
CFA - Control Flow Analysis
Proving invariance: theorem proving
Checking models: model checking
Giaco & Ranzato

DFA: The people
Gary Kildall, Ken Kennedy, Jeffrey D. Ullman

The source
Flemming Nielson, Hanne Riis Nielson, Chris Hankin: Principles of Program Analysis. Springer (corrected 2nd printing, 452 pages), 2005.
Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman: Compilers: Principles, Techniques, and Tools. Addison-Wesley, 2006.

DFA

Data Flow Analysis in history
[Figure: compiler pipeline: Scanner, Parser, Semantic analysis, Optimizer (CFA, DFA, Improvement), Code generator]
We start from a program representation: the CFG.
The semantics is given by recursive equations specifying the input/output behavior at each program point.

CFG

What is DFA
Wiki: Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.
A better definition? Data-flow analysis is a technique for gathering information about how data flows at run time at various points in a computer program.

Example: Live Variable Analysis
Essential for register allocation: two variables that are live at the same time cannot be stored in the same register!
x and y cannot be stored in the same location if they are both in use!
Useful for SW watermarking (the QP algorithm)

Example
a and b are never in use at the same time: both can be replaced by a single variable x

Live variables
x is live at the exit of C if x holds a value that will be used later (i.e. read, on a right-hand side).
x is not live after C if it is reassigned before its next use (x := exp with x ∉ exp).
If x is not live, it is dead!
Dead-code elimination: if x is dead after x := exp then we can erase x := exp.
Deciding exactly which code is dead is undecidable!

Live variables
The last use of b as an r-value is in 4.
b is used in 4, so it is live on the arc 3 → 4.
There is no assignment to b in 3: it is live on 2 → 3.
b is assigned in 2: nobody uses the value b holds on 1 → 2, so b is not live there.
Live range of b: {2 → 3, 3 → 4}

Live variables
a is live on 4 → 5 and 5 → 2.
a is live on 1 → 2.
a is not live on 2 → 3 and 3 → 4: even though a still holds a defined value at 3, that value is never used, because a is assigned a new value in 4.

Live variables
c is live on all arcs.
Liveness can be used to deduce that, if c is a local variable, then c is used without being initialized! (warning!)

Live variables
It is enough to have 2 registers: a and b are never alive together!

Live variables
a and b are never alive along the same arcs!
We can optimize P: new register ab

Basic notation
CFG with out-edges and in-edges.
pre[n] and post[n] denote the predecessor nodes and successor nodes of n.
Example:
post[5] = {2,6} because 5 → 6 and 5 → 2
pre[2] = {1,5} because 5 → 2 and 1 → 2

Notation
A variable is defined when it is the L-value of an assignment: x := ...
A variable is used when it is an R-value in an expression: ... := .. x ..
def[n] are the variables defined in n
use[n] are the variables used in n
Example: def[3] = {c}, def[5] = ∅, use[3] = {b,c}, use[5] = {a}
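The notation above maps directly onto dictionaries. The following is a minimal sketch for the 6-node running example. Only post[5], pre[2], def[3], def[5], use[3] and use[5] are stated in the slides; every other entry (and the names `post`, `defs`, `uses`, `predecessors`) is an assumption chosen to agree with the liveness facts given for a, b and c.

```python
# Successor sets post[n]; only post[5] = {2, 6} is given in the slides,
# the remaining arcs are assumed (a straight line 1..5 plus the loop 5 -> 2).
post = {1: {2}, 2: {3}, 3: {4}, 4: {5}, 5: {2, 6}, 6: set()}


def predecessors(post):
    """Derive pre[n] from post[n] by inverting every arc."""
    pre = {n: set() for n in post}
    for n, succs in post.items():
        for m in succs:
            pre[m].add(n)
    return pre


pre = predecessors(post)

# def[n] and use[n]; entries for nodes 1, 2, 4 and 6 are assumptions.
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
uses = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

print("pre[2]  =", sorted(pre[2]))
print("post[5] =", sorted(post[5]))
```

Storing only `post` and deriving `pre` keeps the two views of the arcs from drifting apart.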

Formalizing liveness
Definition: x is live on the arc e → f if there exists an execution path C from e to a node n such that:
e → f is the first arc in C;
x ∈ use[n];
for any n' ≠ e and n' ≠ n in C, x ∉ def[n'].
x is live-in at a node n if x is live on all in-edges of n.
x is live-out (or simply live) at a node n if it is live on at least one of the out-edges of n.
Example:
a is live on 1 → 2, 4 → 5 and 5 → 2
b is live on 2 → 3, 3 → 4
c is live on all arcs
a is live-in at 2, BUT it is not live-out at 2
a is live-out at 5

Computing Liveness
Liveness information (i.e., live-in and live-out for all nodes) can be over-approximated as follows:
1. If a variable x ∈ use[n], then x is live-in at n. Namely, if a node n uses x as an R-value then x is live on every arc entering n.
2. If a variable x is live-out at n and x ∉ def[n], then x is also live-in at n. Namely, if x is live on some arc leaving n and x is not defined in n, then x is live on all arcs entering n.
3. If a variable x is live-in at m, then x is live-out at all nodes n ∈ pre[m].
Correctness: if x is truly live-in (live-out) at n then the static analysis will find that x is live-in (live-out) at n.

Approximating Liveness
Liveness analysis is approximate: the assumption is that all paths in the CFG are possible!
The analysis determines that a is live-in at 5, and therefore a is live-out at 3.
BUT there is no true execution path from 3 to 5, and therefore a is not concretely live at the exit of 3!

Data-Flow equations
Define:
in[n]: the set of variables that are classified as live-in at the node n
out[n]: the set of variables that are classified as live-out at the node n
This can be expressed with 2 equations (or a system of equations):
1. in[n] = use[n] ∪ (out[n] - def[n])
2. out[n] = ∪{in[m] | m ∈ post[n]}

Least fixpoint
Least fix-point of the system of equations: ∀n ∈ nodes(cfg(P)):
in[n] = use[n] ∪ (out[n] - def[n])
out[n] = ∪{in[m] | m ∈ post[n]}
Formally: let |Vars(P)| < ω and |nodes(cfg(P))| = N; then
live : (2^Vars(P) × 2^Vars(P))^N → (2^Vars(P) × 2^Vars(P))^N
(2^Vars(P) × 2^Vars(P))^N is a finite complete lattice!
live is a monotone function such that:
live(⟨in_1, out_1, ..., in_N, out_N⟩) = ⟨use[1] ∪ (out_1 - def[1]), ∪{in_m | m ∈ post[1]}, ..., use[N] ∪ (out_N - def[N]), ∪{in_m | m ∈ post[N]}⟩
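Because the lattice is finite and live is monotone, the least fixpoint can be reached by starting from empty sets and re-applying the two equations until nothing changes. The sketch below does this with a simple round-robin loop. The CFG data is the same assumed reconstruction of the running example (only part of it is stated in the slides), and the function name `liveness` is illustrative.

```python
def liveness(post, uses, defs):
    """Least fixpoint of in[n] = use[n] | (out[n] - def[n]),
    out[n] = union of in[m] for m in post[n], by round-robin iteration."""
    live_in = {n: set() for n in post}
    live_out = {n: set() for n in post}
    changed = True
    while changed:
        changed = False
        for n in post:
            new_out = set().union(*(live_in[m] for m in post[n]))
            new_in = uses[n] | (new_out - defs[n])
            if new_in != live_in[n] or new_out != live_out[n]:
                live_in[n], live_out[n] = new_in, new_out
                changed = True
    return live_in, live_out


# Assumed CFG of the running example (def/use of nodes 1, 2, 4, 6 are guesses
# that agree with the liveness facts stated in the slides).
post = {1: {2}, 2: {3}, 3: {4}, 4: {5}, 5: {2, 6}, 6: set()}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
uses = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(post, uses, defs)
for n in sorted(post):
    print(n, "in =", sorted(live_in[n]), "out =", sorted(live_out[n]))
```

Starting from the empty sets matters: any other starting point could converge to a fixpoint that is larger than the least one.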

Correctness
Theorem: ∀n ∈ nodes(P): live-in[n] ⊆ in[n] and live-out[n] ⊆ out[n].
Proof idea: both in[n] and out[n] are computed statically over the CFG, i.e. also along paths that may not correspond to real executions!

Approximation soundness
How can we read the answer of a static analysis?
If x is live at n in some program execution path, then x ∈ out[n].
If x is not live at n in any program execution, it may still happen that x ∈ out[n].
For liveness, sound approximation means: we can erroneously derive that x is live, BUT we CANNOT erroneously derive that a variable is dead!
If x ∈ out[n] then x may be live at program point n.
If x ∉ out[n] then x is definitely dead at program point n.


The approximation is complete!
out[1] = {a,c}, out[2] = {b,c}, out[3] = {b,c}, out[4] = {a,c}, out[5] = {a,c}

Backward analysis
Live variable analysis is indeed backward: information propagates backward from out to in.
I can compute in[n] if I know out[n]; I can compute out[n] if I know in[m] for all successors m of n.

Backward analysis

Reaching definitions
Given a program point n, what are the definitions (assignments) that are available and not overwritten when program execution reaches this point along some path? And what definitions are available after n?
A program point n may kill a definition: if the command in n is an assignment x := exp, we kill the definitions of x that are available at the entry of n.
We can generate new definitions by assignments.
We are interested in entry and exit reaching definitions for any program point in the CFG.
... it is one of the simplest data-flow analyses in compilers!

Forward analysis

Formal definition
Definitions are pairs of variable and program point: {(x,p) | x ∈ Vars, p is a program point} ∈ 2^(Vars × Points), where (x,p) means that x is assigned at point p.
The analysis computes the set of reaching definitions for each program point: definition chains.
If (x,p) is computed at point q then the assignment to x at point p is available in q.
? is a special symbol in Points, which is used for uninitialized variables.
The value ι = {(x,?) | x ∈ Vars} denotes that all variables are uninitialized.

Formal definition
The analysis is given by the following system of fix-point equations for any program point p in the CFG:
RD_entry(p) = ι if p is a program entry point
RD_entry(p) = ∪{RD_exit(q) | q ∈ pre[p]} otherwise
RD_exit(p) = (RD_entry(p) \ kill_RD[p]) ∪ gen_RD[p]
RD is a possible analysis: if x := a in program point q is really available at the entry of point p then (x,q) ∈ RD_entry(p) (the converse may not hold).

Formal definition
kill_RD[p] = {(x,q) | q ∈ Points, x ∈ def[q]} ∪ {(x,?)} if x ∈ def[p]
kill_RD[p] = ∅ if x ∉ def[p]
gen_RD[p] = {(x,p)} if x ∈ def[p]
gen_RD[p] = ∅ if x ∉ def[p]
As usual: def[p] = {x} if the instruction at program point p is x := exp; otherwise def[p] = ∅.
The analysis is forward with least fixpoint.

Example
Program:
1 input n;
2 m := 1;
3 n > 1;
4 m := m*n;
5 n := n-1;
6 output m;
Solution:
RD_entry(1) = {(n,?),(m,?)}
RD_exit(1) = {(n,?),(m,?)}
RD_entry(2) = {(n,?),(m,?)}
RD_exit(2) = {(n,?),(m,2)}
RD_entry(3) = RD_exit(2) ∪ RD_exit(5) = {(n,?),(n,5),(m,2),(m,4)}
RD_exit(3) = {(n,?),(n,5),(m,2),(m,4)}
RD_entry(4) = {(n,?),(n,5),(m,2),(m,4)}
RD_exit(4) = {(n,?),(n,5),(m,4)}
RD_entry(5) = {(n,?),(n,5),(m,4)}
RD_exit(5) = {(n,5),(m,4)}
RD_entry(6) = {(n,?),(n,5),(m,2),(m,4)}
RD_exit(6) = {(n,?),(n,5),(m,2),(m,4)}
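The solution above can be reproduced by iterating the RD equations from empty sets. This is a minimal sketch, not the slides' own algorithm. Two points are assumptions: the arcs 3 → 4, 3 → 6 and 5 → 3 are inferred from the listed solution, and `input n` at point 1 is treated as a non-assignment because the slides keep (n,?) in RD_exit(1). The names `reaching_definitions`, `assigned` and `UNINIT` are illustrative.

```python
UNINIT = "?"  # special point standing for "not yet initialized"


def reaching_definitions(nodes, pre, assigned, entry, variables):
    """Iterate RD_entry / RD_exit from empty sets up to the least fixpoint.

    assigned[p] is the variable assigned at p (absent for non-assignments).
    """
    iota = {(x, UNINIT) for x in variables}
    rd_entry = {p: set() for p in nodes}
    rd_exit = {p: set() for p in nodes}
    changed = True
    while changed:
        changed = False
        for p in nodes:
            if p == entry:
                new_entry = set(iota)
            else:
                new_entry = set().union(*(rd_exit[q] for q in pre[p]))
            x = assigned.get(p)
            if x is None:
                new_exit = set(new_entry)  # kill and gen are both empty
            else:
                # Removing every incoming definition of x has the same effect
                # as subtracting the full kill set.
                killed = {(y, q) for (y, q) in new_entry if y == x}
                new_exit = (new_entry - killed) | {(x, p)}
            if new_entry != rd_entry[p] or new_exit != rd_exit[p]:
                rd_entry[p], rd_exit[p] = new_entry, new_exit
                changed = True
    return rd_entry, rd_exit


nodes = [1, 2, 3, 4, 5, 6]
pre = {1: set(), 2: {1}, 3: {2, 5}, 4: {3}, 5: {4}, 6: {3}}  # assumed arcs
assigned = {2: "m", 4: "m", 5: "n"}  # point 1 (input n) treated as no assignment

rd_entry, rd_exit = reaching_definitions(nodes, pre, assigned, 1, {"n", "m"})
for p in nodes:
    print(p, "entry =", sorted(rd_entry[p], key=str), "exit =", sorted(rd_exit[p], key=str))
```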

DFA training
On-line analyzer: http://pag.cs.uni-sb.de/
It implements standard DFA with an intuitive interface!


DFA Framework
Is there a common structure in DFA?
Having a framework allows the design of a common algorithm and specification (correctness proofs, complexity evaluation, etc.)

A common structure?
                                   Forward                 Backward
                                   in[n] → out[n]          out[n] → in[n]
                                   pre → post              post → pre
Possible (Semantics ⊆ Analysis)    Reaching definitions    Live variables
Definite (Analysis ⊆ Semantics)    Available expressions   Very busy expressions

A common pattern
GA∘(p) = ι if p ∈ E
GA∘(p) = ⊔{GA•(q) | (q,p) ∈ F} otherwise
GA•(p) = f_p(GA∘(p))
where:
E are the initial/terminal points in the CFG
ι is the initial/final information
F are the arcs or inverse arcs in the CFG
⊔ is either ∪ or ∩
f_p is a transfer function associated with node p

Forward vs Backward
GA∘(p) = ι if p ∈ E
GA∘(p) = ⊔{GA•(q) | (q,p) ∈ F} otherwise
GA•(p) = f_p(GA∘(p))
In forward analysis E are the initial points, F = {(q,p) | q → p}, GA∘ is GA_entry and GA• is GA_exit.
In backward analysis E are the final points, F = {(q,p) | p → q}, GA∘ is GA_exit and GA• is GA_entry.

Possible vs Definite
GA∘(p) = ι if p ∈ E
GA∘(p) = ⊔{GA•(q) | (q,p) ∈ F} otherwise
GA•(p) = f_p(GA∘(p))
When ⊔ = ∩ we look for the largest set satisfying the equations on all possible computation paths entering (exiting) a node: this is a definite (or must) analysis!
When ⊔ = ∪ we look for the least set satisfying the equations on at least one possible computation path entering (exiting) a node: this is a possible (or may) analysis!
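The common pattern suggests one solver parameterized by E, ι, F, the join and the transfer functions. The sketch below is one possible reading of it, with hypothetical names (`solve`, `before`, `after`, `start`); `before` holds the joined information at a node and `after` the result of its transfer function. It is instantiated as live variables (backward, possible) on the running example, whose def/use sets are partly assumed as noted in the comments.

```python
def solve(nodes, flow, extremal, iota, join, transfer, start):
    """Round-robin solver for the common pattern.

    before[p] = iota if p is extremal, else join of after[q] for (q, p) in flow
    after[p]  = transfer[p](before[p])
    start is the initial value: the empty set for a possible (union) analysis,
    the full universe for a definite (intersection) analysis.
    """
    before = {p: start for p in nodes}
    after = {p: start for p in nodes}
    changed = True
    while changed:
        changed = False
        for p in nodes:
            if p in extremal:
                new_before = iota
            else:
                incoming = [after[q] for (q, r) in flow if r == p]
                new_before = join(incoming) if incoming else start
            new_after = transfer[p](new_before)
            if new_before != before[p] or new_after != after[p]:
                before[p], after[p] = new_before, new_after
                changed = True
    return before, after


def union_all(sets):
    return frozenset().union(*sets)


fs = frozenset  # fs("bc") is the set {"b", "c"}
arcs = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 2), (5, 6)}  # assumed CFG arcs
defs = {1: fs("a"), 2: fs("b"), 3: fs("c"), 4: fs("a"), 5: fs(), 6: fs()}
uses = {1: fs(), 2: fs("a"), 3: fs("bc"), 4: fs("b"), 5: fs("a"), 6: fs("c")}

# Live variables: backward, so F holds the inverse arcs and E the final point.
transfer = {p: (lambda x, p=p: uses[p] | (x - defs[p])) for p in defs}
live_out, live_in = solve(
    nodes=sorted(defs),
    flow={(q, p) for (p, q) in arcs},
    extremal={6},
    iota=fs(),
    join=union_all,
    transfer=transfer,
    start=fs(),
)
for n in sorted(defs):
    print(n, "out =", sorted(live_out[n]), "in =", sorted(live_in[n]))
```

Switching to a forward analysis only changes `flow` (the arcs themselves) and `extremal` (the initial points); switching to a definite analysis changes `join` to an intersection and `start` to the full universe.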

Distributive Dataflow Analysis
Assume the transfer functions are monotone and ⊔ = ∪.
A dataflow analysis problem is distributive if all transfer functions are additive, namely for any f we have that for any x,y ∈ C: f(x ∪ y) = f(x) ∪ f(y)
Note that by monotonicity of f: f(x ∪ y) ⊇ f(x) ∪ f(y)
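Additivity can be checked exhaustively on a small universe. The sketch below tests a gen/kill transfer function of the form f(x) = (x - kill) ∪ gen, which is the shape used by live variables and reaching definitions, and contrasts it with a hypothetical monotone function `saturating` that is not additive. All names and the three-element universe are illustrative assumptions.

```python
from itertools import chain, combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def make_gen_kill(gen, kill):
    """Transfer function f(x) = (x - kill) | gen."""
    return lambda x: (x - kill) | gen


def is_additive(f, universe):
    """Check f(x | y) == f(x) | f(y) for every pair of subsets."""
    subsets = powerset(universe)
    return all(f(x | y) == f(x) | f(y) for x in subsets for y in subsets)


universe = frozenset({"a", "b", "c"})
gen_kill = make_gen_kill(gen=frozenset({"b"}), kill=frozenset({"a"}))


def saturating(x):
    """Monotone but not additive: jumps to the whole universe once a and b meet."""
    return universe if {"a", "b"} <= x else x


print("gen/kill additive:  ", is_additive(gen_kill, universe))
print("saturating additive:", is_additive(saturating, universe))
```

For `saturating`, taking x = {a} and y = {b} gives f(x ∪ y) = {a,b,c} but f(x) ∪ f(y) = {a,b}, so only the inclusion guaranteed by monotonicity holds.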

A distributive transfer function

A non-distributive transfer function

An example
[Figure: CFG whose nodes carry the transfer functions f, g, h and k]
k(h(f(0) ∪ g(0))) = k(h(f(0)) ∪ h(g(0))) = k(h(f(0))) ∪ k(h(g(0)))
The analysis is equivalent to combining the results of the analysis along all separate paths.

DFA of a distributive problem
If a problem is distributive then the minimal solution to its system of equations is equivalent to the combination of the separate analyses applied to all program execution paths (including infinite ones).
Joining the information at merge points does not cause a loss of precision!

What problems are distributive?
Distributive problems are easy.
DFAs concerning the structure of code are typically distributive!
Example: live variables, available expressions, reaching definitions and very busy expressions are all distributive problems.
These are properties concerning HOW the program executes.

Non-distributive problems
Typical non-distributive problems concern WHAT programs compute.
Example: the output is a constant, a positive value, belongs to an interval, is bounded, etc.
Example: Constant Propagation Analysis
For every program point p, determine whether a variable always has the same constant value whenever the execution reaches p.

Constant Propagation Analysis
The domain of properties is (Var ⇀ Z^⊤)_⊥ where:
Var is the set of variables in P
Z^⊤ is the dual CPO to Z_⊥
[Figure: Z^⊤, with ⊤ above the pairwise incomparable integers ..., -4, -3, -2, -1, 0, 1, 2, 3, 4, ...]

Constant Propagation Analysis
Var ⇀ Z^⊤ are the states evaluating variables in Z^⊤, with ⊤ meaning "don't know".
Var ⇀ Z^⊤ is a CPO under the usual point-wise order ⊑:
if σ,σ' ∈ Var ⇀ Z^⊤ then σ ⊑ σ' iff ∀x ∈ Var. σ(x) ∈ Z^⊤ ⇒ σ(x) ⊑ σ'(x).
⊥ is a bottom state (totally undefined function) in (Var ⇀ Z^⊤)_⊥.

[Figure: Hasse diagram of ({x,y} ⇀ Z^⊤)_⊥. Sample elements, from top to bottom: ⊤ = {(x,⊤), (y,⊤)}; {(x,⊤), (y,4)}; {(x,1), (y,4)}, {(x,1), (y,2)}; {(y,4)}, {(x,1)}, {(y,7)}; ⊥]

Analyzing expressions
In order to specify transfer functions we need to be able to evaluate (integer) expressions in Aexp in a state σ ∈ (Var ⇀ Z^⊤)_⊥:
A : (Aexp → (Var ⇀ Z^⊤)_⊥) → Z^⊤_⊥
A[[x]]σ = ⊥ if σ = ⊥ or σ(x) = undef
A[[x]]σ = σ(x) otherwise
A[[n]]σ = ⊥ if σ = ⊥
A[[n]]σ = n otherwise
A[[a1 op a2]]σ = A[[a1]]σ op^ A[[a2]]σ
where op^ is the interpretation of op on Z^⊤_⊥, defined as follows. Let op_Z : Z² → Z be an arithmetic operation on Z:
(A) if z1,z2 ∈ Z then z1 op^ z2 = z1 op_Z z2;
(B) ⊥ op^ z = z op^ ⊥ = ⊥;
(C) z1 op^ z2 = ⊤ otherwise.

The transfer functions
The transfer functions for constant propagation are f_p : (Var ⇀ Z^⊤)_⊥ → (Var ⇀ Z^⊤)_⊥, defined as follows:
if p is a node containing an assignment [x := a]^p then f_p(σ) = σ[x ↦ A[[a]]σ]
if p is a node containing a non-assignment command then f_p(σ) = σ
This is a possible/forward analysis.

Example
Consider the program [x := 10]^1; [y := x+10]^2; (while [x<y]^3 do [y := y-1]^4); [z := x-1]^5
The minimal solution of Constant Propagation Analysis is:
CP_entry(1) = ∅
CP_exit(1) = {(x ↦ 10)}
CP_entry(2) = {(x ↦ 10)}
CP_exit(2) = {(x ↦ 10), (y ↦ 20)}
CP_entry(3) = CP_exit(3) = CP_entry(4) = CP_exit(4) = {(x ↦ 10), (y ↦ ⊤)}
CP_entry(5) = {(x ↦ 10), (y ↦ ⊤)}
CP_exit(5) = {(x ↦ 10), (y ↦ ⊤), (z ↦ 9)}
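The solution of this example can be recomputed with a small forward iteration. The sketch below rests on several encoding assumptions that are not in the slides: a state is a Python dict from variables to an integer or `TOP`, an unbound variable is undefined, `None` stands both for an undefined value and for a program point not reached yet, and the state at the entry of point 1 is the empty dict. Expressions are encoded as tuples; all function names are illustrative.

```python
import operator
from functools import reduce

TOP = "T"   # "don't know"
BOT = None  # undefined value, or a state not reached yet (assumed encoding)


def join_val(u, v):
    if u is BOT:
        return v
    if v is BOT:
        return u
    return u if u == v else TOP


def join_state(s, t):
    if s is BOT:
        return t
    if t is BOT:
        return s
    return {x: join_val(s.get(x), t.get(x)) for x in s.keys() | t.keys()}


def eval_aexp(a, sigma):
    """Evaluate ("const", n), ("var", x) or ("op", fn, a1, a2) in state sigma."""
    if sigma is BOT:
        return BOT
    if a[0] == "const":
        return a[1]
    if a[0] == "var":
        return sigma.get(a[1], BOT)
    _, fn, a1, a2 = a
    v1, v2 = eval_aexp(a1, sigma), eval_aexp(a2, sigma)
    if v1 is BOT or v2 is BOT:
        return BOT          # case (B)
    if v1 == TOP or v2 == TOP:
        return TOP          # case (C)
    return fn(v1, v2)       # case (A)


def assign(x, a):
    def transfer(sigma):
        if sigma is BOT:
            return BOT
        new = dict(sigma)
        value = eval_aexp(a, sigma)
        if value is BOT:
            new.pop(x, None)  # x stays undefined
        else:
            new[x] = value
        return new
    return transfer


x, y = ("var", "x"), ("var", "y")
transfer = {
    1: assign("x", ("const", 10)),
    2: assign("y", ("op", operator.add, x, ("const", 10))),
    3: lambda sigma: sigma,  # the test x < y changes nothing
    4: assign("y", ("op", operator.sub, y, ("const", 1))),
    5: assign("z", ("op", operator.sub, x, ("const", 1))),
}
pre = {1: [], 2: [1], 3: [2, 4], 4: [3], 5: [3]}

cp_entry = {p: BOT for p in pre}
cp_exit = {p: BOT for p in pre}
changed = True
while changed:
    changed = False
    for p in sorted(pre):
        new_entry = {} if p == 1 else reduce(join_state, (cp_exit[q] for q in pre[p]), BOT)
        new_exit = transfer[p](new_entry)
        if new_entry != cp_entry[p] or new_exit != cp_exit[p]:
            cp_entry[p], cp_exit[p] = new_entry, new_exit
            changed = True

for p in sorted(pre):
    print(p, "entry =", cp_entry[p], "exit =", cp_exit[p])
```

The loop body is where precision is lost: the first pass sees y = 20 at point 3, the second sees 20 joined with 19, which is ⊤, and from then on y stays ⊤.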

Non-distributivity
Constant Propagation Analysis is not distributive: consider the transfer function for the command [y := x * x]^p
We consider two states σ1 and σ2 such that σ1(x) = 1 and σ2(x) = -1.
In this case (σ1 ⊔ σ2)(x) = ⊤ and therefore f_p(σ1 ⊔ σ2)(y) = ⊤, while f_p(σ1)(y) = 1 = f_p(σ2)(y).
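The counterexample can be replayed directly. In this minimal sketch a state is a dict from variables to an integer or `TOP`; the encoding and the names `join_state` and `square_x_into_y` are assumptions made for illustration.

```python
TOP = "T"  # "don't know"


def join_val(u, v):
    return u if u == v else TOP


def join_state(s, t):
    """Point-wise join of two states defined on the same variables."""
    return {x: join_val(s[x], t[x]) for x in s}


def square_x_into_y(sigma):
    """Transfer function of [y := x * x]."""
    x = sigma["x"]
    return {**sigma, "y": TOP if x == TOP else x * x}


sigma1 = {"x": 1}
sigma2 = {"x": -1}

joined_first = square_x_into_y(join_state(sigma1, sigma2))
joined_last = join_state(square_x_into_y(sigma1), square_x_into_y(sigma2))

print("f(s1 join s2)(y)    =", joined_first["y"])
print("(f(s1) join f(s2))(y) =", joined_last["y"])
```

Joining before the assignment forgets that x is either 1 or -1, so the analysis can no longer see that x * x is 1 on both paths.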

Abstract Interpretation

The people
Patrick Cousot, Radhia Cousot
Made in France

Applications
Developed in 1977 to generalize DFA.
Successful model for: DFA, Model Checking, Types, Program transformation, etc.
Successfully used in concrete analysis systems since 2000: ~2M lines of safety-critical C code analyzed with no false alarms!