Program Analysis Course Notes

Ashok Sreenivas, 2008

1. Background / overview

1.1 Course overview

- Introduction: what and why of program analysis
- Background and program analysis techniques
  - Lattice theory
  - Data flow analysis
  - Abstract interpretation
  - Non-standard type inference
  - Inter-procedural analysis
- Analysis I: Identifying equivalent expressions
  - Different approaches
  - Relative merits, demerits
- Analysis II: Pointer analysis
  - Theoretical complexities
  - Families of algorithms

1.2 Program analysis: what and why

What is program analysis?
- Infer properties of a given program.
- Analogies to other kinds of analysis (Shakespeare's poetry or an airplane).
- What do we mean by properties?
  - Syntactic properties: analogous to physical properties. Not of interest in this course.
  - Semantic properties: properties that hold when the program runs. Similar to a flying plane or to understanding Shakespeare. Much more interesting and relevant.
  - Of course, we want to find properties without running the program. (Why?)

Why should we study program analysis?
- Program verification
  - Ensure that a program meets its specifications.
  - Discover (all) invariants about the program.
- Property verification ("static debugging")
  - Relatively more modest aim of checking whether a given property holds for the program.
  - Works against partial specifications.
  - Examples: safety of file operations, array index overflows, etc.

- Program optimization
  - Finding properties which ensure that a given transformation on a program does not change its behaviour [e.g., eliminating constant computations].
  - Preferably, it should also make the program run faster!
- Translation validation
  - Does the object code of your program faithfully reflect the source?
  - Requires identifying and comparing properties across two languages!
- Program understanding, re-engineering etc.
  - Software engineering applications: very little completely new code is ever written.
  - Want to understand the program's behaviour, perhaps in pieces, perhaps under specific conditions.
  - Useful to maintain legacy systems and re-target them.
  - Analysis results primarily intended for human consumption (unlike in the other cases).

What do we mean by a program? Or, more precisely, what kind of programs are we talking about?
- Program analysis techniques analyze a class of programs, not one program (unlike a compiler).
- The class of programs is defined by a language or semantic model.
  - Languages that are syntactically different but (almost) similar semantically can be treated similarly.
  - Obviously, the difficulty of analysis is proportional to the complexity of the language / semantic model.
- For the purposes of this course, the semantic model broadly includes all imperative, and maybe object-oriented, programs.
  - In particular, it includes variables, assignments, control flow (sequence, if-then-else, loops).
  - Also includes pointers and procedures / functions (with parameters).
  - Aggregate types (structures, arrays etc.) are also considered.
  - It does not include higher-order functions or functions as first-class values (though it may include function pointers).

Example analyses: sign analysis, interval analysis.
- Sign analysis: useful if some value should never become negative (say, a temperature or pressure).
- Interval analysis: similarly, for critical values such as temperatures, pressures etc. Also very relevant to array index analysis, and therefore to security violations in web applications.

Sample program, with the solutions for both analyses at each program point (sign facts on the left, intervals on the right):

    <<x: unknown, y: unknown>>    <<x: [], y: []>>
    x = -10;
    <<x: -ve, y: unknown>>        <<x: [-10, -10], y: []>>
    y = 1;
    <<x: any, y: +ve>>            <<x: [-10, inf], y: [1, inf]>>
    while (x <= 100)
    <<x: any, y: +ve>>            <<x: [-10, inf], y: [1, inf]>>

    {
        <<x: any, y: +ve>>        <<x: [-10, 100], y: [1, inf]>>
        x = x + y;
        <<x: any, y: +ve>>        <<x: [-10, inf], y: [1, inf]>>
        y = y * 2;
        <<x: any, y: +ve>>        <<x: [-10, inf], y: [1, inf]>>
    }
    <<x: any, y: +ve>>            <<x: [101, inf], y: [1, inf]>>

Points to note:
- One has to design suitable abstractions for the analyses (a small sketch of one such abstraction appears at the end of this section).
- One needs information at each program point.
- The information should represent all possible executions.
- The idea of approximations: Why is x's sign <<any>> at the top of the while loop? Why is the interval of y [1, inf] at the top of the while loop? Can it be better?

1.3 Fundamentals

Underlying principles of program analysis
- The actual (or 'concrete') program works on concrete values giving concrete outputs, say a program with integer inputs and outputs.
- The questions you ask of the program, i.e. the properties of interest, are abstract. E.g., range of values, set of values (why a set?), signs etc.
- So the notion of abstraction is the key.
  - The concrete values are almost always abstracted (e.g. signs, intervals etc.), i.e. the domain on which the program operates is changed from concrete to abstract.
  - The program itself may also be abstracted to simplify the analysis.

Approximation
- Finding exact information about the program is often impossible.
  - Why? The halting problem is a program analysis problem! Many others are undecidable too.
  - Even if theoretically possible, it may be extremely hard, computationally intractable.
- Therefore, we will have to settle for approximate answers many times.
- Approximations introduce the notions of soundness and completeness: how correct is the approximation, and how good is it?

Soundness
- Everything inferred by the analysis is also true of the program: true in analysis => true in program.
  - Though everything true at run-time may not be inferred.
- Basically determines the direction of approximation.
- For interval analysis, the natural direction would be that it is OK to predict wider intervals.
  - Say x really takes on the values 11, 16, 19.

  - If the analysis says x: [0, 25], i.e. it is never the case that x takes values outside of this range, it is OK.
  - But if the analysis says x: [12, 18] or even x: [12, 28], it is unsound, because it says x never takes the value 11, which is wrong.
- But the notion of soundness depends on what you want to use the analysis information for.
  - Example (variable initialization):
    - If the application is to detect bugs arising from uninitialized variables, you want to report a superset of all actual uninitialized variables. That is, you want to catch all 'real' bugs and perhaps also some spurious 'bugs' that are not really so.
    - If the application is to pre-initialize uninitialized variables (rather than initializing when first encountered; assume this is more efficient or easy), then you want to report a subset of all uninitialized variables. That is, it is OK to miss a few uninitialized variables, because this is likely to save effort at run-time (if we caught a superset of uninitialized variables, some variables that are deemed uninitialized but are actually initialized would get initialized twice, which may not be desirable).
  - But most often, the direction of approximation for soundness is obvious.
    - Example: If the analysis predicts no invalid memory accesses or overflowing computations, the program is really free of such errors. If it points out potential invalid memory accesses or overflowing computations, these may or may not be so.

Completeness
- The converse, i.e. everything true in the program is also predicted by the analysis: true in program => true in analysis.
- The other direction of approximation.
  - Not everything predicted by the analysis may be true of the program!
  - If a complete analysis predicts an invalid memory access, then it really is an invalid memory access.
- In almost all situations, an analysis must be sound. Preferably, it should also be complete, but as we have seen this may be extremely hard or even impossible.
- Being sound and incomplete is also called being conservative or 'safe'.
- Being just sound (and horribly incomplete!) is always very easy: just pick the extreme solution!
  - Infinite intervals for interval analysis, 'any' (+/-) for sign analysis, every variable for the uninitialized-variables analysis etc.
  - But these are also completely useless solutions. Hence precision is important.

Precision
- Try to get as close to complete as possible without losing soundness.
  - Tighter intervals, fewer false alarms with uninitialized variables etc.
- Often there is a trade-off between precision and effort. Sometimes (very rarely) there may be a trade-off between soundness and effort.
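To make the 'suitable abstractions' point above concrete, here is a minimal sketch of the sign abstraction used in the earlier example: the abstract values, the join used where control-flow paths merge, and an abstract addition transfer function. The sketch is written in Python purely for illustration; the function names are mine, not from the notes, and a real analysis would of course cover all operators.

    # Minimal sketch of a sign abstraction (hypothetical helpers, not from the notes).
    # Abstract values: 'bot' (no info), '-', '0', '+', 'any'.

    def sign_of(n):
        """Abstraction of a single concrete integer."""
        return '0' if n == 0 else ('+' if n > 0 else '-')

    def join(a, b):
        """Least upper bound in the sign lattice: combines info from two paths."""
        if a == 'bot': return b
        if b == 'bot': return a
        return a if a == b else 'any'

    def abs_add(a, b):
        """Abstract counterpart of '+': sound but approximate."""
        if 'bot' in (a, b): return 'bot'
        if a == '0': return b
        if b == '0': return a
        if a == b and a in ('+', '-'): return a   # (+)+(+) = +, (-)+(-) = -
        return 'any'                              # mixed signs: result unknown

    # E.g. adding a negative and a positive value gives no sign information:
    assert abs_add(sign_of(-10), sign_of(1)) == 'any'
    # and merging '-' with '+' at a join point also yields 'any':
    assert join('-', '+') == 'any'

Note how abs_add loses information on mixed signs: exactly the kind of sound over-approximation discussed above.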

2. Program analysis techniques

2.1 Lattice theory

- The mathematical underpinning of the ideas of approximation, soundness etc. Useful / relevant in many analyses.
- A partial order, with the ordering relation representing the notion of approximation.
- Notions of joins, meets, product lattices, functions over lattices, monotonicity and fixed points. Details in lattice.pdf and lattice-others.pdf.

Example lattices (a small sketch of lattice operations in code follows this list):
- Consider the set S = {1, 2, 3}. (2^S, \subseteq, \cup, \cap, \emptyset, S) is the lattice of subsets of S ordered by the subset relation (which is a partial order).
  - The LUB (join) operator is union, as it gives the smallest element containing both of any two given elements. Similarly, the GLB (meet) operator is set intersection.
  - The bottom element is the empty set and the top element is the entire set S.
- Consider the set S = {1, 2, 3, 4, 6, 8, 12, 24} and the 'divides' relation (|). | is a partial order, since:
  - x | x \forall x (reflexivity)
  - x | y \wedge x \neq y => y does not divide x (antisymmetry)
  - x | y \wedge y | z => x | z (transitivity)
  - (S, |, LCM, GCD, 1, 24) is a complete lattice. LCM is the join/LUB operator, as it gives the least element 'larger' than (i.e. a multiple of) any two given elements under the chosen ordering. Similarly, GCD is the meet/GLB operator; the least (bottom) element is 1, which divides everything, and the greatest (top) element is 24, which everything divides.
  - Both these lattices can be extended to all natural numbers (> 0), but that would result in lattices of infinite height and infinite 'width'.
- Lattice of signs: {Bottom, +, 0, -, Any}
  - Bottom <= x \forall x; Any >= x \forall x; +, -, 0 unrelated to each other.
  - Note: one can have other sign lattices too, with elements such as non-negative, non-positive and non-zero to represent other classes of numbers.
  - A finite lattice (with obviously finite chains).
- Lattice of intervals
  - Elements of the form [x, y].
  - [x, y] <= [a, b] iff x >= a \wedge y <= b, i.e. a 'tighter' interval is lower than a looser one.
  - The bottom element is the empty interval (a special case for the <= relation defined above).
  - The top element is the complete interval [-inf, +inf].
  - Infinite height and width.
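A minimal sketch of the interval lattice just described, again in Python with a representation of my own choosing (pairs for intervals, None for the empty/bottom interval). It shows the ordering, join and meet defined above.

    from math import inf

    # An interval is a pair (lo, hi) with lo <= hi; None stands for the empty interval (bottom).
    BOT = None
    TOP = (-inf, inf)

    def leq(i1, i2):
        """i1 <= i2 in the interval lattice: i1 is a tighter (or equal) interval."""
        if i1 is BOT: return True
        if i2 is BOT: return False
        return i1[0] >= i2[0] and i1[1] <= i2[1]

    def join(i1, i2):
        """Least upper bound: the smallest interval containing both."""
        if i1 is BOT: return i2
        if i2 is BOT: return i1
        return (min(i1[0], i2[0]), max(i1[1], i2[1]))

    def meet(i1, i2):
        """Greatest lower bound: the intersection, or bottom if disjoint."""
        if i1 is BOT or i2 is BOT: return BOT
        lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
        return (lo, hi) if lo <= hi else BOT

    # Merging the facts about a variable coming from two branches, say:
    assert join((-10, -10), (1, 5)) == (-10, 5)
    assert leq((1, 5), TOP)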

2.2 Analysis techniques

- Initially, focus only on single-procedure programs. Inter-procedural analysis is introduced later.
- Three approaches to program analysis:
  - Data flow analysis
  - Abstract interpretation
  - Non-standard type inference
- The three approaches are not independent, just different ways of looking at the problem. Sometimes ideas from multiple approaches work best.
- Running example program across all techniques:

    S0: ENTRY
    S1: x = 10
    S2: while (x < 100) do
    S3:   if (x > 0)
    S4:     x = x - 3
          else
    S5:     x = x + 2
          fi
    S6:   x = x * 4
        od
    S7: EXIT

- Example analysis: sign analysis. Five elements (bottom, 0, +, -, top/any/+-) ordered in the usual way.

2.3 Data flow analysis

- Developed primarily in the context of program optimization. References: Kildall 73, Hecht 77, ASU 86.
- Uses two abstractions:
  - The program is always abstracted to a control flow graph (see below).
  - The information abstraction depends on the analysis. This defines the desired lattice (called L).
- A control flow graph (CFG) abstracts a program. Description of CFG with example:
  - A node for every basic statement (can be extended to basic blocks).
  - An edge for every possible transfer of control.
  - Cycles in the presence of loops.
  - Not every path in the CFG may be a path in P (even if every branch in the CFG has an equivalent). (Why?)
- Develop (monotone) functions corresponding to the basic constructs of the program.
  - Each node type corresponds to a 'basic construct' of the language.

- Flow functions describe what happens to the property of interest when the program executes a given construct.
  - A set of monotone, abstract flow functions is defined, one for each node type: F.
- <L, F> defines a data flow problem.
- Given a program P and an <L, F> pair:
  - Build a CFG for P.
  - Instantiate F for each node in P.
  - Assume some information (usually 'no information') at program entry.
  - This results in a set of (mutually recursive) equations for the information at each program point. The information can be at the 'in' of a node, the 'out' of a node, or just one of them.
- The mutually recursive equation set can have multiple solutions!
  - For example, in interval analysis, all variables having the [-inf, +inf] interval is a solution, as is the computed (tighter) interval.
  - Lattice properties ensure the existence of a solution. See p 27 of lattice.pdf.
  - The least (greatest, in the data flow literature) fixed point gives us the best solution for the chosen abstraction.
- Two questions:
  - How to solve the mutually recursive system of equations?
  - What is the relationship between this solution and the desired analysis property?
- The solution can be computed through iterative or elimination approaches. Guaranteed to terminate if chains are finite. (Example worked through with the iterative approach; a sketch of an iterative worklist solver appears at the end of this subsection.)
- The concept of the meet-over-all-paths (MOP) solution:
  - Assume all paths are executable.
  - The desired solution at a point is the meet of the information reaching that point along all paths to it.
    - Meet is used to combine information.
    - Information along a path is just the composition of the individual flow functions.
  - The discovered fixed point is equal to the MOP solution if the analysis framework (i.e. all flow functions) is distributive.
    - Distributive framework: f(x \meet y) = (f x) \meet (f y)
  - Even if the framework is not distributive, the solution is conservative. That is, the discovered solution is a safe approximation of the MOP solution, i.e. <= the MOP solution in the upside-down data flow lattice.
- The classical 'separable' or 'bit-vector' problems:
  - Available expressions, reaching definitions, live variables, very busy expressions.
  - Each 'element' (expression, definition, variable, expression respectively) can be dealt with independently. That is, whether an expression is available or not etc. does not depend on any other expression.
- Note: there is a terminology confusion between data-flow analysis and the other semantics / analysis literature.
  - The semantics / abstract interpretation literature typically orders lattices such that 'smaller' elements represent more precise information; therefore the desired solution is the least fixed point, computed by repeated application of the function starting from the bottom element of the lattice, and the best approximation of any two elements is the join (LUB).
  - The data flow analysis literature typically orders lattices such that 'larger' elements represent more precise information; therefore the desired solution is the maximal fixed point, computed by repeated application of the function starting from the maximal element of the lattice, and the best approximation of any two elements is the meet (GLB). Hence the term MOP solution.
  - In other words, the two sets of literature view the lattice 'upside-down' with respect to each other.
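To make the iterative approach concrete, here is a minimal sketch of a forward worklist solver in Python. The graph and flow-function representation and all names are my own, not taken from the notes or any particular paper; it follows the join convention of the abstract-interpretation literature (bottom = 'no information') and assumes monotone flow functions over a lattice with finite chains, so that it terminates.

    def solve_forward(nodes, preds, flow, join, bot, entry, init):
        """
        Iterative (worklist) solution of forward data-flow equations:
            in[n]  = join of out[p] over predecessors p of n (plus init at entry)
            out[n] = flow[n](in[n])
        nodes : iterable of CFG node ids
        preds : dict node -> list of predecessor nodes
        flow  : dict node -> monotone transfer function on lattice values
        join  : least upper bound on the lattice; bot : bottom element
        init  : value assumed at the entry node
        """
        out = {n: bot for n in nodes}
        worklist = list(nodes)
        while worklist:
            n = worklist.pop()
            inp = init if n == entry else bot
            for p in preds[n]:
                inp = join(inp, out[p])
            new = flow[n](inp)
            if new != out[n]:
                # value grew in the lattice: record it and re-examine successors
                # (successors found by scanning preds; fine for a sketch)
                out[n] = new
                worklist.extend(s for s in nodes if n in preds[s])
        return out

For the sign analysis of the running example, flow[S1] would, for instance, map any incoming environment to one where x is bound to '+'.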

2.4 Abstract interpretation

(Use the running example.)

- Formally defines multiple levels of semantics. References: CoCo 77, CoCo 79, the Nielson-Nielson-Hankin (PPA) book.
- The lowest level of semantics describes actual program execution:
  - Semantic state-transformer functions are defined for each construct.
  - The overall program semantics is defined as a fixed point of (the composition of) these functions.
  - These functions operate over a concrete domain of values: integers, Booleans, physical memory locations etc.
  - Each domain is described by a (natural) lattice.
  - This is the concrete interpretation: the base abstract interpretation.
- Each analysis is now described as an abstract interpretation:
  - A (semi- or complete) lattice of abstract values (intervals, signs etc.) with its own ordering.
  - An abstract semantics, i.e. a semantic function for each construct that operates on the abstract values.
  - The analysis itself is now the semantics derived from these abstract functions.
  - A fixed-point computation determines the analysis solution. Existence of the fixed point, its computability etc. follow from lattice theory.
- The semantics can be described at any 'level' and in any 'form':
  - E.g. the CoCo 77 paper works on a 'trace-like' semantics, i.e. the semantics of flow-chart-like programs.
  - But it can also be a denotational semantics (or big-step semantics) etc.
  - Also see absint.pdf.
- Consistent abstract interpretations: to prove correctness of an analysis.
  - Define a pair of functions: an abstraction function (alpha) from the concrete to the abstract domain, and a concretization function (gamma) in the other direction.
  - These functions may introduce loss of information: gamma . alpha >= id; alpha . gamma <= id.
  - Alpha and gamma may form a Galois connection: the best abstraction.
  - Use these functions to show correctness, by showing that the information loss in alpha/gamma is consistent (commuting diagram on p 242 of the CoCo 77 paper), i.e. that the abstract results are safe approximations. (A small sketch of alpha/gamma for the interval domain follows.)
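As an illustration of the abstraction/concretization pair (not taken from the papers), here is a small Python sketch of alpha and gamma for the interval domain, reusing the interval representation from the earlier lattice sketch. For the sketch, gamma only enumerates bounded intervals; the asserts show gamma . alpha over-approximating a set of concrete values, while alpha . gamma loses nothing on the abstract side.

    from math import inf

    BOT = None  # the empty interval (bottom); an interval is otherwise a pair (lo, hi)

    def alpha(values):
        """Abstraction: the tightest interval containing a set of concrete integers."""
        return BOT if not values else (min(values), max(values))

    def gamma(interval):
        """Concretization: the set of integers an interval stands for.
        (For the sketch, only bounded intervals are enumerated.)"""
        if interval is BOT:
            return set()
        lo, hi = interval
        assert lo != -inf and hi != inf, "unbounded intervals denote infinite sets"
        return set(range(int(lo), int(hi) + 1))

    # gamma . alpha over-approximates the identity on concrete sets (gamma . alpha >= id):
    s = {11, 16, 19}
    assert s <= gamma(alpha(s))          # {11, ..., 19} is a superset of {11, 16, 19}
    # alpha . gamma does not lose precision on the intervals themselves:
    assert alpha(gamma((11, 19))) == (11, 19)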

- Often fixed points are impossible or expensive to compute: extremely long chains. Example: interval analysis.
- Means of approximating fixed points:
  - Widening operators (the name comes from interval widening, i.e. approximating) ensure that a safe approximation of the least fixed point is reached.
  - Narrowing operators get you back towards the least fixed point while always remaining a safe approximation.
  - Examples:
    - Widening operator (pp 246-247 of CoCo 77). The widening operator is not commutative, unlike the join operator!
    - Narrowing operator (pp 248-249 of CoCo 77, and example).
    - Diagrams explaining widening / narrowing (p 249).
- Note: the CFG of data flow analysis is itself an abstract interpretation (albeit a very low-level abstraction). It basically throws away some control flow information but keeps all the data flow information.

2.5 Non-standard type inference

(Use the running example.)

- A logic-based approach to analysis.
- Consider the problem of program analysis as a type inference problem, where the information we require corresponds to the types to be discovered (and the corresponding program constructs are the identifiers or variables to which we need to assign types).
- Such types should form an inclusion or subtype hierarchy: this corresponds to the information lattice.
- Similar to the flow functions of data flow analysis and the abstract semantic functions of abstract interpretation, you define typing rules.
  - Typing rules specify when a construct is well-typed, i.e. what is the best type assignment to the constituents of a construct that would make them mutually consistent.
  - Example: y + z is the construct, and (say) + is known to be an operator Arith x Arith -> Arith. Then the typing rule for the addition construct says the typing is consistent if y is Arith, z is Arith, and the expression itself is also Arith. For y < z, the last part changes to Bool.
- Since we often want information at each point in the program, there may be different types associated with each program point.
  - This in turn means that the type of a statement is really a type transformer, i.e. it modifies one type assignment into another, just as a flow function or semantic function does.
- Defining the analysis: similar to defining the lattice / flow functions / semantic functions.

  - Deciding on the set of types, their inclusion relationship, and the typing rules.
- Performing the analysis:
  - Finding the most general types (equivalent to the best approximations) under the given typing rules for a given program.
  - Done similarly to type inference algorithms such as Milner's.
  - Could be expensive, depending on the typing hierarchy.
  - Can have special subsumption rules to help approximate faster: similar to widening.
- Also see TypeSyst.pdf.

2.6 Inter-procedural analysis

- Analysis of programs with multiple procedures / functions raises issues different from the analysis of single-procedure programs.
- Two papers to go through:
  - one introducing the general techniques of inter-procedural analysis;
  - two, better algorithms for a specific class of analyses.

2.7 Two approaches to inter-procedural analysis

- The classic Sharir-Pnueli paper (1981) laying out the general techniques, in a data-flow analysis context.
- Consider an inter-procedural control flow graph (ICFG), with calls connected to procedure entries and returns connected to the node(s) after the call. Example on p 198, Fig 7-1 of the paper.
- Many inter-procedural paths in the ICFG are obviously wrong, since a return must go back to the corresponding call.
  - The set of inter-procedurally valid paths (IVPs) in an ICFG is a subset of the set of paths in the ICFG.
  - Considering all paths in the ICFG is sound but highly imprecise.
  - Formally, the set of paths in a CFG is generated by a regular language, while the set of IVPs in an ICFG is generated by a context-free language, because of the need to simulate a call stack.
  - There is also a need to handle scopes, lifetimes of variables, parameter passing etc. These are ignored for now, as they are easy to handle.
- Therefore, we need techniques that consider only IVPs rather than all paths. Broadly, two approaches address the problem:
  - the functional approach
  - the call strings approach
- Functional approach
  - Define functional equations for the information at a point in terms of the information at the entry of its procedure, along IVPs (see the paper).
  - Solve the functional equations to obtain solutions that are themselves functions.
  - Existence of the solution depends on the height of the function lattice. Approximation techniques can be used.

  - Having obtained functions for each point along IVPs from its procedure entry, the actual information at each point can be found using another set of recursive (non-functional) equations (see the paper), with an example in the paper.
  - Can show that this approach yields the MOP solution over IVPs if the functions are distributive, and that it yields a sound approximation if the functions are non-distributive. Proof in the paper.
  - Practical problem: representing the computed functions efficiently.
  - A purely iterative algorithm to implement the functional approach is also possible (algorithm and example in the paper).
    - Does not explicitly represent functions, but directly applies the functions (only) to the values occurring in the analysis.
    - But this does not necessarily make it cheap. In fact, it may not even converge, e.g. if the chains are infinitely high. It is, however, guaranteed to yield a correct result if it converges.
- The call strings approach
  - Resembles the iterative data flow approach.
  - Explicitly carry the call stack with the information. Propagate only the relevant information back along return edges, using the call strings at return nodes. (A small sketch of this call-string bookkeeping follows this list.)
  - Obvious problem when call strings are unbounded (due to recursion).
  - Formal definitions of call strings and their extensions for each edge in the CFG (p 212 of the paper).
  - Using these definitions, define an augmented data flow framework for the inter-procedural case: <L*, F*>.
    - L* consists of functions from call strings to L (equivalently, pairs of call strings and lattice values). This is the information at a point in the new framework, i.e. the information at a point is parametrized by its calling context; hence the name context-sensitive analysis.
    - F* consists of functions over L* and is derived from F and the properties of call strings. Functions for inter-procedural edges change only the call-string part; functions for intra-procedural edges change only the L part. F* should be closed under composition and meet, and contain the identity function.
    - Note: this framework depends on the ICFG and is not program-independent, unlike in the intra-procedural case.
  - Solving the resultant data flow equations yields the MOP solution over all IVPs.
  - After solving, the solutions for all the call strings are merged (through a join) to get the solution valid for all call sequences (eliminating the call strings in the process). See p 215, 7-12 of the paper. Proof given in the paper.
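A minimal sketch of the call-string bookkeeping referred to above, in Python and much simplified with respect to the paper: a call string is a tuple of call-site labels, a call edge pushes its site, and a return edge propagates information only when the top of the call string matches the corresponding call site. An optional bound k on the length (whose need is discussed next) is included.

    def extend_on_call(call_string, call_site, k=None):
        """Call edge: push the call site onto the call string.
        If a bound k is given, keep only the most recent k call sites (k-limiting)."""
        cs = call_string + (call_site,)
        return cs[-k:] if k is not None else cs

    def propagate_on_return(call_string, call_site):
        """Return edge: propagate only if this call string actually entered the
        procedure from 'call_site'; pop it off, otherwise drop the information."""
        if call_string and call_string[-1] == call_site:
            return call_string[:-1]
        return None   # not an inter-procedurally valid path for this return edge

    # Example: information tagged with call string ('c1', 'c2') returns through c2
    assert extend_on_call(('c1',), 'c2') == ('c1', 'c2')
    assert propagate_on_return(('c1', 'c2'), 'c2') == ('c1',)
    assert propagate_on_return(('c1', 'c2'), 'c3') is None

With a bound k the return treatment is actually subtler than shown here, since truncated call strings can correspond to several callers; the paper's definitions handle this precisely.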

  - The solution may not converge for recursive programs even if L is finite. But convergence is possible by choosing an appropriately finite subset of call strings.
    - A finite, prefix-closed subset of all call strings.
    - Basically, choose them long enough to allow convergence even if longer paths are possible: height of lattice * longest cycle in the call graph, or for simplicity, size of lattice * number of calls.
    - But the size of the call string set may still be too huge.
    - Not so, however, in the case of the so-called separable problems, where the effective height of the lattice is 1! (Example in the paper.)

2.8 Special case of inter-procedural analysis

- The Reps-Horwitz-Sagiv paper of 1995.
- For a special sub-class of problems, one can have efficient algorithms for precise inter-procedural analysis.
  - The class is defined as the set of problems which have distributive transfer functions and finite data-flow facts, i.e. the transfer functions are from (finite) sets of facts to (finite) sets of facts, and the meet operator is either union or intersection.
  - Inter-procedural, finite, distributive, subset (IFDS) problems. Transfer (flow) functions are associated with edges.
  - This includes the classical bit-vector or separable problems, and also problems such as copy-constant propagation, possibly-uninitialized variables etc.
- Precise inter-procedural analysis for IFDS problems is reduced to an equivalent problem of graph reachability over IVPs (or equivalent). Adapts the functional approach of Sharir-Pnueli.
- The inter-procedural control flow graph ('supergraph' in the paper) has 4 kinds of edges:
  - normal intra-procedural edges
  - edges from call nodes to entry nodes
  - edges from exit nodes to return nodes
  - edges from call nodes to return nodes
  - Example: Fig 1, p 3 of the paper.
- Each flow function f : 2^D -> 2^D is mapped to a binary relation R_f over (D \cup {0}), where 0 represents the empty set. R_f has at most (|D|+1)^2 elements. R_f is defined as follows:
  - (0, 0) \in R_f
  - \forall y \in f(\emptyset): (0, y) \in R_f
  - \forall y \in f({x}): (x, y) \in R_f if y \not\in f(\emptyset)
  - Basically, the bottom element maps to itself; the bottom element also maps to all those elements that are generated (i.e. obtained by applying f to the empty set, in other words, independent of the input); and if the singleton {x} maps to y, then (x, y) is also part of the relation.
  - Some examples on p 4, Sec 3 of the paper. (A small sketch of this construction follows.)
- Mapping from a representation relation back to a function is also easily possible:
  - [R](X) = ({y | \exists x \in X. (x, y) \in R} \cup {y | (0, y) \in R}) \ {0}
  - It is easy to see that [R_f] = f.
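A small Python sketch of the two mappings just defined, with data-flow facts represented as strings and 0 as the special element. The example flow function is only an assumed illustration in the style of the possibly-uninitialized-variables problem; for distributive functions the round trip [R_f] = f holds, as the assert checks on one input.

    ZERO = 0   # the special element standing for the empty set of facts

    def repr_relation(f, D):
        """Representation relation R_f of a distributive flow function f : 2^D -> 2^D."""
        R = {(ZERO, ZERO)}
        gen = f(frozenset())                      # facts generated independent of the input
        R |= {(ZERO, y) for y in gen}
        for x in D:
            for y in f(frozenset({x})):
                if y not in gen:                  # only input-dependent edges
                    R.add((x, y))
        return R

    def apply_relation(R, X):
        """[R](X): recover the function from its representation relation."""
        out = {y for (x, y) in R if x in X} | {y for (z, y) in R if z == ZERO}
        return out - {ZERO}

    # Example in the style of 'possibly uninitialized': kills fact 'x', generates fact 'g'
    D = {'x', 'y', 'g'}
    f = lambda S: (set(S) - {'x'}) | {'g'}
    R = repr_relation(f, D)
    assert apply_relation(R, {'x', 'y'}) == f({'x', 'y'}) == {'y', 'g'}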

- Composition of two flow functions also maps to composition of the corresponding relations:
  - R_f ; R_g = { (x, y) | \exists z. (x, z) \in R_f \wedge (z, y) \in R_g }
  - Therefore path functions are compositions of relations, i.e. [R_f ; R_g] = g \circ f.
  - In other words, if the relation is expressed as a graph, composing path functions is equivalent to tracing a path in the graph!
- Translating the IFDS problem to a graph reachability problem:
  - Associated with every flow graph node are |D|+1 points, corresponding to the elements of D and 0.
  - The 'exploded supergraph' is the graph whose nodes are pairs consisting of an original ICFG node and a data-flow fact.
  - Corresponding to the flow function f of every edge, connect the corresponding points as defined by R_f.
  - Assuming no information is available at the entry of main, the solution to the IFDS problem is simply the set of points reachable from the point <entry of main, 0> along IVPs.
- But how to determine reachability along IVPs? Done by a worklist algorithm using path edges and summary edges, similar in spirit to the Sharir-Pnueli functional approach but without actually computing functions.
  - A path edge is an edge of the form <e_p, d1> to <n, d2>, where e_p is the entry of the procedure containing node n. It indicates that there is an IVP from <entry of main, 0> to <e_p, d1>, and a same-level IVP from <e_p, d1> to <n, d2>. In other words, the data-flow fact at the target of the path edge is part of the IFDS solution at that node.
  - Summary edges similarly capture the effect of a procedure, i.e. they are edges of the form <c, d1> to <r, d2>, where c is a call node and r its return node. A summary edge represents information that (may have) passed through the called procedure.
  - Algorithm for the computation of path and summary edges in Fig 3 (page 7) of the paper.
  - Detailed example with the possibly-uninitialized-variables problem in Fig 1 (page 3) and Fig 2 (page 5).
- Some special cases, such as h-sparse and separable problems, lead to more efficient algorithms. Complexities of all are given in Table 5.2 (page 9). The worst case is O(E D^3); this becomes O(E D) for separable problems.
- 'Real' analyses of course have to contend with many more issues / constructs!
  - Arrays
  - Records / structures
  - Polymorphism
  - Sub-typing / inheritance
  - Pointers
  - ...

2.9 Analysis problems to focus on

Equivalence between expressions
- General question: does the expression x + 2y have the same value at a program point as the expression c - 3b + d?
- (Obviously!) undecidable. Still useful to find approximate solutions and solve special cases, as there are many applications:
  - Software verification: does the assertion x + y = 3 * z hold at this point?
  - Constant propagation: does variable x have the constant value c? Or, does x - c = 0 hold? Replace the variable with the constant, if so.
  - Copy propagation: do two variables hold the same value? Or, does x - y = 0 hold? Can be used in, say, efficient register usage.
  - Common sub-expression elimination: do x+y and c+d have the same values, and has one of them already been computed? If so, use it in place of the other.

Alias / pointer analysis
- Alias analysis: do two names refer to the same location? E.g., <*p, *q> if p, q are pointers to the same type and can point to the same memory location (stack or heap) at that point in execution.
- Points-to analysis: what are the names to which a pointer may point? E.g., <p, x>, <q, x> if both p and q may point to x at a point in execution.

3. Herbrand equivalence analysis

3.1 Problem definition

- Most general problem statement: during the execution of the program, find the relationships between the values of expressions that hold at each point. E.g. at point p, x^3 - 2xy + 3yz - 23 <= 0.
  - Similar to discovering program invariants!
  - Obviously undecidable in its most general form. There are many simpler variants.
- One variant: finding relations (equality and inequality) among linear expressions. Essentially, restrict the kind of expressions being dealt with.
- Herbrand equivalence of expressions
  - Operators in the expressions are uninterpreted, so only structural equivalence is checked.
  - Herbrand equivalence => expression equivalence, but not the other way round, i.e. sound but incomplete, as desired.
  - Can be solved 'precisely', even though this is expensive.

- Expression x+y is equivalent to a+b only if both can be reduced to structurally equivalent expressions.
  - Say, at the point in question, x has value c and y has value d+e, while a has value c+d and b has value e.
  - Both reduce to c+d+e and hence are Herbrand equivalent.
  - Is this right? Not really, as it needs the knowledge that + is associative!
- Applications:
  - (Copy) constant propagation
  - Common sub-expression elimination
  - Invariant code motion
  - Detecting / verifying invariants

3.2 Cousot-Halbwachs (1978)

- Focuses on linear restraints (which include linear equalities and inequalities), so it is closer to invariant discovery, but restricted to linear expressions.
  - Tries to find relationships such as x + 3y - z <= 0.
- Uses the abstract interpretation approach.
- In a sense, completely orthogonal to Herbrand equivalences: here the operators are interpreted.
  - Subsumes Herbrand equivalences, and therefore constant propagation, available expressions etc., but only for linear expressions.
- Lattice of linear restraints with a geometric representation:
  - Each restraint (i.e. equality or inequality) is represented by (an approximation of) the set of points allowed by it.
  - Geometric interpretation (one 'dimension' per variable); 'continuous' domains. Example of the space for a set of restraints on p 86 of the paper.
  - Partial ordering by geometric inclusion: if the region of one restraint subsumes the region of another, it is more approximate; basically, it allows more values.
  - A lattice of infinite height (and width!), so widening is needed to converge.
  - Determining the region from a set of restraints (and vice versa) requires a lot of complex math: polyhedra, convex hulls, frames and so on. Usually approximations are used, as the exact intersection of two restraints may be impossible or very hard to find.
  - The intersection of two regions represents the merged (meet) information.
- The abstract semantic functions describe how the state of restraints is transformed by each statement.
  - The semantic function for assignment has to consider different possibilities, such as assignment of non-linear expressions, linear expressions etc.
  - The semantic functions are given on pp 90-93 of the paper, for flow-chart programs.

  - Assignment of a non-linear expression to x means we know nothing about the value of x, so the other relationships involving x have to be modified to eliminate x: cutting out one dimension!
  - Assignment of a linear expression does not eliminate a dimension, but requires complex geometric jugglery.
  - Similarly, linear and non-linear equality and inequality tests also change the set of restraints.
- Start with the initial restraints (relationships among the input parameters) and keep applying the semantic functions (with widening) until the result saturates.
  - Example from the paper on p 94 (Sec 5), with pictorial intuition of how it works, including widening.
- An expensive but comprehensive technique.
  - The example given in the paper (p 84) finds a whole bunch of inequalities for bubble sort.
  - Shows the power of program analysis, even if not practically feasible.

3.3 Kildall's paper (1973)

- The paper proposed the idea of a general data flow framework and the meet-over-all-paths solution.
- One of the problems addressed was common sub-expression elimination: if the equivalent of an expression has already been computed, do not compute it again.
- At each program point, compute a set of equivalent expressions.
- Abstract lattice:
  - A partitioning of the expressions (computed so far), each equivalence class containing equivalent expressions.
  - The ordering corresponds to partition refinement: P1 <= P2 if P1 is a 'coarser' partitioning, i.e. 'more expressions are equivalent' in P1.
  - Example: {{a,b,c,d}, {e,f}} <= {{a,b}, {c,d}, {e,f}}, but {{a,b,c}, {d,e,f}} and {{a,b}, {c,d}, {e,f}} are unrelated partitionings!
- The join is difficult (details on p 198 of the paper; a small sketch of this join appears at the end of this subsection):
  - Prune the equivalence classes to the relevant ones.
  - Identify the expressions common to the two partitionings.
  - For each common expression, intersect the corresponding partitions to get the partitions of the joined partitioning.
- The flow function works on the equivalence classes, depending on the computations inside a node (p 198 of the paper).
  - Assume a partitioning P at the entry to a node N.
  - For each (partial) computation exp at N, if exp is already in some partition of P, it is redundant.
  - Else, create a new partition for exp.

  - Also have to add other elements to exp's partition depending on the equivalences of exp's sub-expressions.
    - Infer new expressions to be added to the equivalence classes to make them complete ('structuring' the equivalence classes).
    - E.g., if exp is a+b, a is in a partition with c+d, and b is in a partition with e+f, then we have to add all of a+(e+f), (c+d)+b and (c+d)+(e+f) to the partition containing exp.
    - These operations make the analysis hard.
  - If the node has an assignment (say v = exp):
    - Remove all expressions containing v from their partitions.
    - For all expressions exp' that have exp as a sub-expression, create a new entry in the partition with exp replaced by v.
  - The flow function is also distributive!
- Can easily add constant propagation to this flow function:
  - Add constants to the equivalence classes as well.
  - If the operands of an expression are in equivalence classes with constants, then compute the expression itself.
  - If the computed constant has a class of its own, add this expression to that class; else add the constant to the class of the expression.
- Basic complexity is exponential, because of trying to deal with partitions and trying to complete partitions.
  - Basically, one has to look at all possible ways of combining operands based on the equivalence classes of the operands.
  - The meet is also expensive.
- Global value numbering
  - Number the partitions and represent expressions by operators operating on partition numbers.
  - Decreases the number of expressions in a partition and brings down the cost. Small examples of the decrease in partition size on p 203 of the paper.
  - The meet becomes more complex: we need to recover the 'hidden' information from the value-numbered expressions. Details on p 203 of the paper if required. And the complexity remains as bad.
- Example on p 235 (Fig 1) and p 236 (Fig 2) of the Ruthing-Knoop-Steffen (RKS) paper, and discussion there of the difficulties with Kildall's approach.
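A minimal sketch of the join (confluence) of two expression partitionings described above, in Python, with a partitioning represented simply as a list of frozensets of expression strings (my own encoding, not Kildall's). It keeps only the expressions common to both partitionings and intersects the classes of each common expression; the pruning and 'structuring' steps of the full analysis are ignored here.

    def join_partitions(p1, p2):
        """Join of two expression partitionings at a confluence point:
        an expression survives only if it occurs in both, and two expressions
        stay equivalent only if they are equivalent in both partitionings."""
        common = set().union(*p1) & set().union(*p2)
        classes = []
        for e in common:
            c1 = next(c for c in p1 if e in c)    # e's class on the first path
            c2 = next(c for c in p2 if e in c)    # e's class on the second path
            cls = c1 & c2
            if cls not in classes:
                classes.append(cls)
        return classes

    # On one path a, b and c+d are all equivalent; on the other only a and b are:
    p1 = [frozenset({'a', 'b', 'c+d'}), frozenset({'e'})]
    p2 = [frozenset({'a', 'b'}), frozenset({'c+d', 'e'})]
    expected = {frozenset({'a', 'b'}), frozenset({'c+d'}), frozenset({'e'})}
    assert set(join_partitions(p1, p2)) == expected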

3.4 Alpern-Wegman-Zadeck (1988)

- A simple, cheap algorithm to detect some, but not all, equivalences between variables (and expressions).
- A single global data structure suffices, unlike Kildall's data flow framework.
- Uses an SSA (static single assignment form) representation:
  - Replace each variable by many copies, so that there is only one assignment to each copy.
  - Introduce merge functions (non-deterministic phi functions) at join points.
  - Phi functions are subscripted by the node to which they belong (Fig 3 on p 3 of the paper). Otherwise we might detect two phi-expressions as equivalent just because they have equivalent operands, even if the conditions under which the branching (corresponding to the phi) occurred were different.
- Given the SSA form, a value graph is built for the program that describes how the SSA variables are related.
  - So, if x1 = y0 + z1 and z1 = x0 * 3, there would be a + node labelled with x1, with edges to the nodes labelled y0 and z1; the z1 node would have operator * and edges to the nodes labelled x0 and 3.
- Given the value graph representation, the algorithm does a partition refinement (along the lines of FSA minimization) to merge or collapse isomorphic parts of the graph. (A small sketch of this partitioning follows this list.)
  - Two nodes with the same operator and dependences are collapsed into one partition (and carry the labels of both nodes).
  - Partitioning algorithm in Fig 7 (p 5) of the paper:
    - Begin by putting all expressions with a common root operator in the same partition.
    - Then, for each partition of expressions of a particular operator: if the corresponding (m'th) operands are not in the same partition, split the original partition so that all operands of the operators in one partition come from the same partition.
    - Keep doing this with a worklist until no more partitions can be split.
- After partitioning, variables sharing a node are said to be congruent.
  - Congruence implies equivalence (not the other way around). That is, sound but not complete.
  - Slightly super-linear, so cheap to do.
- Fig 3, 4 (Sec 4) of the RKS paper give an example where this algorithm works; Fig 5, 6 (Sec 4, p 240) give an example where it does not.
  - Because phi functions are also treated as uninterpreted operators, right at the beginning (the most optimistic point) an expression rooted at a phi can never match an expression rooted at a proper operator, meaning the two can never later be merged, even if they are the same inside the phi.
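A minimal Python sketch of the optimistic congruence partitioning idea, with my own encoding of the value graph (the paper's algorithm uses a more careful splitting strategy along the lines of FSA minimization; this sketch just iterates a naive refinement until the partition stabilizes).

    def congruence_classes(value_graph):
        """
        value_graph: dict node -> (operator, (operand nodes...)); leaves have () operands.
        Phi operators would simply be operators like ('phi_N', ...), subscripted by node.
        Returns a partition of the nodes such that nodes in the same class have the same
        operator and congruent corresponding operands.
        """
        # Optimistic start: one class per (operator, arity)
        cls = {n: (op, len(args)) for n, (op, args) in value_graph.items()}
        changed = True
        while changed:
            changed = False
            # Refine: a node's new class key also records the classes of its operands
            new = {n: (cls[n], tuple(cls[a] for a in value_graph[n][1]))
                   for n in value_graph}
            if len(set(new.values())) != len(set(cls.values())):
                changed = True
            cls = new
        classes = {}
        for n, key in cls.items():
            classes.setdefault(key, set()).add(n)
        return list(classes.values())

    # x1 = a0 + b0 and y1 = a0 + b0  =>  x1 and y1 end up congruent
    g = {'a0': ('leaf_a', ()), 'b0': ('leaf_b', ()),
         'x1': ('+', ('a0', 'b0')), 'y1': ('+', ('a0', 'b0'))}
    assert any({'x1', 'y1'} <= c for c in congruence_classes(g))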

3.5 Ruthing, Knoop, Steffen (1999)

- An improvement on the AWZ paper.
- One problem with AWZ is that it misses many equivalences, since phi nodes are also left uninterpreted.
- Overcomes this lacuna of AWZ by interpreting phi nodes, i.e. distributing the phi over an operator.
  - Presents two simple graph rewrite rules to do this. Calls them normalization rules, as the aim is to convert the value graph to a normal form.
  - One rewrite rule just eliminates a phi node both of whose operands are the same; the variables associated with the phi are moved to the operand.
  - The other distributes the phi over an operator if both operands of the phi are rooted at the same operator.
  - Exact rules in Fig 7 (p 242) of the paper.
- The rewrite system consists of these two rules plus the graph partitioning exercise; the three together are repeatedly applied.
- The rewrite system is sound, because each of the three rules is.
- The rewrite system also has the nice desired properties of confluence and termination:
  - The rewriting process will terminate.
  - It does not matter in what order you apply the rules, or where in the graph. Given multiple choices for rule application, you can choose one randomly. This may of course affect how many steps it takes to converge!
  - Example in Fig 8 (p 243) for confluence.
- The rewrite system is also complete for acyclic programs. Proof in the paper.
  - Examples in Fig 9 and Fig 10 (p 244 of the paper): the failure cases for AWZ, which work here.
- Complexity: O(n^4 log n) in the worst case (repeated partitioning is the most expensive step); O(n^2 log n) expected in practice.
- Example where RKS is incomplete: can we detect that x and y have the same value after the first assignment inside the loop (even assuming we can perform computations involving constants)?

    x = 0;
    y = x + 1;
    while (C1) {
        x = x + 1;
        if (C2) {
            x = x + 1;
            y = y + 2;
        } else {
            x = x + 2;
            y = y + 3;
        }
    }

3.6 Probabilistic approaches (Gulwani-Necula)

- Both AWZ and RKS are incomplete but sound. Can we get completeness if we are ready to sacrifice soundness? That is what probabilistic approaches try to do.
- The probability of error should be very small: confidence in the results.

- Typically tuneable: the lower the error probability, the greater the cost. So you choose the cost-precision trade-off.

3.7 Discovering linear equalities

- Discovers relationships of the form a_1 x_1 + a_2 x_2 + ... + a_n x_n + c = 0, where the a_i's and c are constants.
  - So it only discovers a specific kind of relationship between variables: a generalization of constant propagation, not all Herbrand equivalences.
- The idea is extremely simple: just run the program on different sets of (randomly chosen) input values.
  - So, multiple parallel executions. Each execution results in a state at a point; the collection of states is called a sample.
  - At each branch point, ignore the condition and execute both branches!
  - At join points, combine the values obtained from the two branches using a freshly chosen random affine combination of weights. That is, values on one branch are given weight w and on the other are given weight (1 - w): an affine combination. One weight per state in the sample. (A small sketch of this affine join follows this list.)
  - Example in Fig 1, p 2 of the paper.
  - Multiple executions help decrease the probability of error, particularly for relationships of the form x = k.
- Geometric intuition behind the idea:
  - Each state is a point in n-dimensional space; each sample is a set of points in that space.
  - Merging two samples using affine combinations of weights is randomly choosing a point on the line connecting the two points representing the two states. So the merger of two samples picks a set of points on the lines joining corresponding pairs of states in the two samples.
  - Example in Fig 2, p 3 of the paper.
- Completeness: such affine combinations preserve any linear relationship (of the kind desired). And since the desired property is expected to be valid for all executions, it should be valid for the sample executions too! Lemma 1, p 4 of the paper, and the associated proof.
- 'Almost' soundness: there is a very low probability of satisfying any non-existent linear relationship. Lemma 2, p 4, and the associated proof; Schwartz's theorem (Theorem 1, p 3 of the second Gulwani-Necula paper).
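A minimal Python sketch of the random affine join of two states at a join point. The paper works over a large finite field; plain floats are used here only for illustration, and the names are mine. The assert checks that a linear relationship holding in both incoming states (here y = x + 1) is preserved by the combination, whatever weight is chosen.

    import random

    def affine_join(state1, state2):
        """Combine two states (dicts variable -> value) at a join point:
        pick one random weight w and take w*s1 + (1-w)*s2 component-wise."""
        w = random.random() * 1000 - 500        # a random weight (not restricted to [0,1])
        return {v: w * state1[v] + (1 - w) * state2[v] for v in state1}

    # y = x + 1 holds in both incoming states, so it holds in the joined state too:
    s1 = {'x': 2.0, 'y': 3.0}
    s2 = {'x': 7.0, 'y': 8.0}
    s = affine_join(s1, s2)
    assert abs(s['y'] - (s['x'] + 1)) < 1e-6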

- Relation to testing: testing is equivalent to choosing affine weights 0 and 1.
  - Example program in Fig 1, p 2 of the paper: 3 paths exhibit the relationship while only one doesn't. This is still easy to catch by this method, while testing will only find that the relationship does not hold if exactly that path is executed.
- Intersection of spaces: identifying relationships within branches (of linear conditions).
  - Example in Sec 5, p 4 of the paper, also in Fig 3.
  - Need to 'derive' a sample that satisfies the given condition from the current sample.
  - Geometric intuition: the points in the sample are 'projected' onto the hyperplane represented by the condition. But the projection is not orthogonal, so as to preserve linear relationships; instead, as given in Fig 4 (p 5):
    - Connect two samples with a (hyper)line.
    - Choose a point on this hyperline different from all other points and not on the hyperplane of the condition.
    - Draw hyperlines from this chosen point to all other points and take the intersections of these hyperlines with the hyperplane.
    - These points will all satisfy the linear relationships found so far and will also satisfy the chosen condition.
    - Note: in the process one point in the sample is 'sacrificed', as the original two samples both result in the same final point. Details in Fig 3, Fig 4.
  - But all this goes well beyond Herbrand equivalences, as you detect value-equivalent expressions and not just Herbrand-equivalent ones!
- Technical details regarding union of spaces, soundness, completeness and the fixed-point computation are available in the paper.

3.8 Global value numbering using random interpretation

- Extends the discovery of linear relationships to discovering Herbrand equivalences.
- Kildall's is exponential, AWZ is efficient but highly imprecise, Ruthing-Knoop-Steffen is in between. This one catches as much as Kildall in polynomial time, but is (probabilistically) unsound. The unsoundness probability is tuneable through parameters.
- The idea is to choose random interpretations for the operators and execute the program to discover relationships, rather than leaving the operators uninterpreted.
  - The previous paper chose random affine interpretations for the phi-operators and natural interpretations for the linear operators. Joins are still treated using affine combinations.
- An interpretation for each operator F can be made by choosing p parameters from a field L.
  - E.g. if p = 2 and F is binary, F(a,b) may be interpreted as p1 * a + p2 * b (where p1 and p2 are the two random parameters).
  - The interpretation should be linear, as it should distribute over the interpretation of the affine join (phi) combinations. Equation 5 on p 4 of the paper.
  - Unfortunately, this is not (probabilistically) sound, as two distinct functions can easily get the same interpretation (Fig 3, p 4 of the paper).
  - There is no point having more than 2 or 3 parameters for a binary operator with a linear interpretation, but these are too few to distinguish between the different leaves of a complex expression, as there could be more than 2 leaves in the expression.

- To overcome this, choose k parallel values for each variable (and therefore for each expression).
  - Uses 4k - 2 parameters for the interpretation, namely r_1..r_k, r'_1..r'_k, s_1..s_{k-1}, s'_1..s'_{k-1}.
  - The i'th (i between 1 and k) linear interpretation of F is defined by the following recurrence:
    - P(x, i) = x
    - P(F(e1,e2), 1) = r_1 * P(e1, 1) + r'_1 * P(e2, 1)
    - P(F(e1,e2), i) = r_i * P(e1, i) + r'_i * P(e2, i) + s_{i-1} * P(e1, i-1) + s'_{i-1} * P(e2, i-1)
  - The degree of P(e, i) is the same as the depth of e.
  - There is an implicit ordering among the parameters, i.e. P(e, i) does not contain r_{i+1}..r_k or s_i onwards (or their primed varieties).
  - It can be shown that this interpretation is sound, i.e. if P(e1, i) = P(e2, i) for i > j (where j is the log of the maximum number of leaves of e1 and e2), then e1 = e2 under Herbrand equivalence. Lemma 7, p 5 of the paper. Essentially, induction on the depth of the expression(s).
  - Work out the interpretations of the example in Fig 3 (p 4) of the paper to show that this interpretation does distinguish between the two non-Herbrand-equivalent expressions (if required).
  - Therefore, the maximum value of k should be the log of the depth of the largest expression in the program. This also appears very conservative, i.e. it can be smaller.
- This gives us a sound interpretation for each operator as a polynomial over the parameters, where soundness is defined as: if the interpretations of two expressions are equal, they are Herbrand equivalent. Essentially, it defines a non-standard semantics for Herbrand equivalence.
- The analysis proceeds by random interpretation, similar to the previous paper:
  - Run the program on a sample of size k.
  - Choose interpretations for each operator as shown earlier (by picking 4k - 2 parameters and an interpretation).
  - Compute the values of expressions, not necessarily by computing the polynomials first; the values can be computed directly. See function V on page 5 of the paper. (A small sketch of this computation follows this list.)
  - A sample S satisfies the Herbrand equivalence e1 = e2 if V(e1, k, S) = V(e2, k, S). Note: we compare the k'th values of the polynomials, as they have the greatest distinguishing power.
  - At join points, perform random affine joins.
- A fixed point (i.e. the same set of Herbrand equivalences being attained) is guaranteed, as the lattice has finite depth bounded by n, the number of program variables (page 7).
  - Intuition: all possible Herbrand equivalences can be represented by a pair (I, E), where I is a set of independent variables and E is a set of expressions of the form x = e, one for each non-independent variable, such that e contains only variables from I. So, if one set of Herbrand equivalences is 'less than' the other, then the 'lesser' one has fewer variables in I. So any chain length is bounded by the number of program variables. (Lemma 13 of the paper.)
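A small Python sketch of evaluating the k parallel random linear interpretations of a single uninterpreted binary operator F directly on values, following the recurrence above. The expression encoding (nested tuples), the parameter names r, rp, s, sp, and the use of plain integers instead of the paper's finite field are all my own simplifications.

    import random

    K = 3
    # 4K - 2 random parameters: r[1..K], rp[1..K], s[1..K-1], sp[1..K-1]
    rnd = lambda: random.randrange(1, 2**31 - 1)
    r  = {i: rnd() for i in range(1, K + 1)}
    rp = {i: rnd() for i in range(1, K + 1)}
    s  = {i: rnd() for i in range(1, K)}
    sp = {i: rnd() for i in range(1, K)}

    def V(e, i, sample):
        """i-th random linear interpretation of expression e.
        e is either a variable name (looked up in 'sample', which holds one value
        per variable per parallel run) or a tuple ('F', e1, e2) for the single
        uninterpreted binary operator F of this sketch."""
        if isinstance(e, str):
            return sample[e][i - 1]
        _, e1, e2 = e
        v = r[i] * V(e1, i, sample) + rp[i] * V(e2, i, sample)
        if i > 1:
            v += s[i - 1] * V(e1, i - 1, sample) + sp[i - 1] * V(e2, i - 1, sample)
        return v

    # F(x, y) and F(y, x) are not Herbrand equivalent; with very high probability
    # their k-th interpretations differ, while identical expressions always agree.
    sample = {'x': [rnd() for _ in range(K)], 'y': [rnd() for _ in range(K)]}
    assert V(('F', 'x', 'y'), K, sample) == V(('F', 'x', 'y'), K, sample)
    assert V(('F', 'x', 'y'), K, sample) != V(('F', 'y', 'x'), K, sample)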

- The error probability derives directly from Schwartz's theorem regarding the probability of random values being a root of a polynomial.
  - It turns out the error probability is d/L (for one union operation), where d is the degree of the polynomial and L is the size of the field from which random values are chosen. Obviously, the bigger L is, the better!
  - The probability of error of the whole analysis is <= (2n^2 + t) / L, if k >= (2n^2 + t), where n is the maximum of the number of variables, function applications and join points, and t is the maximum depth of expressions.
  - The probability can be decreased still further by performing the random interpretation m times, bringing the error probability down to ((2n^2 + t) / L)^m.

4. Alias / Pointer analysis

4.1 Problem definition

- Do two names refer to the same object, or does a pointer point to a given object?
- Relevant for most analysis problems in modern languages: the first analysis whose results are used by other analyses.
- Examples:
  - Array overflow analysis, for both subscripts and arrays: a[*p], p[*q + 3]
  - Escape analysis for security: which objects are leaked?
  - Constant propagation / common sub-expression elimination etc.
- Flavours of the problem
  - The exact definition will depend on the particular language's semantics. For example, C/C++ with explicit pointers that may point to the stack or heap, versus Java, which has only object references that point only to the heap.
- Soundness
  - Typically we are interested in a superset of the 'actual' aliases. That is, it is OK to say that a and b are aliased when they are not, but it is not OK to miss the alias pair (a, b) if there may be an execution on which a and b are aliased.
  - Because missing such aliases may result in unsound predictions. For example, one might conclude that an array index is never out of bounds because a possible alias pair, and hence a possible array index value, was missed.

4.2 Theoretical complexities

- Bill Landi / Barbara Ryder: complexities of aliasing problems in languages like C.
- <a, b> is an alias pair at a program point if a and b refer to the same location. Typically, each of a and b is something of the form *p, or x, or p->left, etc.
- A binary relation over the space of names. A reflexive and symmetric relation, but not transitive! Why? (<a, b> and <b, c> could hold along different paths, so <a, c> may never hold!)


More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Sorting lower bound and Linear-time sorting Date: 9/19/17 5.1 Introduction You should all know a few ways of sorting in O(n log n)

More information

Chapter 4. Number Theory. 4.1 Factors and multiples

Chapter 4. Number Theory. 4.1 Factors and multiples Chapter 4 Number Theory We ve now covered most of the basic techniques for writing proofs. So we re going to start applying them to specific topics in mathematics, starting with number theory. Number theory

More information

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler

Compiler Passes. Optimization. The Role of the Optimizer. Optimizations. The Optimizer (or Middle End) Traditional Three-pass Compiler Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Synthesis of output program (back-end) Intermediate Code Generation Optimization Before and after generating machine

More information

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week 02 Module 06 Lecture - 14 Merge Sort: Analysis So, we have seen how to use a divide and conquer strategy, we

More information

Sets 1. The things in a set are called the elements of it. If x is an element of the set S, we say

Sets 1. The things in a set are called the elements of it. If x is an element of the set S, we say Sets 1 Where does mathematics start? What are the ideas which come first, in a logical sense, and form the foundation for everything else? Can we get a very small number of basic ideas? Can we reduce it

More information

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p.

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p. CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections 10.1-10.3 p. 1/106 CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer

More information

(Refer Slide Time: 4:00)

(Refer Slide Time: 4:00) Principles of Programming Languages Dr. S. Arun Kumar Department of Computer Science & Engineering Indian Institute of Technology, Delhi Lecture - 38 Meanings Let us look at abstracts namely functional

More information

Exact Algorithms Lecture 7: FPT Hardness and the ETH

Exact Algorithms Lecture 7: FPT Hardness and the ETH Exact Algorithms Lecture 7: FPT Hardness and the ETH February 12, 2016 Lecturer: Michael Lampis 1 Reminder: FPT algorithms Definition 1. A parameterized problem is a function from (χ, k) {0, 1} N to {0,

More information

Discrete Mathematics. Kruskal, order, sorting, induction

Discrete Mathematics.   Kruskal, order, sorting, induction Discrete Mathematics wwwmifvult/~algis Kruskal, order, sorting, induction Kruskal algorithm Kruskal s Algorithm for Minimal Spanning Trees The algorithm constructs a minimal spanning tree as follows: Starting

More information

Advanced Compiler Construction

Advanced Compiler Construction CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 INTERPROCEDURAL ANALYSIS The slides adapted from Vikram Adve So Far Control Flow Analysis Data Flow Analysis Dependence

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/22891 holds various files of this Leiden University dissertation Author: Gouw, Stijn de Title: Combining monitoring with run-time assertion checking Issue

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Fall 2016 Lecture 3a Andrew Tolmach Portland State University 1994-2016 Formal Semantics Goal: rigorous and unambiguous definition in terms of a wellunderstood formalism (e.g.

More information

A Propagation Engine for GCC

A Propagation Engine for GCC A Propagation Engine for GCC Diego Novillo Red Hat Canada dnovillo@redhat.com May 1, 2005 Abstract Several analyses and transformations work by propagating known values and attributes throughout the program.

More information

A Note on Karr s Algorithm

A Note on Karr s Algorithm A Note on Karr s Algorithm Markus Müller-Olm ½ and Helmut Seidl ¾ ½ FernUniversität Hagen, FB Informatik, LG PI 5, Universitätsstr. 1, 58097 Hagen, Germany mmo@ls5.informatik.uni-dortmund.de ¾ TU München,

More information

Lattice Tutorial Version 1.0

Lattice Tutorial Version 1.0 Lattice Tutorial Version 1.0 Nenad Jovanovic Secure Systems Lab www.seclab.tuwien.ac.at enji@infosys.tuwien.ac.at November 3, 2005 1 Introduction This tutorial gives an introduction to a number of concepts

More information

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np

Complexity Theory. Compiled By : Hari Prasad Pokhrel Page 1 of 20. ioenotes.edu.np Chapter 1: Introduction Introduction Purpose of the Theory of Computation: Develop formal mathematical models of computation that reflect real-world computers. Nowadays, the Theory of Computation can be

More information

Types and Type Inference

Types and Type Inference Types and Type Inference Mooly Sagiv Slides by Kathleen Fisher and John Mitchell Reading: Concepts in Programming Languages, Revised Chapter 6 - handout on the course homepage Outline General discussion

More information

Calvin Lin The University of Texas at Austin

Calvin Lin The University of Texas at Austin Loop Invariant Code Motion Last Time SSA Today Loop invariant code motion Reuse optimization Next Time More reuse optimization Common subexpression elimination Partial redundancy elimination February 23,

More information

Lecture Notes on Contracts

Lecture Notes on Contracts Lecture Notes on Contracts 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 30, 2012 1 Introduction For an overview the course goals and the mechanics and schedule of the course,

More information

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize. Cornell University, Fall 2017 CS 6820: Algorithms Lecture notes on the simplex method September 2017 1 The Simplex Method We will present an algorithm to solve linear programs of the form maximize subject

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

looking ahead to see the optimum

looking ahead to see the optimum ! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible

More information

MA651 Topology. Lecture 4. Topological spaces 2

MA651 Topology. Lecture 4. Topological spaces 2 MA651 Topology. Lecture 4. Topological spaces 2 This text is based on the following books: Linear Algebra and Analysis by Marc Zamansky Topology by James Dugundgji Fundamental concepts of topology by Peter

More information

Lecture Notes on Alias Analysis

Lecture Notes on Alias Analysis Lecture Notes on Alias Analysis 15-411: Compiler Design André Platzer Lecture 26 1 Introduction So far we have seen how to implement and compile programs with pointers, but we have not seen how to optimize

More information

Abstract Interpretation

Abstract Interpretation Abstract Interpretation Ranjit Jhala, UC San Diego April 22, 2013 Fundamental Challenge of Program Analysis How to infer (loop) invariants? Fundamental Challenge of Program Analysis Key issue for any analysis

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

Uncertain Data Models

Uncertain Data Models Uncertain Data Models Christoph Koch EPFL Dan Olteanu University of Oxford SYNOMYMS data models for incomplete information, probabilistic data models, representation systems DEFINITION An uncertain data

More information

Software Testing. 1. Testing is the process of demonstrating that errors are not present.

Software Testing. 1. Testing is the process of demonstrating that errors are not present. What is Testing? Software Testing Many people understand many definitions of testing :. Testing is the process of demonstrating that errors are not present.. The purpose of testing is to show that a program

More information

CS-XXX: Graduate Programming Languages. Lecture 9 Simply Typed Lambda Calculus. Dan Grossman 2012

CS-XXX: Graduate Programming Languages. Lecture 9 Simply Typed Lambda Calculus. Dan Grossman 2012 CS-XXX: Graduate Programming Languages Lecture 9 Simply Typed Lambda Calculus Dan Grossman 2012 Types Major new topic worthy of several lectures: Type systems Continue to use (CBV) Lambda Caluclus as our

More information

Symmetry in Type Theory

Symmetry in Type Theory Google May 29th, 2012 What is Symmetry? Definition Symmetry: Two or more things that initially look distinct, may actually be instances of a more general underlying principle. Why do we care? Simplicity.

More information

Hardware versus software

Hardware versus software Logic 1 Hardware versus software 2 In hardware such as chip design or architecture, designs are usually proven to be correct using proof tools In software, a program is very rarely proved correct Why?

More information

Programming Languages Third Edition

Programming Languages Third Edition Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand

More information

Static Analysis: Overview, Syntactic Analysis and Abstract Interpretation TDDC90: Software Security

Static Analysis: Overview, Syntactic Analysis and Abstract Interpretation TDDC90: Software Security Static Analysis: Overview, Syntactic Analysis and Abstract Interpretation TDDC90: Software Security Ahmed Rezine IDA, Linköpings Universitet Hösttermin 2014 Outline Overview Syntactic Analysis Abstract

More information

Type Checking and Type Equality

Type Checking and Type Equality Type Checking and Type Equality Type systems are the biggest point of variation across programming languages. Even languages that look similar are often greatly different when it comes to their type systems.

More information

SORTING AND SELECTION

SORTING AND SELECTION 2 < > 1 4 8 6 = 9 CHAPTER 12 SORTING AND SELECTION ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN JAVA, GOODRICH, TAMASSIA AND GOLDWASSER (WILEY 2016)

More information

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University Static Program Analysis Part 9 pointer analysis Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University Agenda Introduction to points-to analysis Andersen s analysis Steensgaards s

More information

Spring 2017 DD2457 Program Semantics and Analysis Lab Assignment 2: Abstract Interpretation

Spring 2017 DD2457 Program Semantics and Analysis Lab Assignment 2: Abstract Interpretation Spring 2017 DD2457 Program Semantics and Analysis Lab Assignment 2: Abstract Interpretation D. Gurov A. Lundblad KTH Royal Institute of Technology 1 Introduction In this lab assignment the abstract machine

More information

Formal Methods of Software Design, Eric Hehner, segment 24 page 1 out of 5

Formal Methods of Software Design, Eric Hehner, segment 24 page 1 out of 5 Formal Methods of Software Design, Eric Hehner, segment 24 page 1 out of 5 [talking head] This lecture we study theory design and implementation. Programmers have two roles to play here. In one role, they

More information

Matching Theory. Figure 1: Is this graph bipartite?

Matching Theory. Figure 1: Is this graph bipartite? Matching Theory 1 Introduction A matching M of a graph is a subset of E such that no two edges in M share a vertex; edges which have this property are called independent edges. A matching M is said to

More information

Lecture 3: Recursion; Structural Induction

Lecture 3: Recursion; Structural Induction 15-150 Lecture 3: Recursion; Structural Induction Lecture by Dan Licata January 24, 2012 Today, we are going to talk about one of the most important ideas in functional programming, structural recursion

More information

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION DESIGN AND ANALYSIS OF ALGORITHMS Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION http://milanvachhani.blogspot.in EXAMPLES FROM THE SORTING WORLD Sorting provides a good set of examples for analyzing

More information

Principles of Program Analysis: A Sampler of Approaches

Principles of Program Analysis: A Sampler of Approaches Principles of Program Analysis: A Sampler of Approaches Transparencies based on Chapter 1 of the book: Flemming Nielson, Hanne Riis Nielson and Chris Hankin: Principles of Program Analysis Springer Verlag

More information

2017 SOLUTIONS (PRELIMINARY VERSION)

2017 SOLUTIONS (PRELIMINARY VERSION) SIMON MARAIS MATHEMATICS COMPETITION 07 SOLUTIONS (PRELIMINARY VERSION) This document will be updated to include alternative solutions provided by contestants, after the competition has been mared. Problem

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

Analysis of Pointers and Structures

Analysis of Pointers and Structures RETROSPECTIVE: Analysis of Pointers and Structures David Chase, Mark Wegman, and Ken Zadeck chase@naturalbridge.com, zadeck@naturalbridge.com, wegman@us.ibm.com Historically our paper was important because

More information

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and Computer Language Theory Chapter 4: Decidability 1 Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and

More information

Lecture Notes on Real-world SMT

Lecture Notes on Real-world SMT 15-414: Bug Catching: Automated Program Verification Lecture Notes on Real-world SMT Matt Fredrikson Ruben Martins Carnegie Mellon University Lecture 15 1 Introduction In the previous lecture we studied

More information

Subsumption. Principle of safe substitution

Subsumption. Principle of safe substitution Recap on Subtyping Subsumption Some types are better than others, in the sense that a value of one can always safely be used where a value of the other is expected. Which can be formalized as by introducing:

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

CS558 Programming Languages

CS558 Programming Languages CS558 Programming Languages Winter 2017 Lecture 4a Andrew Tolmach Portland State University 1994-2017 Semantics and Erroneous Programs Important part of language specification is distinguishing valid from

More information

Why Global Dataflow Analysis?

Why Global Dataflow Analysis? Why Global Dataflow Analysis? Answer key questions at compile-time about the flow of values and other program properties over control-flow paths Compiler fundamentals What defs. of x reach a given use

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 4: CSPs 9/9/2008 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore 1 1 Announcements Grading questions:

More information

Announcements. CS 188: Artificial Intelligence Fall Large Scale: Problems with A* What is Search For? Example: N-Queens

Announcements. CS 188: Artificial Intelligence Fall Large Scale: Problems with A* What is Search For? Example: N-Queens CS 188: Artificial Intelligence Fall 2008 Announcements Grading questions: don t panic, talk to us Newsgroup: check it out Lecture 4: CSPs 9/9/2008 Dan Klein UC Berkeley Many slides over the course adapted

More information

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material

More information

Lecture Notes: Widening Operators and Collecting Semantics

Lecture Notes: Widening Operators and Collecting Semantics Lecture Notes: Widening Operators and Collecting Semantics 15-819O: Program Analysis (Spring 2016) Claire Le Goues clegoues@cs.cmu.edu 1 A Collecting Semantics for Reaching Definitions The approach to

More information

More Dataflow Analysis

More Dataflow Analysis More Dataflow Analysis Steps to building analysis Step 1: Choose lattice Step 2: Choose direction of dataflow (forward or backward) Step 3: Create transfer function Step 4: Choose confluence operator (i.e.,

More information

Lecture 5: Properties of convex sets

Lecture 5: Properties of convex sets Lecture 5: Properties of convex sets Rajat Mittal IIT Kanpur This week we will see properties of convex sets. These properties make convex sets special and are the reason why convex optimization problems

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

Dependent Object Types - A foundation for Scala s type system

Dependent Object Types - A foundation for Scala s type system Dependent Object Types - A foundation for Scala s type system Draft of September 9, 2012 Do Not Distrubute Martin Odersky, Geoffrey Alan Washburn EPFL Abstract. 1 Introduction This paper presents a proposal

More information

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis? Overview Program Static Analysis Program static analysis Abstract interpretation Static analysis techniques 2 What is static analysis? The analysis to understand computer software without executing programs

More information

Interprocedural Analysis. CS252r Fall 2015

Interprocedural Analysis. CS252r Fall 2015 Interprocedural Analysis CS252r Fall 2015 Procedures So far looked at intraprocedural analysis: analyzing a single procedure Interprocedural analysis uses calling relationships among procedures Enables

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

9.5 Equivalence Relations

9.5 Equivalence Relations 9.5 Equivalence Relations You know from your early study of fractions that each fraction has many equivalent forms. For example, 2, 2 4, 3 6, 2, 3 6, 5 30,... are all different ways to represent the same

More information

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018

15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 In this lecture, we describe a very general problem called linear programming

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont)

Reuse Optimization. LLVM Compiler Infrastructure. Local Value Numbering. Local Value Numbering (cont) LLVM Compiler Infrastructure Source: LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation by Lattner and Adve Reuse Optimization Eliminate redundant operations in the dynamic execution

More information

Chapter 3. Set Theory. 3.1 What is a Set?

Chapter 3. Set Theory. 3.1 What is a Set? Chapter 3 Set Theory 3.1 What is a Set? A set is a well-defined collection of objects called elements or members of the set. Here, well-defined means accurately and unambiguously stated or described. Any

More information