Change- and Precision-sensitive Widening for BDD-based Integer Sets


Bachelor Thesis
Felix Lublow
Change- and Precision-sensitive Widening for BDD-based Integer Sets
October 06, 2016
supervised by: Prof. Dr. Sibylle Schupp, Sven Mattsen
Hamburg University of Technology (TUHH)
Technische Universität Hamburg-Harburg
Institute for Software Systems
Hamburg


Statutory Declaration (Eidesstattliche Erklärung)

I declare on oath that I wrote this thesis independently and did not use any sources or aids other than those stated. This work has not been submitted in this or a similar form to any examination board before.

Hamburg, October 06, 2016
Felix Lublow


Contents

1. Introduction
2. About Program Analysis and Integer Sets as BDDs
   2.1. Data-flow Analysis
   2.2. Abstract Interpretation
   2.3. Using BDDs to Represent Integer Sets
3. A Change- and Precision-sensitive Widening Operator
   3.1. Concept
   3.2. Naive Widening
   3.3. Precision-sensitive Widening
   3.4. Change-sensitive Widening
4. Operator Behavior and Characteristics
   4.1. Design Considerations
   4.2. Application Context
   4.3. Operator Properties
5. Evaluation of an Implementation in Jakstab
   5.1. Implementation and Evaluation Method
   5.2. Results
6. Related Work
7. Conclusion
A. Appendix
   A.1. Algorithm for Change- and Precision-sensitive Widening
   A.2. Development of State Spaces During Analyses
   A.3. Implementation Source Code
   A.4. Evaluation Results


1. Introduction

Automated analysis of computer programs has become an integral part of modern software development processes. During static analysis, exhaustive testing, i.e., consideration of all possible branches of execution in a precise manner, is usually infeasible. An abstract representation can be utilized to model program functionality. Approximation of program characteristics is possible by estimating the development of the abstract model, increasing analysis scalability and convergence speed at the cost of precision. This process of approximation is implemented in the form of a widening operator. Depending on the type of analysis used, the choice of widening method greatly influences the analysis result. Successful termination requires the use of a sensible estimation heuristic. The construction of a powerful and efficient widening operator facilitates functional program analysis and remains an area of continuous research.

In this work, we consider widening in the context of static analysis of binaries. Static program analysis can be used to find a model representing the procedure implemented by the subject program, such as a control flow graph (CFG). In order to do so, the program is represented in an abstract fashion. Information about each program point is represented by an abstract state, which provides information about potential configurations of program variables. Program instructions can be understood as functions operating on abstract states. During analysis, instructions are considered one at a time. New information, resulting from function application, is propagated to all affected states before the next instruction is considered. This procedure is repeated until no change is observed, i.e., the analysis reaches a fixed point.

Consider Figure 1.1, in which a code segment is listed and a table depicts the possible values of variables in different states. A CFG, as would result from a successful and complete analysis, represents the program's function. If no prior information is available, for a naive analysis to determine the abstract representation as depicted, loop states S1 and S2 require repeated consideration. In every iteration, one element is added to the set of values the loop variable i might assume; states S1 and S2 require adjustment 100 and 99 times, respectively.

Assume the exit condition of the loop instead read S1: i < z, i.e., depended on some unknown value (which might be supplied as function input). A naive analysis might encounter difficulties dealing with occurrences like these: ignoring the fact that the number of integers representable by digital computers is limited, the set of possible values of i might grow to arbitrary size, yet the analysis might never conclude that the exit condition is met. The usefulness of an analysis which does not terminate is severely limited. To ensure that occurrences like these can be dealt with, it is possible to allow the estimation of values in the course of CFG construction; strict over-approximation ensures that no branch of execution is excluded from consideration. The aforementioned case might be trivially solved by widening (i.e., increasing) the set of values of i in state S1 such that it contains every representable integer value. As a consequence, in S3, the same set will be representing the values of x.

S0: i = 1; x = 3;
S1: while (i < 100)
S2:     i++;
S3: x = i;

State | Possible values of i | Possible values of x
S0    | {1}                  | {3}
S1    | {1, ..., 100}        | {3}
S2    | {2, ..., 100}        | {3}
S3    | {100}                | {100}

Figure 1.1.: Example program.

In allowing approximation, an analysis will be able to terminate, but it will generally result in a less precise representation of the actual program functionality. Considering our example, a slightly more precise approximation might take into account that i is initially set to 1, and only ever incremented in the loop; such an approximation might yield i ∈ {1, ..., max}, where max is the largest representable value. Additionally, during analysis, approximation might be performed repeatedly as a series of smaller steps, resulting in a stepwise increase of the approximation level. Of course, more precise approximation requires additional computation, and is thus more taxing on analysis scalability. In general, solving these kinds of problems through approximation requires a trade-off between precision and scalability.

The function of approximation is realized through the use of a widening operator. Static analysis employs this operator where precise computation of values might not terminate, or take too long. Approximation through the operator can be done in a number of ways; finding a method appropriate for a particular context is a challenge and greatly influences the analysis process.

In this thesis, we propose a widening operator for analyses that use binary decision diagrams to represent sets of integers, motivate and discuss our design choices, demonstrate our proposition for widening through implementation in an existing analysis framework, and

evaluate the performance of the implementation by comparing it to the naive approach. Our heuristic for approximation takes into account two factors: precision, which is supplied to the operator as a value defining the degree of acceptable loss of rigor, and change, which is the previously observed development of the widening argument. We aim to establish a sensible method which estimates future analysis development by extrapolating from observed progress, implementing a practical trade-off between precision and scalability.

In Chapter 2, we establish the context in which our operator is applied. In Chapter 3, we develop our proposal and define a change- and precision-sensitive widening operator for BDD-based integer sets. In Chapter 4, we discuss the behavior and characteristics of our operator. In Chapter 5, we present an implementation of our method in the analysis framework Jakstab, and evaluate its performance. In Chapter 6, we present related work. We conclude in Chapter 7.


2. About Program Analysis and Integer Sets as BDDs

In this chapter, we describe program analysis, establishing a context for widening operations. We briefly consider abstract interpretation. We explain the concept of ROBDDs. Finally, we present a method with which arbitrary sets of integers can be represented by BDDs.

In the field of computer science, the automated analysis of programs is a practical possibility to gain understanding of program attributes and to prove system properties. Formal verification of properties has become crucial during the construction and deployment of software used in many safety-critical environments, as found in aviation, space flight, and medicine. Less formal analysis finds widespread use, supporting many processes, e.g., where insight into program functionality is desirable (bug detection, reverse engineering), or efficient modification without change in functionality is necessary (compilation, optimization, translation). Many integrated development environments feature built-in analysis tools to facilitate modern computer programming.

The process of program analysis can be performed during runtime (dynamic analysis), without execution (static analysis), or in a combination of both. Static analysis can be performed on source code, on execution-ready binary files, or on any intermediate representation. Analysis of source code, while generally requiring less effort, can be infeasible or even outright impossible for a number of reasons: source code might not be available for legacy systems; vendors or third parties might keep source code private to secure their intellectual property. In cases like these, to derive property information, to verify, and to test, a more complicated analysis on the executable binary has to be performed.

2.1. Data-flow Analysis

Today, many areas of both research and application require in-depth reasoning about the concrete function of computational processes, including not only informal argumentation regarding program characteristics, but also mathematical proof of attributes. Of interest are the possible states the program can be in, and properties of these states, such as the values of variables. To model a stateful representation of a program, a control flow graph (CFG) can be used. Different branches of execution are represented as edges, traversed depending on conditionals, to nodes, representing program points, which can hold information on a range of program properties. In order to compute control flow, properties are expressed as equations, e.g., the variable x, which might assume values 1 or 2, has a corresponding equation of x ∈ {1, 2}. The set of possible values is called the variation domain. Iteratively, the equations (i.e., variation domains of variables) of nodes are changed, depending on the code semantics of the respective program point and the information received through incoming edges. In turn, the information outgoing to other nodes

is altered. These nodes then need to be updated again. The process is repeated until no change occurs, i.e., the analysis is complete.

Consider Figure 1.1. Given a section of source code, a CFG is constructed, and the equations for variables i and x are initialized as empty in each state (i, x ∈ ∅). The computation algorithm starts at the entry point S0 and computes the state equations. The variable assignments i = 1, x = 3 yield i ∈ {1}, x ∈ {3}. The outgoing information changes for the states following S0 in execution, i.e., S1. In S1, no assignment occurs and the equations are used as received from S0: i ∈ {1}, x ∈ {3}. Since the condition in S1 can at this point only be evaluated to true, information is propagated only to S2. In S2, the increment of i by 1 yields i ∈ {2}, x ∈ {3}. Using these equations, in S1, i can now assume one of two values: i ∈ {1, 2}, x ∈ {3}. Repetition of this cycle creates the equations i ∈ {1, ..., 100}, x ∈ {3} in S1, and i ∈ {2, ..., 100}, x ∈ {3} in S2. At this point, the condition in S1 can be evaluated to false (for i = 100). Propagation to S3 yields the final state equations as depicted in the table in Figure 1.1.

2.2. Abstract Interpretation

While concrete and exact analysis is desirable, it is also often infeasible, since the state sets and inter-state relations portraying even minor real-world applications can quickly become immense in size and complexity. Limited by modern computer processing power and memory space, computation of concrete program functionality might not always terminate, and is decidable only for simple systems. In general, some form of abstraction has to be made.

Moving from concrete to abstract representation comes at a cost. An abstract analysis will not always be complete: properties of the real system, true in a concrete model, might not hold when represented in the abstract. Of course, this change in model also implies loss of information; an analysis of an imperfect representation can never be as precise as a study of the actual thing itself. What is lost in completeness and precision can be made up in ease of computability; simplification and approximation allow the inference of statements which might not have been possible to make otherwise. Abstract semantics can be constructed in any number of ways, where abstraction closer to concrete semantics is more precise but harder to compute, and formulation with a greater level of abstraction loses precision but allows for easier computation. In many cases, such as the formal verification of system properties, it is necessary for an analysis to be sound: properties proven true in the less precise model should also hold true for the actual program. Correctly performed abstraction results in sound analysis. The way in which abstraction is performed has been a field of extensive study in the past. Shaping the modern process of program analysis, Patrick and Radhia Cousot introduced the theory of abstract interpretation [5, 6], allowing for sound approximation of program semantics.
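The iterative computation described in Section 2.1 can be illustrated with a small, self-contained sketch. The following Python fragment is our own illustration, not code from the thesis or from any particular framework; state names, the transfer function, and the round-robin loop are assumptions made for the example. It solves the equations for the program of Figure 1.1 by propagating value sets until no state changes.

# A minimal iterative solver for the data-flow equations of Figure 1.1.
# An abstract environment maps each variable name to its variation domain,
# modelled here as a plain Python set of integers.

def transfer(point, env):
    """Apply the semantics of program point `point` to environment `env`."""
    i, x = env["i"], env["x"]
    if point == "S0":                     # i = 1; x = 3
        return {"i": {1}, "x": {3}}
    if point == "S1-true":                # branch condition i < 100 holds
        return {"i": {v for v in i if v < 100}, "x": set(x)}
    if point == "S1-false":               # branch condition i < 100 fails
        return {"i": {v for v in i if v >= 100}, "x": set(x)}
    if point == "S2":                     # i++
        return {"i": {v + 1 for v in i}, "x": set(x)}
    if point == "S3":                     # x = i
        return {"i": set(i), "x": set(i)}
    raise ValueError(point)

def solve():
    entry = transfer("S0", {"i": set(), "x": set()})
    s1 = {"i": set(entry["i"]), "x": set(entry["x"])}   # values reaching S1
    iterations = 0
    while True:                           # iterate the loop body to a fixed point
        iterations += 1
        body = transfer("S2", transfer("S1-true", s1))
        joined = {v: s1[v] | body[v] for v in s1}       # join at the loop head
        if joined == s1:                  # no change: fixed point reached
            break
        s1 = joined
    s3 = transfer("S3", transfer("S1-false", s1))
    return s1, s3, iterations

s1, s3, n = solve()
print(sorted(s1["i"]))    # [1, ..., 100]  -- variation domain of i at S1
print(sorted(s3["x"]))    # [100]          -- variation domain of x at S3
print(n)                  # about 100 iterations until no change occurs

The roughly one hundred iterations needed here correspond to the repeated adjustments of S1 and S2 mentioned in Chapter 1, and are the reason widening is introduced in Chapter 3.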

2.3. Using BDDs to Represent Integer Sets

When working with Boolean functions, or anything that can be understood as such (e.g., the bits of an integer variable), the choice of representation greatly influences the level of computational difficulty. The state space of non-trivial programs, subject to program analysis, often contains a vast number of abstract states in complex relation. States can require an arbitrary number of variation domains, each a large, non-convex set containing an arbitrary combination of elements. Obviously, using a naive representation, where each element of these sets is stored separately, is not feasible. One possibility, increasing analysis scalability, is to store integer sets in binary decision diagrams (BDDs) [2, 3].

ROBDDs

A BDD is a rooted, directed, and acyclic graph. Non-terminal nodes have two child nodes. In our representation, terminal nodes can be either True or False, and nodes do not hold any additional information. Each path from the root node to a terminal node can be understood as the representation of a particular set of values, used as the variables of a Boolean function. Figure 2.1 gives an example. Circles represent non-terminal nodes, squares terminal nodes, i.e., leaves. A solid/dashed edge points to the subtree holding the information corresponding to the node variable being true/false (i.e., the corresponding bit set/unset).

Figure 2.1.: BDD representing a Boolean function of α and β.

Great levels of efficiency can be achieved by reducing the ordered BDD. In doing so, any isomorphic subtrees are merged, and any node whose children are isomorphic, i.e., any node in which no information is stored, is removed. Further reduction is possible by adding a complement bit to each edge, which asserts inversion of the corresponding subtree. To standardize, one can choose to allow inversion of subtrees only on either the true or the false branch, allowing for canonical, unique BDDs. In Figure 2.2, a Boolean function of α, β, and γ is represented as a BDD, and its reduction is considered. A black dot is used if a subtree is inverted. For efficiency, BDDs are usually ordered, i.e., the ordering of the different variables is the same in all paths emanating from the root node. Depending on the represented Boolean

function, the variable ordering of a BDD greatly influences graph complexity, varying from linear in the best to exponential in the worst case.

Figure 2.2.: Reduction of a BDD for a Boolean function of α, β, and γ.

In Figure 2.3, two different variable orderings of the Boolean function (α ∧ β) ∨ (γ ∧ δ) are considered. While in practice, when using the term BDD, we refer to reduced and ordered diagrams, for the sake of clarity, we use their equivalent non-reduced graphs in our examples.

Figure 2.3.: Different variable orderings of (α ∧ β) ∨ (γ ∧ δ); the orderings α < β < γ < δ and α < γ < β < δ are compared.

Integer Sets as BDDs

Observe Figure 2.4 as an example, in which a BDD of depth four is used to represent the set of all possible unsigned 4-bit integer values. In this non-reduced graph, one terminal

exists for each possible value. A terminal is True if the corresponding value, represented by the path from the root to it, is an element of the set, and False if it is not.

Figure 2.4.: Complete BDD for sets of 4-bit integers.

Note that non-terminal nodes at depth n represent the (n + 1)th bit of variables; e.g., the root node of the tree represents the first bit: for those variables of which the first bit is set, the left subtree specifies the information corresponding to the remaining bits. All variables of which the first bit is unset can be found in the right subtree. Equivalently, it can be said that the subtree of a node at depth n in our example represents an interval of size 2^(4-n), or more generally 2^(m-n), where m is the maximum possible depth of the tree. As such, using two's complement to convert the Boolean sequence to an integer, the BDD can be understood as an indicator function of the integer set represented.

Figure 2.5 shows how the arbitrary set {8, 9, 10, 11, 14} is represented as a BDD. Since this graph is reduced, in each subtree, if all terminal nodes hold the same value, the subtree is removed and replaced by the corresponding terminal. None of the values in the interval {0, ..., 7} is an element of the set, which is why the right child of the root node is set to False. This node represents all integers with bit string 0---, an interval of 2^3 = 8 elements, none of which is contained in the set. Analogously, the values in {8, ..., 11} are contained in the set, which is why the terminal at depth two is set to True, representing integers with bit string 10-- and an interval of size 2^2 = 4. Note that of the values 14 (1110) and 15 (1111), only one is an element of the set, which is why no reduction is possible in the leftmost, lowest non-terminal (111-).

Using BDDs, the representation of non-convex, arbitrary sets of integers is possible. The data structure, efficient in its use of memory and allowing for effective set manipulation by modifying the BDD, enables analyses to be more powerful. Such analyses are the framework for our widening heuristic. Implementing the aforementioned concepts, several design choices can increase the power of the data structure. Firstly, by demanding uniqueness of BDDs in memory, constant-time equality checks are possible for arbitrary sets. To realize this concept, any node's children

are actualized as pointers to the corresponding subtrees, and uniqueness is ensured via the use of a hash function.

Figure 2.5.: Reduced BDD of {8, 9, 10, 11, 14}.

Since every unique BDD is memorized only once, being referenced by pointers, memorization efficiency greatly increases in scenarios in which the number of BDDs becomes large enough that multiple nodes point to the same memory cell as a child (i.e., the same subtree appears in multiple BDDs). Secondly, by storing the number of elements represented by each subtree (and recomputing this number every time the tree changes), a constant-time SAT-count of the represented Boolean function (i.e., a count of the number of elements in the represented integer set) is possible for any tree. Both of these BDD properties (constant-time equality check, constant-time SAT-count) are necessary to enable our proposed widening heuristic, defined in Chapter 3.
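The two properties highlighted here, constant-time equality and constant-time SAT-count, can be illustrated with a deliberately simplified sketch. The Python fragment below is our own illustration and not the thesis's BDD library: it uses a hash-consed, full-depth binary trie instead of reduced, complement-edged BDDs, but it shows how sharing subtrees makes set equality a pointer comparison and how caching element counts per node yields the size of a set in constant time.

# Hash-consed binary tries over the bits of M-bit unsigned integers.
# Structurally equal subtrees are shared through a unique table, so two
# equal sets are represented by the same object, and every node caches the
# number of integers below it (a constant-time SAT-count of the set).

M = 4                                     # word width used in this sketch

class Node:
    _unique = {}                          # (id(hi), id(lo)) -> shared node

    def __init__(self, hi, lo, count):
        self.hi, self.lo, self.count = hi, lo, count

    @classmethod
    def make(cls, hi, lo):
        key = (id(hi), id(lo))
        node = cls._unique.get(key)
        if node is None:
            node = cls._unique[key] = cls(hi, lo, hi.count + lo.count)
        return node

TRUE = Node(None, None, 1)                # terminal: value is in the set
FALSE = Node(None, None, 0)               # terminal: value is not in the set

def empty(depth=0):
    """Trie of the empty set below `depth`."""
    return FALSE if depth == M else Node.make(empty(depth + 1), empty(depth + 1))

def singleton(value, depth=0):
    """Trie containing exactly `value`; bits are consumed MSB first."""
    if depth == M:
        return TRUE
    child, other = singleton(value, depth + 1), empty(depth + 1)
    bit = (value >> (M - 1 - depth)) & 1
    return Node.make(child, other) if bit else Node.make(other, child)

def union(a, b, depth=0):
    if depth == M:
        return TRUE if (a is TRUE or b is TRUE) else FALSE
    return Node.make(union(a.hi, b.hi, depth + 1), union(a.lo, b.lo, depth + 1))

# The set {8, 9, 10, 11, 14} from Figure 2.5:
s = empty()
for v in (8, 9, 10, 11, 14):
    s = union(s, singleton(v))
print(s.count)                            # 5, read off in constant time
print(s is union(s, singleton(9)))        # True: equality is pointer identity

Real implementations additionally apply the reduction rules and complement edges described above; they are omitted here to keep the sketch short.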

3. A Change- and Precision-sensitive Widening Operator

In this chapter, we consider the concept of widening. We use an example to motivate different methods of approximation, and introduce three heuristics which vary in complexity. Naive widening represents the simplest, most straightforward approach. Precision-sensitive widening allows for approximation with variable precision. Change-sensitive widening considers memorized observed changes and estimates future development by extrapolation. Each method is used during a separate analysis of the example.

3.1. Concept

In the context of infinite domains of abstract interpretation, an analysis has to be able to deal with loops in the program. To determine semantics where loops occur, a naive approach would be the repeated consideration of the loop and the iterative adjustment of the state set. However, depending on the given instructions, iterative computation of fixed points might diverge, i.e., is not guaranteed to terminate. As a measure to ensure termination, a widening operator can be used to approximate a fixed point of the property in question [6]. Widening is done by approximation through extrapolation. Skipping steps of the naive approach, precision might be lost in the process. Termination can be guaranteed where the operator is applied to an operand of which the ascending chain stabilizes eventually. In a naive example, where the possible values of an integer variable are represented by an interval [a, b] and b is incremented in each loop iteration, widening might approximate b = ∞.

Formally: Let L be a complete partially ordered set representing some semantics. Define the widening operator ∇: L × L → L such that

(∀x, y ∈ L: x ⊑ x ∇ y) and (∀x, y ∈ L: y ⊑ x ∇ y),

and for all increasing chains x_0 ⊑ x_1 ⊑ ..., the increasing chain defined by y_0 = x_0, ..., y_{i+1} = y_i ∇ x_{i+1}, ... stabilizes eventually.

Analysis of binaries aims to construct an accurate representation of the subject program in the form of a control flow specification (e.g., as a control flow graph). In the course of the analysis, the variation domain of each variable at every program point needs to be ascertained. Consider Figure 3.1, containing a segment of code and a control flow graph representing the program's function. Table 3.1 holds the corresponding exact variation domain information. To compute the CFG and abstract state space, a rigorous approach, such as the classical worklist algorithm [8], would need to traverse the while-loop starting in S1 exactly 99 times, in each iteration increasing the variation domain of i by one.
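For the classical interval example just mentioned, the operator can be written down directly. The following sketch is our own illustration of standard interval widening, not the BDD-based operator proposed in this thesis: a bound that is still moving between two successive iterates is pushed to the end of the representable range, which is what makes the chain y_0, y_1, ... stabilize.

# Standard interval widening over unsigned 32-bit values: an unstable bound
# jumps to the extreme of the range, so at most two applications per bound
# are ever needed before the chain stabilizes.

UMAX = 2**32 - 1

def widen_interval(old, new):
    """old and new are (lo, hi) pairs from two successive iterations."""
    lo = old[0] if new[0] >= old[0] else 0       # lower bound stable: keep it
    hi = old[1] if new[1] <= old[1] else UMAX    # upper bound grew: jump to max
    return (lo, hi)

# A loop that keeps incrementing i: instead of ~100 exact iterations,
# one widening step already yields a stable over-approximation.
print(widen_interval((1, 1), (1, 2)))            # (1, 4294967295)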

S0: i = 1; x = 1;
S1: while (i < 100) {
S2:     x = i;
S3:     if (i % 64 == 0)
S4:         x += 10000;
S5:     funct(x);
        i++;
    }

Figure 3.1.: Example program.

State | Variation domain of i | Variation domain of x
S0    | {1}                   | {1}
S1    | {1, ..., 100}         | {1, ..., 63} ∪ {65, ..., 99} ∪ {10064}
S2    | {1, ..., 99}          | {1, ..., 99}
S3    | {1, ..., 99}          | {1, ..., 99}
S4    | {1, ..., 99}          | {10064}
S5    | {2, ..., 100}         | {1, ..., 63} ∪ {65, ..., 99} ∪ {10064}

Table 3.1.: Abstract state space after exact computation (99 iterations to fixed point).

It is easy to imagine a scenario where, depending on the behavior and complexity of funct(x), or with an arbitrarily larger bound in the loop conditional in S1, exact computation of this kind is not feasible for real analyses. A possibility to reduce the number of times the loop has to be traversed is the approximation of variation domains. Wherever loops exist, i.e., the abstract states of the loop need to be revisited and updated, estimation, possibly extrapolation from observed behavior, can speed up the analysis process. Where an approximation is made, the analysis continues less precisely than it would have, had computation been exact; however, convergence might be reached faster. This trade-off between loss of precision and potential increase in scalability/convergence speed characterizes different methods of approximation, i.e., different variants of widening. A method where the potential loss of precision is relatively large is called a stronger widening, a method closer to exact computation a weaker widening.

We will now examine three different possibilities to widen abstract states, which differ in strength. Naive widening represents the simplest, most straightforward approach, which immediately adds all representable values. Precision-sensitive widening takes an integer precision argument, depending on which its strength is adjusted, adding intervals of values where elements were added during the last step of precise computation. Change-sensitive widening memorizes observed changes and estimates future development by extrapolation, considering distances to previous changes to apply precision-sensitive widening with variable strength to different intervals of the variation domain. If not stated otherwise, we will assume widening is applied during analysis only after two iterations of exact computation, during which states may stabilize following previous widening applications.

3.2. Naive Widening: Setting the Variation Domain to ⊤ in a Single Step

The strongest possible form of widening, ∇_n, when called, immediately approximates the variation domain as the set of all possible values (⊤, the actual elements of which depend on the variable type, e.g., {0, ..., 2^32 - 1} for unsigned 32-bit integers):

∀a, b: ∇_n(a, b) = ⊤    (3.1)

The arguments a, b can be understood as sets representing the same variation domain in two successive iterations of exact computation, after which widening is applied. A single widening operation of this kind results in maximum loss of precision: no information on possible values is maintained, i.e., the output does not depend on the input. However, widening in this way ensures that, for this abstract state, the variation domain does not change again, thus never requiring re-computation. In our example (Figure 3.1), after calling naive widening, the loop has to be traversed only a single time. Of course, as a result, the widened variation domain of x in state S5 contains a vast number of values which may never actually be used in calls of funct(). This may result in inaccuracies, e.g.,

if in funct() x is used as the target address of an indirect jump, the analysis would create a large number of states which the program can never actually reach, and might collapse when subsequently trying to further analyze these generated, false branches of execution.

State | Variation domain of i | Variation domain of x
S0    | {1}                   | {1}
S1    | ⊤                     | ⊤
S2    | ⊤                     | ⊤
S3    | ⊤                     | ⊤
S4    | ⊤                     | {10000, 10064, 10128, ...}
S5    | ⊤                     | ⊤

Table 3.2.: Abstract state space computed using naive widening (4 exact iterations to fixed point).

Table 3.2 shows the abstract state space after an analysis during which widening was applied after the loop had been iterated over twice. States S1, S2, S3, and S5 change in every iteration, and are set to ⊤. State S0 is not an element of the loop, hence does not change and does not require widening. Note that after two iterations the analysis will not yet have generated a state following the evaluation of the statement in S3 to true (i.e., S4), which is why S4 is not widened. After widening, the loop is traversed again, now creating state S4, the variation domain for x of which is generated after it was set to ⊤ in S3. The analysis then continues, evaluating the false-branch in S1. If, for our example, the information propagated to the false-branch of S1 (i, x ∈ {100, ...}) is not considered further, and neither are the possible arguments of funct() in S5, an analysis using naive widening might yield satisfactory results.

3.3. Precision-sensitive Widening: Increasing the Variation Domain Depending on a Precision Value

While the previous method of naive widening can be useful in some cases, in general, the maintenance of a higher degree of precision yields a more productive analysis. A weaker form of widening, allowing for a more precise approximation, increases the variation domain by a smaller amount. A practical possibility to dynamically adjust the strength of widening during analysis is the extension of widening to a ternary operator. The additional third argument specifies how much precision is given up when applying the operator, i.e., its strength. Widening heuristic and precision argument can be of arbitrary complexity. Repeated widening can take any finite amount of time to converge, and any type of data structure is feasible as precision argument. Depending on the desired behavior during

analysis, the precision can be adjusted as called for by the development of the state space. Iterative change of abstract states, as in loops, usually follows a deterministic pattern, allowing for the prediction of future development. A heuristic recognizing these patterns more effectively is able to perform extrapolation in a way closer to exact computation. Considering the common usage of loops, due to their iterative nature, changes to the variation domain are often relatively close to one another in consecutive steps of the analysis. Hence, it appears sensible to widen accordingly, i.e., to add elements close to where change was observed in the most recent analysis step. To realize a change in widening strength, the number of elements added can be adjusted. Besides the imperative to approximate closely to exact values, convergence speed has to be taken into consideration when choosing a widening heuristic. It appears intuitive to continuously increase the strength of consecutive applications of widening to the same set until a fixed point is reached, avoiding large inaccuracies due to quick, superfluous over-approximation, while at the same time not taking too long to over-approximate a potentially large set.

The usage of BDDs to represent integer sets is relevant to both of these ideas: finding a sensible interval around areas where change is expected, and increasing the size of the interval in an escalating fashion. For every integer, each node on the path to the root node can be understood as a representative of an interval which contains the integer. The size of each of these intervals is twice the size of the interval represented by its children, and half the size of its parent's. In general, the depth n at which a node is located in the tree specifies the width of the represented interval as 2^(m-n), where m is the maximum depth of the BDD (e.g., 32 for 32-bit integers).

Considering the above rationale, and using these properties, we propose a precision-sensitive heuristic ∇_p as follows. For every element e added in the last state update (i.e., e ∈ s' \ s, where s' is the state updated from s), find an interval around the element, the size of which depends on the precision value, and add it to the set. The interval size is given as 2^(m-p), where m is the maximum depth of the BDD and p the precision. The positioning of the interval depends on the data structure used, or more generally on the way integers are represented as bit-vectors. For an interval of width w to be added, the largest multiple of w that does not exceed the element e is used as the lower bound of the interval, i.e., l_e = e - (e mod 2^(m-p)). The heuristic can be expressed in a single equation as:

∇_p(s, s', p) = s' ∪ ⋃_{e ∈ s' \ s} {l_e, ..., r_e},  where l_e = e - (e mod 2^(m-p)) and r_e = e - (e mod 2^(m-p)) + 2^(m-p) - 1.    (3.2)

Table 3.3 shows sample applications of the heuristic with different levels of precision. As an example, the argument sets are {0} and {0, 1}, yielding {1} = {0, 1} \ {0} as the elements updated. Unsigned 32-bit integer arithmetic is assumed.

Consider our previous example (Figure 3.1), and how an analysis using precision-sensitive widening would perform. Assume widening is first called with maximum precision (m - 1) after two iterations of exact computation, and then repeatedly applied

with two exact iterations in between, each time decrementing the precision by one. In loop states S1-S5, the variables i and x are over-approximated with an interval of increasing size. After six widening applications (and 14 iterations of exact computation), the interval has reached a size of 2^6 = 64 and the analysis stabilizes. The complete development of the abstract state space can be found in Appendix A.2, Table A.1. Table 3.4 shows the resulting state space. Note the interval of size 64 added around the value 10064 in the variation domain of x in several states. Since the precision is reduced in every successive application of widening, the isolated case of x assuming the value 10064 once in S4, and the consecutive widening, result in a large inaccuracy, because earlier applications of widening had already reduced the precision. While liberal over-approximation in intervals of more frequently observed change ({1, ..., 99}) is an important function of widening, the inaccurate addition of large intervals around isolated elements, or small groups of elements, appears to be an undesirable side-effect.

Precision (= p) | Result of widening (∇_p({0}, {0, 1}, p) = {0, ..., 2^(32-p) - 1})
0               | {0, ..., 2^32 - 1}
1               | {0, ..., 2^31 - 1}
2               | {0, ..., 2^30 - 1}
3               | {0, ..., 2^29 - 1}
...             | ...
28              | {0, ..., 2^4 - 1}
29              | {0, ..., 2^3 - 1}
30              | {0, ..., 3}
31              | {0, 1}

Table 3.3.: Precision-sensitive widening with different levels of precision.

State | Variation domain of i | Variation domain of x
S0    | {1}                   | {1}
S1    | {1, ..., 128}         | {1, ..., 127} ∪ {10048, ..., 10112}
S2    | {1, ..., 127}         | {1, ..., 127}
S3    | {1, ..., 127}         | {1, ..., 127}
S4    | {0, ..., 127}         | {10048, ..., 10112}
S5    | {2, ..., 128}         | {1, ..., 127} ∪ {10048, ..., 10112}

Table 3.4.: Abstract state space computed using precision-sensitive widening (14 exact iterations to fixed point).
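Equation 3.2 translates almost directly into executable form. The sketch below is our own rendering of the heuristic on plain Python sets; function and parameter names are illustrative, and a BDD-based implementation would add the interval symbolically instead of enumerating its elements.

# Precision-sensitive widening of Equation 3.2 for m-bit unsigned values:
# every element that appeared in the last exact step pulls in the aligned
# block of width 2**(m - p) that contains it.

def widen_precision(s, s_new, p, m=32):
    """s, s_new: successive variation domains (sets of ints); p: precision."""
    width = 2 ** (m - p)
    result = set(s_new)
    for e in s_new - s:                    # elements added in the last step
        lower = e - (e % width)            # largest multiple of width <= e
        result |= set(range(lower, lower + width))
    return result

# Reproducing two rows of Table 3.3 with argument sets {0} and {0, 1}:
print(sorted(widen_precision({0}, {0, 1}, p=31)))   # [0, 1]
print(sorted(widen_precision({0}, {0, 1}, p=30)))   # [0, 1, 2, 3]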

3.4. Change-sensitive Widening: Increasing the Variation Domain Depending on a History of Change

The aforementioned precision-sensitive widening heuristic, while certainly useful, and even sufficient for a satisfying analysis in some cases, results in a somewhat straightforward over-approximation. A single widening application does not take into consideration the previously observed development of the variation domain. In particular, the number of times elements are added in close proximity to each other does not factor into the choice of interval size used to over-approximate each group of elements. Taking Figure 3.1 as an example, a heuristic should be able to take into account that variable x in S5 can assume significantly more values in one interval (98 values, exact: {1, ..., 63, 65, ..., 99}) than in any other interval (one value, exact: {10064}). Over-approximating everywhere with the same precision should be avoided, as this would needlessly add elements in the neighborhood of isolated elements, or of relatively small numbers thereof. Thus, widening should approximate in different areas with different levels of precision, depending on how often change was observed within the area, i.e., adjust the width of the interval used for approximation accordingly.

Individual adjustment of abstract states depending on previously observed development requires the memorization of some representation of the history of change for each of them. Using an approach similar to our previously introduced precision-sensitive widening, we replace the integer precision argument by a more complex data structure, in which information about the observed development of abstract states can be preserved. The precision, which can be understood as the documentation of previously observed changes, is saved for each variable's variation domain of each abstract state. The individual precision is adjusted every time widening is called, documenting where elements were added in the most recent step, before being used to determine the size of the approximation interval for each added element. We propose a change- and precision-sensitive widening (∇(s, s', p)) consisting of two steps:

- Update the precision, depending on the change in the most recent step (s' \ s)
- Widen, using the newly computed precision

Memorizing histories of change in an exact manner appears infeasible for reasons similar to those discussed previously, and even unnecessary, since widening is a process of approximation, less precise than exact computation. The usage of a data structure in which the efficient manipulation of information on intervals relevant to our method of widening is possible appears sensible. Hence, we propose that the same data structure used to represent integer sets (BDDs) also be used to hold the precision information. We extend our previously introduced precision-sensitive widening to be applied to each added element with a precision depending on how often and how close to the element change has been previously observed. In order to do so, we initialize our precision BDD as a single True terminal, and extend it every time widening is called.

Widen Process

In intervals in which change is observed more often, i.e., for which the precision BDD has grown larger, widening is applied with a lower precision. In particular, the depth d of every terminal in the graph is considered. In the interval of the variation domain corresponding to a True terminal placed at depth d in the BDD, precision-sensitive widening ∇_p is called with a precision of (maximum depth - d), so that a deeper terminal, i.e., more frequently observed change, yields a lower precision and hence a stronger widening. False terminals call widening with a precision of (maximum depth - d + 1), i.e., as if they were a True terminal placed at depth d - 1. As a result, terminals placed at a greater depth in a precision subtree yield a stronger widening in the corresponding interval.

Update Process

For the precision BDD adjustment, every element of change is considered, and, if the precision BDD has not reached more than half its maximum depth in the corresponding subtree, the terminal which represents an interval containing the element is found in the precision BDD. If this terminal is False, it is replaced by True. If it is True, it is replaced by a non-terminal node, the two children of which are True (representing the subtree in which more elements were added) and False (fewer elements added). Thus, the precision BDD increases in size where change is observed. Where the precision BDD has grown to a size larger than half its maximum, the shape of the subtrees below does not influence widening further. However, their depth is still relevant: larger subtrees yield stronger widening and vice versa. Subsequently, if during adjustment a terminal is placed at a depth greater than or equal to half the tree's maximum, and change is observed within the represented interval, the lowest terminal is replaced by a new node with terminal children (effectively increasing the depth by one).

For every element representable in the variation domain, at any point in time only a single terminal node represents an interval containing it. The depth d at which this terminal is placed in the precision BDD specifies that change has been observed at least d times, and, depending on the integer arithmetic used, the maximum distance of each observed change to the interval of the terminal. Since a terminal at depth d represents an interval of size 2^(maximum depth - d), replacing a terminal in the precision BDD and increasing the depth by one results in the maximum distance to the most recent change being half of what it was when the now replaced terminal was created. For any element, a corresponding terminal at depth d in the precision BDD specifies that change was sequentially observed with distance properties as follows:

distance to change < 2^(maximum depth - 1) (first update places True at depth 1)
distance to change < 2^(maximum depth - 2) (update places True at depth 2)
distance to change < 2^(maximum depth - 3) (update places True at depth 3)
...
distance to change < 2^(maximum depth - (d - 1)) (update places True at depth d - 1)

distance to change < 2^(maximum depth - d) (update places True at depth d, current)

Iterative construction of the precision BDD in the above manner requires that, for any interval to decrease in precision, change has to be observed repeatedly in increasing proximity to it. At the same time, the more often widening is called (and change observed), the more precisely the precision in different intervals can be differentiated, since the precision BDD grows more complex. The size of the interval added through widening is directly proportional to the depth at which the representative terminal is placed as a result of observing change in a sequence as stated above.

Consider Table 3.5, in which, as an example, change-sensitive widening is applied to a variation domain until ⊤ is reached. Assume 4-bit unsigned integer arithmetic. Assume also that between each widening application a single exact iteration is computed, in which the smallest element not yet part of the variation domain is added. In each step (widening application), the input arguments and the output are stated. We will now examine each step separately, and observe how our heuristic operates.

Step 1: The two most recent iterations of exact analysis yield the set {0} as elements added. The precision, previously initialized as True, is increased in depth. A True terminal is placed as the right (unset) child of the root, since the larger amount of change was observed in the corresponding interval (1 element added in {0, ..., 7} vs. 0 elements added in {8, ..., 15}, i.e., the left interval/set branch). As a result, any interval added because of changes in the right half of the variation domain, represented by the newly created True terminal ({0, ..., 7}), will double in size in subsequent applications of widening, i.e., the precision is decreased by one, and widening is stronger. The interval added through immediate widening has a size of 2^1.

Change observed: {0}
After update, corresponding precision terminal placed at depth: 1
Resulting widening precision: 4 - 1 = 3
Resulting widening interval size: 2^1 = 2
Interval added through widening: {0, 1}

Step 2: Exact analysis finds {2} as the element added. The True terminal in the precision BDD is replaced by a non-terminal node, the unset branch of which is set to True (representing {0, ..., 3}, containing the added element), the set branch to False. The precision for subsequent widening applications is again decreased for elements represented by the True terminal now placed at a greater depth. Note that although widening only increases the size of the variation domain by one (because of elements already in the set), the interval added has size 2^2 = 4.

Change observed: {2}
After update, corresponding precision terminal placed at depth: 2

Step | First argument | Second argument | Result
1    | {} (empty)     | {0}             | {0, 1}
2    | {0, 1}         | {0, ..., 2}     | {0, ..., 3}
3    | {0, ..., 3}    | {0, ..., 4}     | {0, ..., 7}
4    | {0, ..., 7}    | {0, ..., 8}     | {0, ..., 9}
5    | {0, ..., 9}    | {0, ..., 10}    | {0, ..., 11}
6    | {0, ..., 11}   | {0, ..., 12}    | {0, ..., 15}

Table 3.5.: Development of the variation domain through sample applications of change-sensitive widening (the old and new precision BDDs of each step are not reproduced).

Resulting widening precision: 4 - 2 = 2
Resulting widening interval size: 2^2 = 4
Interval added through widening: {0, ..., 3}

Step 3: Exact analysis finds {4} as the element added. The corresponding precision terminal is False (representing {4, ..., 7}) and would usually be replaced by True. However, because of automatic reduction (nodes whose children are equivalent terminals are replaced by that terminal), the precision is instead decreased one additional time, i.e., the precision tree grows further. False is replaced not by True, but by a non-terminal, the unset-branch child of which is set to True (representing {4, 5}, containing 4), and the set-branch child to False. Since the lowest terminal is now placed at a depth greater than half the maximum, widening will add intervals of a size larger than what is represented by the lowest terminal. The precision BDD has reached a depth of 3 in the root node's unset branch (i.e., change has been observed thrice within); as a result, the precision is reduced to 1 in the right half of the variation domain. No change has been observed in the left half, represented by the False terminal as the set child of the root node.

Change observed: {4}
After update, corresponding precision terminal placed at depth: 3
Resulting widening precision: 4 - 3 = 1
Resulting widening interval size: 2^3 = 8
Interval added through widening: {0, ..., 7}

Step 4: Note that our example proceeds to develop in the set branch of the root (the left half of our variation domain) exactly as has been observed in the unset branch. Exact analysis finds {8} as the element to be added. Since this is the first time change is observed in the set branch, the corresponding False terminal (representing {8, ..., 15}) is set to True (compare step 1).

Change observed: {8}
After update, corresponding precision terminal placed at depth: 1
Resulting widening precision: 4 - 1 = 3
Resulting widening interval size: 2^1 = 2
Interval added through widening: {8, 9}

Step 5: Exact analysis finds {10} as the element to be added. The True terminal ({8, ..., 15}) is replaced by a non-terminal with children True ({8, ..., 11}) and False ({12, ..., 15}) (compare step 2).

Change observed: {10}

After update, corresponding precision terminal placed at depth: 2
Resulting widening precision: 4 - 2 = 2
Resulting widening interval size: 2^2 = 4
Interval added through widening: {8, ..., 11}

Step 6: Exact analysis finds {12} as the element to be added. Again, preventing reduction, False ({12, ..., 15}) is replaced by a non-terminal (compare step 3). Subsequent widening yields ⊤ as the approximated variation domain.

Change observed: {12}
After update, corresponding precision terminal placed at depth: 3
Resulting widening precision: 4 - 3 = 1
Resulting widening interval size: 2^3 = 8
Interval added through widening: {8, ..., 15}

Note that, while in every step of our example only a single value is observed as change, regularly multiple elements are added. During an update, every terminal representing an interval in which elements are added will be replaced, i.e., the precision for its interval adjusted.

Consider the application of our more sophisticated heuristic during an analysis of our previous example (Figure 3.1). Assume again two iterations of exact computation between widening applications, and 16-bit unsigned integer arithmetic. In Appendix A.2, Table A.2 shows the complete development of the abstract states during the analysis. Table 3.6 shows the resulting state space. The analysis results are similar to those obtained using precision-sensitive widening (Table 3.4), using the same number of iterations to reach a fixed point. However, the interval added around the isolated element has only size four, created according to the distances to previously observed changes.

State | Variation domain of i | Variation domain of x
S0    | {1}                   | {1}
S1    | {1, ..., 128}         | {1, ..., 127} ∪ {10062, ..., 10065}
S2    | {1, ..., 127}         | {1, ..., 127}
S3    | {1, ..., 127}         | {1, ..., 127}
S4    | {64, 65}              | {10064, 10065}
S5    | {2, ..., 128}         | {1, ..., 127} ∪ {10062, ..., 10065}

Table 3.6.: Abstract state space computed using change-sensitive widening (14 exact iterations to fixed point).
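To make the update and widen rules of this section concrete, the following sketch replays the 4-bit walkthrough of Table 3.5. It is our own reading of the heuristic, kept deliberately simple: the precision BDD is modelled as a plain binary trie of Boolean leaves rather than a reduced, shared BDD; the precision trie is updated element by element (with one changed element per step, as in Table 3.5, this coincides with updating it as a whole first); and the widening precision for a True leaf at depth d is taken to be m - d, matching the arithmetic of the worked example above. It is not the Jakstab implementation of Appendix A.3.

# Change- and precision-sensitive widening on M-bit unsigned values, with the
# precision "BDD" modelled as a binary trie whose leaves are booleans
# (True = change has been observed in the covered interval).

M = 4                                      # maximum depth / word width

class PNode:
    """Inner node of the precision trie; hi covers the bit-set half."""
    def __init__(self, hi, lo):
        self.hi, self.lo = hi, lo

def bit_at(e, depth):
    return (e >> (M - 1 - depth)) & 1

def split(e, depth):
    """Leaf replacement: True on e's side, False on the other side."""
    return PNode(True, False) if bit_at(e, depth) else PNode(False, True)

def update(tree, e, depth=0):
    """Record the changed element e in the precision trie (Update Process)."""
    if not isinstance(tree, PNode):        # the initial single True terminal
        return split(e, depth)
    b = bit_at(e, depth)
    child, sibling = (tree.hi, tree.lo) if b else (tree.lo, tree.hi)
    if isinstance(child, PNode):
        new_child = update(child, e, depth + 1)
    elif child is False and depth + 1 < M // 2 and sibling is not True:
        new_child = True                   # first change here, no reduction risk
    else:                                  # True leaf, deep leaf, or reduction case
        new_child = split(e, depth + 1)
    return PNode(new_child, sibling) if b else PNode(sibling, new_child)

def leaf_depth(tree, e):
    """Depth and value of the leaf covering e."""
    depth = 0
    while isinstance(tree, PNode):
        tree = tree.hi if bit_at(e, depth) else tree.lo
        depth += 1
    return depth, tree

def widen(s, s_new, prec):
    """Update the precision trie from the change s_new minus s, then widen
    each changed element with the precision read off the trie (Widen Process)."""
    result = set(s_new)
    for e in sorted(s_new - s):
        prec = update(prec, e)
        d, leaf = leaf_depth(prec, e)
        p = (M - d) if leaf else (M - d + 1)
        width = 2 ** (M - p)
        lower = e - (e % width)            # aligned interval containing e
        result |= set(range(lower, lower + width))
    return result, prec

# Replaying Table 3.5: one exact iteration (adding the smallest missing
# element) between successive widening applications, starting from scratch.
dom, prec = set(), True
for _ in range(6):
    exact = dom | {min(set(range(2 ** M)) - dom)}
    dom, prec = widen(dom, exact, prec)
    print(sorted(dom))
# Output: [0, 1], [0..3], [0..7], [0..9], [0..11], [0..15] -- the Result
# column of Table 3.5, ending in the full 4-bit range (top).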

4. Operator Behavior and Characteristics

In this chapter, we give a brief rationale for our proposed method of estimation. We consider different application contexts and their influence on the behavior of our widening operator, as introduced in the last chapter. We note and discuss traits and properties of the operator.

4.1. Design Considerations

The purpose of widening in the context of static program analysis is to decrease the time analyses require to terminate, by replacing steps of exact analysis with the application of an estimation operator, over-approximating value sets using a sensible heuristic. Through widening, analysis precision is traded for analysis convergence speed. Several properties are desirable to enable the usage of a sophisticated approximation heuristic, such as the possibility to adjust the degree to which precision is given up in a single widening application, i.e., its strength (precision sensitivity), and the possibility to take into account the previously observed development of the argument variation domain (change sensitivity). Given these properties, it is possible to construct a widening heuristic which modulates its extrapolation depending on the context.

An important consideration for our proposed method are the relative distances between areas in which change is observed. We apply a stronger widening to areas of the variation domain in which change is detected more often, while leaving areas of fewer observed changes more time to develop in an exact manner by applying a weaker widening. Starting with a high degree of precision, each successive application of widening to the same variation domain decreases the overall precision, until, after a maximum number of applications, the set of all representable values is reached. To enable the supply of a precision argument, the widening operator is extended to ternary form. Memorization of observed changes is achieved through the construction of BDDs, in which the development observed for every distinct variation domain is represented individually.

4.2. Application Context

Utility of Change Sensitivity in Different Contexts

Operator behavior and performance heavily depend on the context in which the operator is applied. While in some contexts the properties of the proposed heuristic allow for a quick over-approximation close to actual values, in other application environments usage of a more


More information

Lecture 6. Abstract Interpretation

Lecture 6. Abstract Interpretation Lecture 6. Abstract Interpretation Wei Le 2014.10 Outline Motivation History What it is: an intuitive understanding An example Steps of abstract interpretation Galois connection Narrowing and Widening

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Model Checking I Binary Decision Diagrams

Model Checking I Binary Decision Diagrams /42 Model Checking I Binary Decision Diagrams Edmund M. Clarke, Jr. School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 2/42 Binary Decision Diagrams Ordered binary decision diagrams

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

6. Finding Efficient Compressions; Huffman and Hu-Tucker

6. Finding Efficient Compressions; Huffman and Hu-Tucker 6. Finding Efficient Compressions; Huffman and Hu-Tucker We now address the question: how do we find a code that uses the frequency information about k length patterns efficiently to shorten our message?

More information

SDD Advanced-User Manual Version 1.1

SDD Advanced-User Manual Version 1.1 SDD Advanced-User Manual Version 1.1 Arthur Choi and Adnan Darwiche Automated Reasoning Group Computer Science Department University of California, Los Angeles Email: sdd@cs.ucla.edu Download: http://reasoning.cs.ucla.edu/sdd

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

An Approach to Task Attribute Assignment for Uniprocessor Systems

An Approach to Task Attribute Assignment for Uniprocessor Systems An Approach to ttribute Assignment for Uniprocessor Systems I. Bate and A. Burns Real-Time Systems Research Group Department of Computer Science University of York York, United Kingdom e-mail: fijb,burnsg@cs.york.ac.uk

More information

Topology and Topological Spaces

Topology and Topological Spaces Topology and Topological Spaces Mathematical spaces such as vector spaces, normed vector spaces (Banach spaces), and metric spaces are generalizations of ideas that are familiar in R or in R n. For example,

More information

5.4 Pure Minimal Cost Flow

5.4 Pure Minimal Cost Flow Pure Minimal Cost Flow Problem. Pure Minimal Cost Flow Networks are especially convenient for modeling because of their simple nonmathematical structure that can be easily portrayed with a graph. This

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g) Introduction to Algorithms March 11, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Sivan Toledo and Alan Edelman Quiz 1 Solutions Problem 1. Quiz 1 Solutions Asymptotic orders

More information

COMP3121/3821/9101/ s1 Assignment 1

COMP3121/3821/9101/ s1 Assignment 1 Sample solutions to assignment 1 1. (a) Describe an O(n log n) algorithm (in the sense of the worst case performance) that, given an array S of n integers and another integer x, determines whether or not

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Small Formulas for Large Programs: On-line Constraint Simplification In Scalable Static Analysis

Small Formulas for Large Programs: On-line Constraint Simplification In Scalable Static Analysis Small Formulas for Large Programs: On-line Constraint Simplification In Scalable Static Analysis Isil Dillig, Thomas Dillig, Alex Aiken Stanford University Scalability and Formula Size Many program analysis

More information

Disjoint Sets and the Union/Find Problem

Disjoint Sets and the Union/Find Problem Disjoint Sets and the Union/Find Problem Equivalence Relations A binary relation R on a set S is a subset of the Cartesian product S S. If (a, b) R we write arb and say a relates to b. Relations can have

More information

Problem Set 5 Solutions

Problem Set 5 Solutions Introduction to Algorithms November 4, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 21 Problem Set 5 Solutions Problem 5-1. Skip

More information

Programming Languages Third Edition

Programming Languages Third Edition Programming Languages Third Edition Chapter 12 Formal Semantics Objectives Become familiar with a sample small language for the purpose of semantic specification Understand operational semantics Understand

More information

MA651 Topology. Lecture 4. Topological spaces 2

MA651 Topology. Lecture 4. Topological spaces 2 MA651 Topology. Lecture 4. Topological spaces 2 This text is based on the following books: Linear Algebra and Analysis by Marc Zamansky Topology by James Dugundgji Fundamental concepts of topology by Peter

More information

2009 Haskell January Test Binary Decision Diagrams

2009 Haskell January Test Binary Decision Diagrams 009 Haskell January est Binary Decision Diagrams his test comprises four parts and the maximum mark is 5. Parts I, II and III are worth of the 5 marks available. he 009 Haskell Programming Prize will be

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

Optimized Implementation of Logic Functions

Optimized Implementation of Logic Functions June 25, 22 9:7 vra235_ch4 Sheet number Page number 49 black chapter 4 Optimized Implementation of Logic Functions 4. Nc3xe4, Nb8 d7 49 June 25, 22 9:7 vra235_ch4 Sheet number 2 Page number 5 black 5 CHAPTER

More information

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Fall 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

A purely functional implementation of ROBDDs in Haskell

A purely functional implementation of ROBDDs in Haskell Christian-Albrechts-Universität zu Kiel Diploma Thesis A purely functional implementation of ROBDDs in Haskell Jan Christiansen February 9th, 2006 Institute of Computer Science and Applied Mathematics

More information

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer

17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer Module 2: Divide and Conquer Divide and Conquer Control Abstraction for Divide &Conquer 1 Recurrence equation for Divide and Conquer: If the size of problem p is n and the sizes of the k sub problems are

More information

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02) Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 04 Lecture - 01 Merge Sort (Refer

More information

Lecture Notes: Widening Operators and Collecting Semantics

Lecture Notes: Widening Operators and Collecting Semantics Lecture Notes: Widening Operators and Collecting Semantics 15-819O: Program Analysis (Spring 2016) Claire Le Goues clegoues@cs.cmu.edu 1 A Collecting Semantics for Reaching Definitions The approach to

More information

Register Allocation & Liveness Analysis

Register Allocation & Liveness Analysis Department of Computer Sciences Register Allocation & Liveness Analysis CS502 Purdue University is an Equal Opportunity/Equal Access institution. Department of Computer Sciences In IR tree code generation,

More information

Hashing. Hashing Procedures

Hashing. Hashing Procedures Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Technische Universität München Zentrum Mathematik

Technische Universität München Zentrum Mathematik Technische Universität München Zentrum Mathematik Prof. Dr. Dr. Jürgen Richter-Gebert, Bernhard Werner Projective Geometry SS 208 https://www-m0.ma.tum.de/bin/view/lehre/ss8/pgss8/webhome Solutions for

More information

/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang

/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang 600.469 / 600.669 Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang 9.1 Linear Programming Suppose we are trying to approximate a minimization

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17 01.433/33 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/2/1.1 Introduction In this lecture we ll talk about a useful abstraction, priority queues, which are

More information

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989

P Is Not Equal to NP. ScholarlyCommons. University of Pennsylvania. Jon Freeman University of Pennsylvania. October 1989 University of Pennsylvania ScholarlyCommons Technical Reports (CIS) Department of Computer & Information Science October 1989 P Is Not Equal to NP Jon Freeman University of Pennsylvania Follow this and

More information

Multiple Choice Style Informatics

Multiple Choice Style Informatics Multiple Choice Style Informatics Jordan Tabov, Emil Kelevedzhiev & Borislav Lazarov I. Introduction. Jordan Tabov was an IMO participant and has been a team leader of the Bulgarian IMO team. He graduated

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Search and Optimization

Search and Optimization Search and Optimization Search, Optimization and Game-Playing The goal is to find one or more optimal or sub-optimal solutions in a given search space. We can either be interested in finding any one solution

More information

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions. Programming Languages and Compilers Qualifying Examination Fall 2017 Answer 4 of 6 questions. GENERAL INSTRUCTIONS 1. Answer each question in a separate book. 2. Indicate on the cover of each book the

More information

Week - 01 Lecture - 03 Euclid's Algorithm for gcd. Let us continue with our running example of gcd to explore more issues involved with program.

Week - 01 Lecture - 03 Euclid's Algorithm for gcd. Let us continue with our running example of gcd to explore more issues involved with program. Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 01 Lecture - 03 Euclid's Algorithm

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 10 Lecture No. # 16 Machine-Independent Optimizations Welcome to the

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

Critical Analysis of Computer Science Methodology: Theory

Critical Analysis of Computer Science Methodology: Theory Critical Analysis of Computer Science Methodology: Theory Björn Lisper Dept. of Computer Science and Engineering Mälardalen University bjorn.lisper@mdh.se http://www.idt.mdh.se/ blr/ March 3, 2004 Critical

More information

9/24/ Hash functions

9/24/ Hash functions 11.3 Hash functions A good hash function satis es (approximately) the assumption of SUH: each key is equally likely to hash to any of the slots, independently of the other keys We typically have no way

More information

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols

From Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols SIAM Journal on Computing to appear From Static to Dynamic Routing: Efficient Transformations of StoreandForward Protocols Christian Scheideler Berthold Vöcking Abstract We investigate how static storeandforward

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Randomized Jumplists With Several Jump Pointers. Elisabeth Neumann

Randomized Jumplists With Several Jump Pointers. Elisabeth Neumann Randomized Jumplists With Several Jump Pointers Elisabeth Neumann Bachelor Thesis Randomized Jumplists With Several Jump Pointers submitted by Elisabeth Neumann 31. March 2015 TU Kaiserslautern Fachbereich

More information

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

More information

1 Introduction 2. 2 A Simple Algorithm 2. 3 A Fast Algorithm 2

1 Introduction 2. 2 A Simple Algorithm 2. 3 A Fast Algorithm 2 Polyline Reduction David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy

More information

Lofting 3D Shapes. Abstract

Lofting 3D Shapes. Abstract Lofting 3D Shapes Robby Prescott Department of Computer Science University of Wisconsin Eau Claire Eau Claire, Wisconsin 54701 robprescott715@gmail.com Chris Johnson Department of Computer Science University

More information

4.1 Review - the DPLL procedure

4.1 Review - the DPLL procedure Applied Logic Lecture 4: Efficient SAT solving CS 4860 Spring 2009 Thursday, January 29, 2009 The main purpose of these notes is to help me organize the material that I used to teach today s lecture. They

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014

CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014 CS 6110 S14 Lecture 38 Abstract Interpretation 30 April 2014 1 Introduction to Abstract Interpretation At this point in the course, we have looked at several aspects of programming languages: operational

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Theory of 3-4 Heap. Examining Committee Prof. Tadao Takaoka Supervisor

Theory of 3-4 Heap. Examining Committee Prof. Tadao Takaoka Supervisor Theory of 3-4 Heap A thesis submitted in partial fulfilment of the requirements for the Degree of Master of Science in the University of Canterbury by Tobias Bethlehem Examining Committee Prof. Tadao Takaoka

More information

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs Computational Optimization ISE 407 Lecture 16 Dr. Ted Ralphs ISE 407 Lecture 16 1 References for Today s Lecture Required reading Sections 6.5-6.7 References CLRS Chapter 22 R. Sedgewick, Algorithms in

More information

Lecture Notes on Liveness Analysis

Lecture Notes on Liveness Analysis Lecture Notes on Liveness Analysis 15-411: Compiler Design Frank Pfenning André Platzer Lecture 4 1 Introduction We will see different kinds of program analyses in the course, most of them for the purpose

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning Lecture 1 Contracts 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning In these notes we review contracts, which we use to collectively denote function contracts, loop invariants,

More information

Using Genetic Algorithms to Solve the Box Stacking Problem

Using Genetic Algorithms to Solve the Box Stacking Problem Using Genetic Algorithms to Solve the Box Stacking Problem Jenniffer Estrada, Kris Lee, Ryan Edgar October 7th, 2010 Abstract The box stacking or strip stacking problem is exceedingly difficult to solve

More information

Algorithm Design (8) Graph Algorithms 1/2

Algorithm Design (8) Graph Algorithms 1/2 Graph Algorithm Design (8) Graph Algorithms / Graph:, : A finite set of vertices (or nodes) : A finite set of edges (or arcs or branches) each of which connect two vertices Takashi Chikayama School of

More information

Variants of Turing Machines

Variants of Turing Machines November 4, 2013 Robustness Robustness Robustness of a mathematical object (such as proof, definition, algorithm, method, etc.) is measured by its invariance to certain changes Robustness Robustness of

More information

Week 8. BinaryTrees. 1 Binary trees. 2 The notion of binary search tree. 3 Tree traversal. 4 Queries in binary search trees. 5 Insertion.

Week 8. BinaryTrees. 1 Binary trees. 2 The notion of binary search tree. 3 Tree traversal. 4 Queries in binary search trees. 5 Insertion. Week 8 Binarys 1 2 of 3 4 of 5 6 7 General remarks We consider, another important data structure. We learn how to use them, by making efficient queries. And we learn how to build them. Reading from CLRS

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information