Multiplication of BDD-Based Integer Sets for Abstract Interpretation of Executables


Bachelor Thesis

Johannes Müller

Multiplication of BDD-Based Integer Sets for Abstract Interpretation of Executables

March 19, 2017

supervised by: Prof. Dr. Sibylle Schupp, Sven Mattsen

Hamburg University of Technology (TUHH)
Technische Universität Hamburg-Harburg
Institute for Software Systems
Hamburg


Statutory Declaration

I, Johannes Müller, hereby declare in lieu of oath that I have written this bachelor thesis independently and that I have used no aids other than those stated. The thesis has not been submitted in this or a similar form to any examination board.

Harsefeld, March 19, 2017
Johannes Müller


Contents

1. Introduction
2. Background
   2.1. Data-flow Analysis
   2.2. Abstract Interpretation
   2.3. BDD-Based Integer Sets
3. Multiplication of BDD-Based Integer Sets
   3.1. Multiplication of General Sets
   3.2. Singleton Multiplication
        Constructing BDD-Based Strided Intervals
        Correctness of Singleton Multiplication
4. Conversion of BDD-Based Sets to Strided Intervals
   Requirements for a Galois Insertion
   Algorithm for Finding Best-Fitting Strided Intervals
5. Implementation
6. Evaluation
7. Related Work
8. Conclusion
A. Appendix
   A.1. Implementation in Scala
   A.2. Evaluation Results


1. Introduction

The ubiquitous presence of computer programs in nearly all areas of modern life necessitates methods facilitating the extraction of properties from programs, such as correctness with respect to their specification, performance metrics, or the presence of security vulnerabilities. Especially in safety-critical areas, like avionics or the construction of medical devices, these safety properties are of utmost importance. During the history of software development and engineering, several methods that detect such properties with different trade-offs have been conceived. For example, the dynamic testing of programs, while relatively easy to employ, can only prove the presence of bugs, not their absence, whereas formal proofs of certain properties, which theoretically are the most precise and thorough method, are comparatively hard to formulate for non-trivial programs. Another approach, the static analysis of programs, which does not need to examine the program at runtime, promises to derive properties in an automatic fashion, making it attractive as an additional tool for software developers to check the quality of their product. These analyses can process different forms of a program: one option is to examine the source code, i.e., a higher-level representation of the program, and another is the inspection of machine code, i.e., the source code compiled to an executable. This thesis will focus on the latter representation of programs.

The need to analyze machine code instead of higher-level source code arises for several reasons. For one, the source code of a program of interest might not always be freely available, for example if the program is legacy software, where the source code has simply been lost over the years. Another example would be a vulnerability analysis of third-party software, where the original source is naturally not available. In addition, the executable might not always be a faithful translation of the original source code, for example due to compiler optimizations, as described by Balakrishnan [1]. Even for open-source software, which can be downloaded in executable form, a mere analysis of the source code for vulnerabilities or malicious code might not be sufficient, since there exists the possibility that a downloaded executable has code added on top of the original functionality. In summary, it can be concluded that the executable is the single source of truth for the behavior of a program and must thus be treated as an important analysis subject.

In order to analyze programs statically, one still needs information about the possible behavior of the program, which in turn requires knowledge of the possible program states at each program point that determine this behavior. Part of this program state are the current contents of memory locations, or, in higher-level languages, the values of variables. One particular static analysis that determines this information is the value analysis, which computes a set of possible values of each variable or memory cell, called the variation domain (VD). As an example, consider Listing 1.1. A value analysis for this program would compute the possible values of the variable i at each program point, for example the set {0, 1} at line 6. If the original program contains an operation on variables to compute some result, a value analysis needs a way to combine VDs with respect to this operator in order to compute all possible results.

Listing 1.1: Example program
1 #include <stdio.h>
2 int main(void) {
3   int a[2] = {4, 9};
4   int i;
5   for (i = 0; i < 2; i++)
6     printf("%d\n", a[i]);
7   return 0;
8 }

During an analysis, these computations are performed using transfer functions, which are the operators of the concrete program lifted to the world of variation domains. As an example, in order to analyze the example program we need a transfer function for the addition, which would compute all possible values of i after the incrementation.

For this thesis, we use BDDs to represent the variation domains. This representation allows efficient and precise transfer functions for bitwise operators, which are a common occurrence in machine code, and for addition, as defined by Mattsen et al. [6]. The overarching topic of this thesis is the development of transfer functions for multiplication, since there currently only exists a vastly over-approximating transfer function, and multiplication commonly occurs in executables even if no explicit multiplication was employed in the source program, making a more precise transfer function a worthwhile research subject. To see this peculiarity, consider the array access in line 6 of the example program. The corresponding instruction in machine code, load a + i * 4 in pseudo code, computes the address of the array element from the index i: it multiplies the index i by a constant factor of 4, since we assume a system working with 4-byte integers, and adds the resulting offset to the base address of the array. Figure 1.1 visualizes how the address of an array element is resolved: the start address of the array a is given by 0x3, and thus the start address of the i-th array element by 0x3 + i * 4.

Figure 1.1.: Layout of an array in memory (squares represent bytes in memory, the text inside represents their address; a[0] occupies 0x3 to 0x6 and a[1] occupies 0x7 to 0xA)

As part of this thesis, we will present an algorithm computing exact results for the special case of a singleton multiplication, i.e., a multiplication where one VD only contains a single element, as is the case for array accesses, where the singleton set in our example would hold the value 4.
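To make the role of such a transfer function concrete, the following sketch lifts multiplication to plain Scala sets of values. The names are hypothetical, and explicit sets stand in for the BDD-based variation domains used in the thesis.

// Naive transfer function for multiplication on explicit value sets (illustration only).
def multVD(a: Set[Long], b: Set[Long]): Set[Long] =
  for { x <- a; y <- b } yield x * y   // product of every combination of possible values

// Array access from Listing 1.1: index i in {0, 1} times the element size 4 gives the
// byte offsets {0, 4}, so the accessed addresses are 0x3 + {0, 4} = {0x3, 0x7}.
val offsets = multVD(Set(0L, 1L), Set(4L))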

We will also describe an algorithm for the general case of multiplication, which, however, only approximates, since a precise calculation would not be computationally feasible. As part of the singleton multiplication we will present an algorithm for the construction of BDD-based sets representing strided intervals, which arise when multiplying an interval with a constant. Having found a way to convert from the strided interval domain to the domain of BDD-based sets, we expand on that and devise an algorithm that, given a BDD-based set, finds an optimal strided interval, which is a superset of the original one, allowing us to conveniently convert between the abstract domain of BDD-based sets and strided intervals.

In short, the contributions of this thesis are

- a precise transfer function for the special case of singleton multiplication,
- an approximating transfer function for the general multiplication,
- abstraction and concretization functions between the domain of BDD-based sets and the strided interval domain, and
- an evaluation of the presented multiplication algorithms.

The thesis is structured into eight chapters. We will begin by explaining important fundamental concepts in Chapter 2, including data-flow analysis and the representation of integer sets as BDDs. Chapter 3 is dedicated to the explanation of the transfer functions for the multiplication that we designed. In Chapter 4 we will explain a conversion between the domain of strided intervals and our domain of BDD-based integer sets. Chapter 5 will serve as a description of the implementation of the developed algorithms. An evaluation of our algorithms will be presented in Chapter 6. We will discuss related work in Chapter 7 and conclude the thesis in Chapter 8.


2. Background

2.1. Data-flow Analysis

One technique of static analysis is data-flow analysis [2], which examines program properties that are dictated by the specific way data is propagated through the program, i.e., how the sequence of executed code, determined by the control flow, changes the state of the program. Data-flow analyses are used, for example, during optimization phases of program compilation. There exists a plethora of specific analyses that fall under the umbrella of data-flow analysis, such as the live variables or reaching definitions analysis. They all depend on the control flow of the program, typically represented by a control flow graph, and define a property space of data-flow information specific to their goals, which is derived during analysis by means of combination and modification of previously known information. In particular, for each node in the control flow graph a transfer function modeling the effects of the corresponding basic block is defined, which computes the exit state of the block, i.e., the properties known after execution of that block, from incoming information, i.e., properties of the program known before the basic block is executed. The incoming information is computed by combining the exit states of all predecessor nodes, using the join function of the analysis. For each node b of the control flow graph we have:

in_b = join_{p ∈ pred(b)}(out_p)
out_b = trans_b(in_b)

This gives rise to a system of equations for the program, depending on the analysis used. Starting at the entry node of the program and the initially known information, one can iteratively traverse the control flow graph, computing new exit states based on the combination of incoming information, and propagate this knowledge through the control flow graph. This is repeated until the properties known at each node do not change anymore, i.e., we have reached a fixpoint of the system of equations.

Binary Analysis

Analyzing executables using a data-flow analysis entails several challenges. To begin with, the extraction of the control flow graph of an executable is much harder compared to the case of control-flow extraction from higher-level languages, since the original control flow given by higher-level control structures is reduced to jump instructions in the executable. While static jumps to a known instruction are easily resolved, dynamic jumps to an address that is not statically known pose a significant obstacle, since the possible jump targets must be computed as part of the analysis. In order to formulate a sound analysis, the computed set of jump targets must be a superset of all possible jump targets in the original program, otherwise we would ignore certain valid sequences of executed instructions. However, the analysis needs to make sure that the derived set of jump targets is as precise as possible, since an over-approximation of jump targets leads to an over-approximation of the control flow graph, which might entail the analysis of instruction sequences that were not present in the original program. In practice, the reconstruction of the control flow graph and the data-flow analysis can be combined: during the analysis of an instruction, all successor instructions are derived and then visited, instead of knowing the successors prior to the examination of an instruction.

Figure 2.1.: Control flow graph for the code in Listing 1.1 (nodes S1: "int a[2] = {...}; int i = 0;", S2: "i < 2", S3: "printf(...); i = i + 1;", and EXIT; S2 branches to S3 on the true edge and to EXIT on the false edge)

Value Analysis

The specific data-flow analysis we are interested in is the value analysis mentioned in the introduction, which computes variation domains for the variables and memory cells in the program, i.e., it analyzes the possible values that these variables can hold. This information is used, for example, to check for out-of-bound accesses on arrays, which happen if a possible value for the index exceeds the bounds of the array, or for the computation of possible jump targets of dynamic jumps as required by the reconstruction of the control flow graph for executables.

As an example, in the following we perform a value analysis on the example program in Listing 1.1 from the introduction. The corresponding control flow graph is displayed in Figure 2.1, where the basic blocks of the program are represented by nodes with edges connecting basic blocks that are executed in sequence. If the execution depends on a conditional expression, the required value of this expression for a basic block to be executed is used as the label of the edge. We will now define our analysis by starting with the property space, i.e., the information that we want to track. Since we want to infer the possible values of the single variable i of the program, we use the power set of all integers P(Z) as our property space, i.e., as our flow data we always have a set of possible values. In practice, such a property space is not feasible, because we would have to be able to store infinite sets, but it is used here for a simplification of the example.

We define the join operator, which combines the information of different variation domains, as the set union, i.e., we have:

join(s_1, ..., s_n) = s_1 ∪ ... ∪ s_n

We will now define the transfer functions for the nodes in the control flow graph, which update a variation domain based on the code that would be executed, as follows:

- S1: this basic block initializes i to the value 0, which means that the possible values of i after this block are given by the singleton set {0}, i.e., trans_S1(in) = {0}.
- S2: the conditional does not modify i and its transfer function is thus the identity function, i.e., trans_S2(in) = in.
- S3: in the first statement of this basic block i is only read and not modified, while the second statement increments i by 1, which means that we form the new variation domain by adding 1 to all previously possible values, i.e., trans_S3(in) = {i + 1 | i ∈ in}.
- EXIT: the exit node does not modify the values of i and thus we arrive at trans_EXIT(in) = in.

We initialize in_S1 with Z, since we have no information about i at that point, i.e., i could be any integer. The initial outgoing properties out_n are initialized with the empty set, since this is the neutral element of the set union. The analysis proceeds as follows:

Step 1: We calculate out_S1 as out_S1 = trans_S1(Z) = {0}.
Step 2: in_S2 is computed by combining the outgoing properties of S1 and S3, i.e., in_S2 = join({0}, ∅) = {0}. From this we determine the new exit state as out_S2 = trans_S2({0}) = {0}.
Step 3: We compute the outgoing property of S3 from the information incoming from S2: out_S3 = trans_S3({0}) = {i + 1 | i ∈ {0}} = {1}. This tells us that the possible values of i after S3 are just the single number 1.
Step 4: Since the outgoing information of S3 changes, we have to recompute the incoming state for S2 as in_S2 = join({0}, {1}) = {0, 1}, which also constitutes the new outgoing information of S2.
Step 5: Due to the change of out_S2 we have to update out_S3, since we have a new incoming variation domain: out_S3 = trans_S3({0, 1}) = {i + 1 | i ∈ {0, 1}} = {1, 2}.
Step 6: We proceed by updating the incoming variation domain for S2 based on the new outgoing variation domain from S3 as in_S2 = join({0}, {1, 2}) = {0, 1, 2}.
Step 7: Since the condition i < 2 would now no longer be fulfilled for S2, we take the false edge to the exit node. Here we just propagate the incoming variation domain, i.e., the outgoing one from S2, and conclude that the possible values of i at program exit are out_EXIT = trans_EXIT({0, 1, 2}) = {0, 1, 2}.
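The following sketch replays this iteration in Scala on explicit sets (hypothetical names; the restriction of the true edge by the condition i < 2, which the text applies informally in Step 7, is made explicit here so that the loop terminates).

def transS1(in: Set[Int]): Set[Int] = Set(0)          // i = 0
def transS2(in: Set[Int]): Set[Int] = in              // the condition only reads i
def transS3(in: Set[Int]): Set[Int] = in.map(_ + 1)   // printf does not change i; i = i + 1
def join(ss: Set[Int]*): Set[Int]   = ss.flatten.toSet

var inS2  = Set.empty[Int]
var outS3 = Set.empty[Int]
val outS1 = transS1(Set.empty)                         // {0}
var changed = true
while (changed) {
  val next = join(outS1, outS3)
  changed = next != inS2
  inS2 = next
  // only values satisfying the loop condition i < 2 flow along the true edge into S3
  outS3 = transS3(transS2(inS2).filter(_ < 2))
}
// at the fixpoint: inS2 == Set(0, 1, 2), matching Steps 1 to 6 above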

2.2. Abstract Interpretation

The sound static analysis of a program encompasses the consideration of all possible execution behaviors, since all defects or properties that are present in the original program and of interest must be discovered as part of the analysis, i.e., no false negatives may exist. As a consequence, such an analysis must capture all possible execution traces, i.e., all sequences of states of the program, which for example includes the values of variables. However, the sheer size of the state space of non-trivial programs prevents the consideration of every single program state from being computationally feasible. To circumvent this unreasonable computation, one can abstract the concrete program to an abstract one, which reaches abstract states, and perform the analysis on the abstract version. The abstract program, consisting of an abstract state space and state transformers, must obviously be a faithful representation of the original program in order for the analysis to be correct and sensible.

Abstract interpretation [3] is a framework facilitating the conception of sound static analyses based on abstracting a concrete program with respect to its semantics. As the result of the process, such an analysis derives properties that describe the concrete values of the concrete program. For the case of a value analysis, the properties would describe the possible values of a concrete variable in the original program. Intuitively, an analysis is correct if the properties describe (at least) all possible program values, which in our case means that we derive a variation domain which is a superset of the actual one of the concrete program. To formalize the correctness of an analysis, one needs to define a correctness relation relating concrete program values to abstract properties, dependent on the particular analysis goals. If this correctness relation is preserved under computation, the analysis is shown to be correct, i.e., given that the relation holds between an initial value and initial property, it must also hold between the final value and property after the property has been transformed by a transfer function corresponding to the semantics of the concrete program.

To facilitate the construction of correct analyses, the properties are required to be organized in a lattice, i.e., a set with meet operator, join operator, and partial ordering, where the ordering represents a metric for the precision of a property or the amount of information known, i.e., if l_1 ⊑ l_2 then l_2 represents at least the information represented by l_1, which means that if l_1 correctly describes a concrete value then l_2 does so too. Based on the goals of the analysis to be performed, one can now choose a property space that can hold all information required but is still small enough to enable efficient computation. Generally, the more abstract and thus less precise a chosen abstract domain of properties and transfer functions is, the more efficient is the computation in that domain, which means that one has to choose a trade-off between precision and computability. As an example, if an analysis is to be designed that checks whether array indices are always within bounds, there is no need to track all possible values of the indices, complicating the computation; the smallest and largest possible value alone are sufficient.
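For instance, the following sketch (hypothetical names, not taken from the thesis) abstracts a variation domain to the interval of its smallest and largest element, which is all a bounds check needs; it also hints at the abstraction and concretization functions discussed next.

case class Interval(lo: Int, hi: Int)

// abstraction: keep only the bounds of a (non-empty) set of possible index values
def alpha(values: Set[Int]): Interval = Interval(values.min, values.max)
// concretization: the set of all values the interval describes
def gamma(iv: Interval): Set[Int] = (iv.lo to iv.hi).toSet

// gamma(alpha(s)) is a superset of s, and alpha(gamma(iv)) yields iv again
def indicesInBounds(indices: Set[Int], arrayLength: Int): Boolean = {
  val iv = alpha(indices)
  iv.lo >= 0 && iv.hi < arrayLength
}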
However, it is also possible to change the abstract domain used during an analysis itself without compromising the correctness of the analysis: if a computation is too expensive in a more concrete domain, one can approximate this domain by a more abstract one and perform the computation in this new domain, followed by conversion back to the more concrete domain. The basis of these safe abstractions are the Galois connections, which dictate how abstraction to and concretization from the abstract domain must behave. Let L and M be two lattices with L representing a more concrete property space, α : L → M the monotone abstraction function relating more concrete properties to more abstract ones, and γ : M → L the corresponding concretization function for the opposite direction. Together with the concrete property space L and the abstract one M, α and γ form a Galois connection if, for all c ∈ L and a ∈ M:

c ⊑ γ(α(c))    (2.1)
α(γ(a)) ⊑ a    (2.2)

The first condition expresses that no information is lost by going to the more abstract domain and back, whereas the second condition expresses the requirement that a concretization followed by an abstraction may not add information. A more strict version of a Galois connection is the Galois insertion, which eliminates superfluous elements in the more abstract domain M by requiring equality in the second equation, i.e., a concretization followed by an abstraction must yield the original element, and thus all elements in M uniquely describe a value in L.

2.3. BDD-Based Integer Sets

The representation of integer sets as the variation domain for our value analysis is based on the concept of Binary Decision Diagrams (BDDs). This section will explain the basics of BDDs and how they can be used to represent integer sets.

Binary Decision Diagrams

BDDs are directed, acyclic graphs which are commonly used to represent Boolean functions and are formed from two types of nodes: terminal and decision nodes. Terminal nodes, also called sinks, are nodes that are either labeled F, meaning False-terminal, or T, for a True-terminal, and are the leaves of a tree since they have no successors. For the extent of this thesis, they will be represented as rectangular boxes. Decision nodes are inner nodes of BDDs, labeled with a variable of the represented Boolean function, and have two outgoing edges, a 0-edge (dashed line) and a 1-edge (solid line), representing the value of the variable. We will depict them as circles. BDDs have a single root node. Figure 2.2b shows an exemplary BDD representing the Boolean xor-function given by the truth table in Figure 2.2a.

x_1  x_2 | x_1 xor x_2
 0    0  |     0
 0    1  |     1
 1    0  |     1
 1    1  |     0

Figure 2.2.: (a) Truth table for the xor-function; (b) BDD for the xor-function

In order to evaluate a BDD that represents an n-ary Boolean function f : {0, 1}^n → {0, 1} with variables x_i, we start at the root node and repeat the following steps:

- If the node is a sink, the result for the given input is the label of the node.

- If the node is a decision node with label x_i, we look up the input value corresponding to the variable x_i. If this value is 0, we need to follow the 0-edge to reach the next node, and otherwise follow the 1-edge.

As an example, we evaluate our example function for the input 01, i.e., x_1 = 0 and x_2 = 1. We start at the root node with label x_1 and follow the 0-edge to the right subtree since x_1 = 0. After reaching the (right) node with label x_2, in the next step we need to follow the 1-edge, reaching a True-terminal, and conclude that f(0, 1) = 1.

Reduced Ordered Binary Decision Diagrams (OBDDs)

In practice, however, regular BDDs are rarely used, since their lack of enforcement of ordering between nodes can lead to unwanted configurations and less efficient algorithms. For this reason, OBDDs were introduced, which augment normal BDDs with an ordering restriction, to which all paths in the BDD must adhere. In particular, a total order π on the set of variables x_1, ..., x_n needs to be defined and the following statement must hold:

∀x_i ∈ {x_1, ..., x_n} : ∀x_j ∈ Successors(x_i) : x_i <_π x_j

For each path in the graph we have that the variables on this path are ordered with respect to π. This also means that no variable can appear twice in a path to a terminal. These OBDDs still do not optimally represent Boolean functions, since they can still contain redundant information, such as a decision node that points to two equivalent sinks. This gives rise to reduced OBDDs, which remove this redundant information. To achieve this, several reduction rules that transform a non-reduced OBDD to a more reduced one, i.e., one with fewer nodes, are defined, and an OBDD is called reduced if there is not a single rule that is still applicable. The reduction rules for our purposes are the following:

Rule 1: If there exists a node in the OBDD that has both edges pointing to the same terminal, replace the node with that terminal and redirect the incoming edges.
Rule 2: If the OBDD contains two or more equivalent subgraphs, remove all but one and redirect the incoming edges of the removed ones to the remaining subgraph.
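A minimal sketch of these two rules on a simple node representation (hypothetical names, not the thesis implementation):

sealed trait BDD
case object True  extends BDD                          // True-terminal
case object False extends BDD                          // False-terminal
final case class Node(hi: BDD, lo: BDD) extends BDD    // decision node: 1-edge (hi) and 0-edge (lo)

// shared node table, so that structurally equal subgraphs are built only once (Rule 2)
val table = scala.collection.mutable.Map.empty[(BDD, BDD), BDD]

def mkNode(hi: BDD, lo: BDD): BDD =
  if (hi == lo && (hi == True || hi == False)) hi      // Rule 1: both edges point to the same terminal
  else table.getOrElseUpdate((hi, lo), Node(hi, lo))   // Rule 2: reuse an existing equivalent node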

Note that these rules differ from the ones commonly found in literature in that the first one only applies if the successors are terminals and not arbitrary subgraphs. An advantage of these ROBDDs is that two equivalent Boolean functions have the same representation, i.e., we have canonicity, which simplifies comparing for equality. In the following, for simplicity, we will refer to ROBDDs as BDDs and may use non-reduced BDDs for our examples, if this leads to an improvement in clarity.

Indicator Functions

An indicator function I_A, sometimes also called characteristic function, for a given set A is a function that indicates whether an element is a member of the set or not. To achieve this, the function must accept all values of which the set can be made up, and return a Boolean value indicating whether this value is an element of the set. This leads to the following definition:

Definition 1. Let U be a set of all possible values (universe) and A ⊆ U a set of interest; the indicator function I_A is defined as follows:

I_A : U → {0, 1}
I_A(x) = 1 if x ∈ A, and I_A(x) = 0 if x ∉ A

Expressed differently, we have:

x ∈ A ⟺ I_A(x) = 1

These indicator functions can be used to represent sets, which is especially useful if storing the indicator function is more size-efficient than storing every single value of the set. They also allow for natural definitions of indicator functions for combinations of sets. Let A ⊆ U and B ⊆ U be sets. We now have, for example:

I_{A ∪ B}(x) = I_A(x) ∨ I_B(x)
I_{A ∩ B}(x) = I_A(x) ∧ I_B(x)

If we use BDDs to represent sets using indicator functions, this allows us to quite easily compute the union and intersection of two sets by simply combining the BDDs using well-known algorithms such as the If-Then-Else algorithm [4].

BDD-Based Integer Sets

It should be obvious that a BDD representing the Boolean function f : {0, 1}^n → {0, 1} can be seen as a specialized indicator function, where the set of all possible values U is just the set of all n-ary bit vectors. If we now interpret these n-ary bit vectors as unsigned integers in binary, we have found a way to encode sets of n-bit integers as BDDs.

The ordering in our BDDs is defined as being from the most significant bit (MSB) to the least significant bit (LSB), which allows us to easily find the minimum and maximum element in a BDD-based set. As an example, consider a BDD that can store arbitrary unsigned 4-bit integer sets, displayed in Figure 2.3.

Figure 2.3.: BDD for 4-bit integers

Considering this visualization, we can make some observations:

- Combining the labels on the path from root to terminal simply gives us a binary number.
- Each decision node represents a specific bit in the binary number, dependent on its level in the tree, and the outgoing edges determine the value of that bit. The root node, for example, represents the first bit in the number, and all numbers in the right subtree have the first bit unset, while the MSB of the numbers in the left subtree is set.
- We can also derive that a subtree at depth m, and thus of height h = n - m, represents a set of binary numbers on its own. There are two ways to look at such a subtree: on its own it represents a set of h-bit binary numbers, which is a subset of the interval [0, 2^h - 1]. But if this subtree is viewed as part of the entire tree, it determines the membership of binary numbers with a certain m-bit prefix, namely the edge labels along the path from root node to the subtree. Consider the subtree reached by first taking the 1-edge and then the 0-edge. All numbers represented by this subtree have the prefix 10 and thus have the form 10--; those numbers are therefore from the interval [8, 8 + 3 = 11]. In general, the smallest possible number represented by a subtree is the binary number given by taking the edge labels on the path to the subtree with the remaining places set to 0, which we will call a. Then this subtree represents a set of numbers from the interval [a, a + 2^h - 1].
- If an entire subtree only contains terminals of one specific flavor, our reduction rules demand this subtree to be replaced by this terminal. So whenever there is a terminal not at the lowest level, it counts for an entire interval that is either a subset of the represented set or not.

As a specific example, consider the reduced BDD representing the set {0, 3, 4, 5, 6, 7} in Figure 2.4. Since none of the numbers in the interval [8, 15], i.e., numbers with the first bit set, is an element of the set, the subtree representing these numbers is simply set to the False-terminal. Similarly, all numbers of the interval [4, 7], which have the prefix 01, are members of the set and thus their corresponding subtree is set to the True-terminal.

Figure 2.4.: BDD representing the set {0, 3, 4, 5, 6, 7}

For the remaining numbers, 0 (0000) and 3 (0011), no common prefix can be found such that all numbers with that prefix are members of the set, preventing a reduction of the rightmost subtree of height 2. Using BDDs to represent integer sets, we can now efficiently store the variation domains of integer memory locations. One big advantage of using BDDs lies in the fact that we do not have the restriction of convexity, as for example in the interval domain, since we can store arbitrary sets. This ability is especially important for the analysis of binaries, since variation domains of certain types of memory locations, for example those holding jump targets for dynamic jumps, can contain elements of almost arbitrary value, which would cause convex domains to vastly over-approximate. For the abstract domain of BDD-based sets, there already exist a number of transfer functions for certain operators like the addition or bitwise operators.
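The following sketch (hypothetical names, repeated in a self-contained form) illustrates how the MSB-to-LSB ordering is exploited: membership of an n-bit value is decided by following its bits from the root, and the minimum element is found by preferring 0-edges and falling back to the weighted 1-edge.

sealed trait BDD
case object True  extends BDD
case object False extends BDD
final case class Node(hi: BDD, lo: BDD) extends BDD   // 1-edge (hi), 0-edge (lo)

// membership test for an n-bit unsigned value, following the bits from MSB to LSB
def contains(bdd: BDD, x: Long, n: Int): Boolean = bdd match {
  case True  => true                                   // a terminal above the lowest level covers a whole interval
  case False => false
  case Node(hi, lo) =>
    val bit = (x >> (n - 1)) & 1
    contains(if (bit == 1) hi else lo, x, n - 1)
}

// minimum element of the represented set, if the set is non-empty
def minElem(bdd: BDD, n: Int): Option[Long] = bdd match {
  case False => None
  case True  => Some(0L)                               // all remaining bits can be 0
  case Node(hi, lo) =>
    minElem(lo, n - 1).orElse(minElem(hi, n - 1).map(_ + (1L << (n - 1))))
}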


3. Multiplication of BDD-Based Integer Sets

We have seen in the example from the introduction that transfer functions for multiplication of variation domains are imperative for the analysis of executables. The design of algorithms for these transfer functions is the focus of this thesis and will be discussed in this chapter. In the following, whenever we speak of set multiplication we mean the cartesian product of the sets with each tuple mapped to the product of its components, i.e., A ⊗ B := {a · b | a ∈ A, b ∈ B}, where ⊗ is defined as the operator for set multiplication. This operator can be interpreted as an exact transfer function for multiplication, i.e., one without over-approximation.

We will start by describing an over-approximating algorithm for the multiplication of general sets, since computing the exact result for BDD-based sets would not be computationally feasible. Intuitively, the complexity of set multiplication can be comprehended when looking at the dependencies during binary multiplication between the input bits of the operands and the output bits of the result: whereas for bitwise operators the result of the bit at the i-th position only depends on both input bits at the i-th position, the result of the i-th bit during multiplication depends on all previous bits of both operands. This means that if we were to precisely compute the result of the product of BDD-based sets, at each decision node we would have to incorporate the information of both subtrees of the operands, compared to just looking at the outgoing edges in the case of bitwise operators.

In addition to this general multiplication, we will introduce and explain an algorithm for the precise computation of the special case of the singleton multiplication, where one of the operands only contains a single element, i.e., is a singleton set. It makes sense to put effort into designing such a precise algorithm for this special case, since the multiplication with a singleton is a frequent operation in machine code. As seen in the example from the introduction, whenever an index into something comparable to an array, i.e., a memory region of contiguous elements of a certain size, needs to be converted to an actual byte offset in this region, the index, whose possible values are captured in a variation domain, gets multiplied by the constant size of each element, giving rise to the aforementioned singleton multiplication.

3.1. Multiplication of General Sets

The basic idea behind an approximating multiplication of general sets is that while we cannot easily compute the result on BDDs directly, we can convert a BDD to an approximated intermediate representation, which allows for an efficient set multiplication, and then multiply the operands in this representation, followed by converting the result back to a BDD.

As intermediate representation we have chosen to approximate BDDs, i.e., arbitrary sets, by a set of intervals I = {i_1, ..., i_n}. To keep our analysis correct, an approximation of a BDD-based set A must of course not lose any information, i.e., integer elements:

∀a ∈ A : ∃i ∈ I : a ∈ i

Let A, B be BDD-based sets and I_A, I_B their approximations using intervals; a safe over-approximated result for the set multiplication can then be computed as follows:

mult_gen(A, B) = ⋃_{(i_a, i_b) ∈ I_A × I_B} i_a ⊗_int i_b,

where ⊗_int is the transfer function for interval multiplication using the interval bounds, i.e., [a, b] ⊗_int [c, d] = [a · c, b · d]. In short, we form each possible combination of intervals from both operand approximations, multiply those intervals, and finally form the union of the resulting intervals. The final result is converted back to a BDD to conclude our general set multiplication. In the context of abstract interpretation, this can be seen as abstracting from our original BDD domain to the domain of interval disjunctions, computing the result in that domain, and then concretizing the result back to our original domain.

As an example, consider the BDD-based sets A = {0, 1, 5, 6, 7} and B = {2, 3, 4}. A valid approximation for A would be I_A = {[0, 1], [5, 7]} and for B it would be I_B = {[2, 4]}, since every element in an original set is part of (at least) one interval in the approximation. Forming all possible combinations of intervals yields I_A × I_B = {([0, 1], [2, 4]), ([5, 7], [2, 4])}, combining the tuple components using interval multiplication gives us the set {[0, 4], [10, 28]}, and forming the union of all contained sets finally results in mult_gen(A, B) = {0, 1, 2, 3, 4, 10, 11, ..., 28}. The precise set multiplication has the result {0, 2, 3, 4, 10, 12, 14, 15, 18, 20, 21, 24, 28}, which is a subset of the approximated result as required. In this case, the approximated result contains |mult_gen(A, B)| = 24 elements, whereas the precise one only |A ⊗ B| = 13 elements, an increase of roughly 85%.

Since the interval multiplication over-approximates by a large margin, it is essential to find a good approximation to a set of intervals. We will now discuss how this approximation, i.e., converting a BDD-based set to a set of intervals, can be implemented. The underlying idea for this conversion is that each subtree in a BDD represents a set, and this set can be approximated either by a set of intervals, if we want more precision, or by a single interval, by taking its lower and upper bound. In principle, our conversion algorithm recursively traverses the tree and decides for each subtree, i.e., each decision node, whether to approximate the subtree by a single interval or by a set of intervals, which is done by recursively converting both subtrees to sets of intervals and forming the union of them.

If the decision is made to approximate by a single interval, the algorithm finds the smallest and greatest element in the subtree and uses those values as the bounds of the interval, a task that is made efficient by the ordering of MSB to LSB, since we only need to find the rightmost and leftmost True-terminal. Another possibility, which does not involve finding the minimum and maximum and is thus more efficient, would be to treat the whole subtree as if it was a single True-terminal, i.e., setting the lower bound of the interval to the smallest possible element in the subtree and the upper bound to the biggest possible one. As our base cases, we have that True-terminals result in a singleton set of just the interval which the terminal represents, and False-terminals in an empty set. Based on this, we can define Algorithm 1, which still depends on a predicate SUBTREE-TEST deciding the cut-off point in the tree, i.e., whether to approximate the subtree by a single interval or multiple ones. In the algorithm, n represents the bit-size of our integers, i.e., the maximum height of the BDD; the input start keeps track of the path from the root node to our subtree, i.e., the common prefix of all numbers.

Algorithm 1 Abstracting BDD-based integer sets to sets of intervals
1: function ApproxBDDSet(bdd, depth, start)
2:   if bdd is NFalse then return ∅
3:   else if bdd is NTrue then
4:     return {[start, start + 2^(n-depth) - 1]}
5:   else if SUBTREE-TEST(bdd) then
6:     s1 ← ApproxBDDSet(falseSucc(bdd), depth + 1, start)
7:     s2 ← ApproxBDDSet(trueSucc(bdd), depth + 1, start + 2^(n-depth-1))
8:     return s1 ∪ s2
9:   else
10:    min ← min(bdd)
11:    max ← max(bdd)
12:    return {[start + min, start + max]}
13:  end if
14: end function

In the following we will describe a few possible predicates that we have evaluated for our analysis. In each case, trade-offs between the cost of the computation of the predicate and its meaningfulness must be made. The first, very simple approach decides based on the level of the subtree in the entire tree, meaning the length of the path from the root to this subtree, and recurses deeper into the tree only if this value is smaller than a specified cut-off value. This means that we only ever go up to a certain depth into the tree and approximate by sets of intervals until then. Algorithm 2 implements this idea. The computation of this predicate is obviously very cheap.

Algorithm 2 Depth-based cut-off
1: function DepthCutoff_h(bdd)
2:   return depth(bdd) < h
3: end function
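The following sketch (hypothetical names, not the thesis code) combines Algorithm 1 with the depth-based cut-off of Algorithm 2 and the pairwise interval multiplication used by mult_gen. Below the cut-off, a subtree is approximated by the full range it could cover, i.e., the cheaper variant mentioned above that avoids computing the exact minimum and maximum; non-negative (unsigned) values are assumed.

sealed trait BDD
case object True  extends BDD
case object False extends BDD
final case class Node(hi: BDD, lo: BDD) extends BDD

case class Interval(lo: Long, hi: Long)

// abstract a BDD-based set to a set of intervals; n = bit-size, start = common prefix value
def approx(bdd: BDD, n: Int, depth: Int, start: Long, cutoff: Int): Set[Interval] = bdd match {
  case False => Set.empty
  case True  => Set(Interval(start, start + (1L << (n - depth)) - 1))
  case Node(hi, lo) if depth < cutoff =>                            // depth-based SUBTREE-TEST
    approx(lo, n, depth + 1, start, cutoff) ++
      approx(hi, n, depth + 1, start + (1L << (n - depth - 1)), cutoff)
  case _ => Set(Interval(start, start + (1L << (n - depth)) - 1))   // coarse: whole subtree range
}

// interval multiplication by the bounds (non-negative operands) and the general multiplication
def multInt(a: Interval, b: Interval): Interval = Interval(a.lo * b.lo, a.hi * b.hi)
def multGen(ia: Set[Interval], ib: Set[Interval]): Set[Interval] =
  for { x <- ia; y <- ib } yield multInt(x, y)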

Another strategy is to compute the proportion of the number of elements inside the subset represented by the subtree to the maximum possible number, and only recurse if this value is smaller than some parameter representing a precision, i.e., we decide to approximate a subtree by a single interval if it proportionally contains enough elements, causing us to not lose unreasonable amounts of information. An implementation of this approach is presented in Algorithm 3. This predicate can be made more precise by comparing the number of elements in the subtree to the number of elements in the approximating interval that would be created, because in that case we do not consider the border regions of the subtree that do not contain any elements. However, this would increase the cost of the computation of the predicate.

Algorithm 3 Precision-based cut-off
1: function PrecisionCutoff_p(bdd)
2:   count ← elemCount(bdd)
3:   h ← height(bdd)
4:   maxcount ← 2^h
5:   return count / maxcount < p
6: end function

Correctness

We will now demonstrate that the presented algorithm for general multiplication provides a correct result with respect to the framework of abstract interpretation. To achieve that, we must show that the computed result is a superset of the precise result, i.e., that we do not lose information. The precise result for two sets A and B is given by A ⊗ B := {a · b | a ∈ A, b ∈ B}, which means that we have to show that the product of each combination of values out of A and B is an element of our result. Let A, B be integer sets represented by BDDs and I_A, I_B their approximations using sets of intervals. We have that

∀a ∈ A, b ∈ B : ∃i_a ∈ I_A, i_b ∈ I_B : a ∈ i_a ∧ b ∈ i_b,

i.e., for each combination of a ∈ A and b ∈ B we can find two intervals in the approximations such that the values a and b are members of an interval. The product of these intervals is then computed as part of our algorithm and is a subset of the final result, since the union of all these interval products is formed. We now need to show that a · b ∈ i_a ⊗_int i_b, which is clearly the case, since a and b are bounded by the bounds of their respective interval and, for the non-negative integers considered here, the bounds of the resulting interval are just the original bounds multiplied.

3.2. Singleton Multiplication

In this section we will describe how to design an algorithm for the special case of singleton multiplication. Due to the prevalence of singleton multiplication in executables, we require our algorithm to compute an exact result, i.e., we do not want it to over-approximate.

Since the algorithm should still be able to compute the result efficiently, we need to find a way to directly operate on the structure of BDDs without using an intermediate representation as in the case of general multiplication. We will start by taking a closer look at how the binary representation of integers works and how this information can be made explicit in our BDD representation, which will later on help us visualize the inner workings of the algorithm for singleton multiplication.

An n-bit binary number x = ⟨b_n, ..., b_1⟩ with b_i ∈ {0, 1} is at its core just a bit vector. The corresponding number is a sum of powers of 2, which depends on the values of the bits. We have

x = ⟨b_n, ..., b_1⟩ = Σ_{i=1}^{n} b_i · 2^(i-1).

One way to interpret this is that each bit contributes a value to the final result: if the bit is 0, this value is also 0, but if it is 1, the value is a power of 2 that depends on the position of the bit in the bit vector. We will now augment the classic BDDs by adding this information explicitly. The decision nodes represent a certain bit in the bit vector and the outgoing edges the value of that bit, i.e., the b_i's in our sum. Previously the powers of 2 that each level in a BDD represents, i.e., the 2^(i-1)'s, were only implicit, leading us to explicitly add them in our augmented BDD by setting them as the weight of the edges. Figure 3.1 is an example of an augmented BDD for a height of 2.

Figure 3.1.: Augmented BDD with edge weights

Having such a representation allows us to simply sum up all the weights along a path from the root node to a terminal in order to find out what value this terminal represents. As an example, take a look at Figure 3.1. The sum along the path to the leftmost terminal is 2 + 1 = 3, as expected, since 11 is the binary representation of the decimal number 3. Terminals that are not on the lowest level represent intervals that start with the sum of the edge weights of the path to this terminal and have a certain size dependent on the position in the tree.

For the case of singleton multiplication we want to multiply many such binary numbers by the single element of the singleton operand, which we will call y. For one specific x = ⟨b_n, ..., b_1⟩, the product x · y is given by

x · y = (Σ_{i=1}^{n} b_i · 2^(i-1)) · y = Σ_{i=1}^{n} b_i · 2^(i-1) · y.

Interpreting this result, we can see that the product of a binary number with another number y can be computed by adding up powers of 2 multiplied by y, depending on the values of the bits of the first operand. Yet again, each bit in the input contributes a summand to the output, prompting us to adapt the augmented BDD by multiplying each edge weight by y, such that the edge weights reflect the summands. The sum of edge weights along a path from the root node to a terminal now represents the product of the original number in the BDD and y. The basic idea of the singleton algorithm consists of summing up the edge weights along the paths to each True-terminal and collecting the results. As an example, consider Figure 3.2, where the edge weights have been multiplied by y. We again look at the leftmost terminal representing 3; the sum along the path is 2·y + 1·y = 3·y, which is exactly what we would expect.

Figure 3.2.: Augmented BDD with edge weights for singleton multiplication

We now describe a recursive algorithm that takes as input a BDD and the singleton value y of the second operand and returns a BDD which represents the set of each integer in the input BDD multiplied by y. As a starting point, we will derive a recursive formula for the multiplication of a single binary number with another number, which can then be extended to work for sets of numbers represented by BDDs as first argument. We define a function that takes a bit vector representing an integer as first argument and a natural number y as second argument:

mult : {0, 1}^i × N_0 → N_0
mult(⟨b_i, ..., b_1⟩, y) = Σ_{j=1}^{i} b_j · 2^(j-1) · y

This is just the formula from above adapted to a function definition syntax. We can then extract the last summand (j = i) from the sum, giving us:

mult(⟨b_i, ..., b_1⟩, y) = b_i · 2^(i-1) · y + Σ_{j=1}^{i-1} b_j · 2^(j-1) · y.

The second summand of the equation is just mult(⟨b_{i-1}, ..., b_1⟩, y), giving rise to the recursive formula for the multiplication:

mult(⟨b_i, ..., b_1⟩, y) = b_i · 2^(i-1) · y + mult(⟨b_{i-1}, ..., b_1⟩, y).

As the base case we set mult(⟨⟩, y) = 0, since 0 is the neutral element of the addition.

In short, when we multiply a binary number by another number, we compute the recursive result, where we remove the first bit from the number, and add to it a power of 2, which depends on the length of the input vector, multiplied by y if the first bit is 1, and 0 otherwise.

We now adapt this recursive formula to BDD-based integer sets as the first argument. We will use the notation {...}# to represent BDD-based sets. The input BDD can either be a decision node or a terminal. We start by considering the case of a decision node. Such a decision node is the root of a subtree of a certain height, which represents a set of integers of the form x_j = ⟨b_i, ..., b_1⟩. The decision node itself represents the first bit b_i of all these integers, and the outgoing edges the value of the first bit. This means that for all integers in the right subtree of the decision node, where we have an incoming 0-edge, the first bit is 0, and for all numbers in the left one it is 1, respectively. The remaining bits b_{i-1} to b_1 of the integers are determined by the subtrees, which means that if we call the singleton algorithm recursively for a subtree, we get back a BDD-based set representing the integers that arise if we multiply the suffix ⟨b_{i-1}, ..., b_1⟩ of the original numbers x_j by y. Based on the recursive formula for multiplication of two single numbers, we now define a way to combine the recursive result and the information of the decision node to form the final result: let bdd be a decision node of the tree and ts and fs its true- and false-successor, respectively. The result for this node is computed by:

mult_sing(bdd, y) = (mult_sing(ts, y) + {2^(i-1) · y}#) ∪ mult_sing(fs, y),

where i is the level at which the subtree is placed in the original BDD and + is the transfer function of addition for BDDs, which already exists. Since all binary numbers that end in the left subtree have a leading 1, we add the edge weight at that level, 2^(i-1) · y, to the recursive result of the left subtree, which is a BDD representing the lower bits of that subset multiplied by y, and form the union with the recursive result of the right subtree, where the leading bit is 0 and thus no addition is necessary, giving us a new BDD where the edge weight was added to each number. For the base cases, we have that multiplication by the False-terminal, i.e., the empty set, results again in the empty set. For a True-terminal, for now we restrict ourselves to ones on the lowest level, we return a BDD representing the singleton set of 0, since 0 is again the neutral element of the addition. Since a True-terminal at a higher level would be equivalent to a fully expanded subtree with only True-terminals, we will later on derive what the expected result of the recursive call for such a subtree would be and from that define the result for such a terminal. In Algorithm 4 the idea is formulated as pseudo code.

As an example, consider the multiplication of the BDD-based set A = {1, 2, 3}# with the singleton set B = {5}#. The BDD representing the set A is displayed in Figure 3.3a, with the edge weights, i.e., the terms b_i · 2^(h-1) · y, already filled in. We start by computing the result for the base cases, which is the singleton set {0}# for each True-terminal, and the empty set for all False-terminals, which can be observed in Figure 3.3b. For the next step, we calculate the result for the decision nodes representing the last bit. The general procedure is to take the recursive result for the subtrees, add the edge weight to each number in that result, and form the union of those resulting sets; a compact sketch of this recursion in code is given below.
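The sketch uses plain Scala sets with hypothetical names; the thesis applies BDD addition and BDD union to BDD-based sets instead of enumerating elements, and the case of a True-terminal above the lowest level already anticipates the strided-interval result derived at the end of this section.

sealed trait BDD
case object True  extends BDD
case object False extends BDD
final case class Node(hi: BDD, lo: BDD) extends BDD    // 1-edge (hi), 0-edge (lo)

// height = number of bits decided by the subtree rooted at bdd
def multSing(bdd: BDD, y: Long, height: Int): Set[Long] = bdd match {
  case False               => Set.empty                 // the empty set stays empty
  case True if height == 0 => Set(0L)                   // neutral element of the addition
  case True                =>                           // True-terminal above the lowest level:
    (0L until (1L << height)).map(_ * y).toSet          // the strided interval y[0, (2^height - 1) * y]
  case Node(hi, lo) =>
    val weight = (1L << (height - 1)) * y               // edge weight 2^(height-1) * y of the 1-edge
    multSing(hi, y, height - 1).map(_ + weight) ++ multSing(lo, y, height - 1)
}

// example from the text: A = {1, 2, 3} as a 2-bit BDD and y = 5 gives Set(5, 10, 15)
val a: BDD = Node(Node(True, True), Node(True, False))
val result = multSing(a, 5L, 2)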
In our case, we have for the right decision node the recursive results ∅# for the right subtree, which has an incoming edge with weight 0, giving us the partial result {0}# + ∅# = ∅#, and {0}# for the left subtree with incoming edge weight 5, resulting in {0}# + {5}# = {5}#. This leads to the final result ∅# ∪ {5}# = {5}# for this decision node. This result makes sense, since the subtree spanned by this decision node represents the set of 1-bit integers {1}, which also intuitively gives the result {5} when each element is multiplied by 5. A similar computation is performed for the left decision node: we calculate {0}# + {0}# = {0}# for its right subtree and {0}# + {5}# = {5}# for its left subtree. The union of these two sets, {0}# ∪ {5}# = {0, 5}#, then determines the final result for this decision node. Intuitively this result makes sense, since the left subtree represented the set of 1-bit integers {0, 1}, which multiplied by 5 results in just {0, 5}. These results are shown in Figure 3.3c. To complete our computation for the entire subtree, we examine the root node: the recursive result for the right subtree was {5}#; adding the corresponding edge weight 0 to each element yields the set {5}# again. For the left subtree we add the edge weight 10 to each element of the recursive result {0, 5}#, giving us {10, 15}#. The union of these two sets, {5}# ∪ {10, 15}# = {5, 10, 15}#, constitutes the final result of the singleton multiplication, as expected and shown in Figure 3.3d. Worth highlighting is that each set written down in this example would be represented as a BDD during the computation of the algorithm, and specialized versions for set addition and union would be used.

Algorithm 4 Incomplete singleton multiplication
1: function MultSing(bdd, y, height)
2:   if bdd is NFalse then return ∅#
3:   else if bdd is NTrue ∧ height = 0 then
4:     return {0}#
5:   else if bdd is Node(trueSucc, falseSucc) then
6:     fr ← MultSing(falseSucc, y, height - 1)
7:     tr ← MultSing(trueSucc, y, height - 1)
8:     weight ← {2^(height-1) · y}#
9:     return fr ∪ (tr + weight)
10:  end if
11: end function

Let us now revisit the case of a True-terminal which is not placed at the lowest level, but at a certain depth d. If we expand this terminal, we will get a subtree of height h = n - d, where all paths lead to a True-terminal. On its own, such a BDD would represent all h-bit numbers, i.e., the interval [0, 2^h - 1]. If such a BDD were a subtree during our singleton multiplication, the recursive result would be each of the numbers in this interval multiplied by the singleton element y, i.e., [0, 2^h - 1] ⊗ {y} = {0, y, 2·y, ..., (2^h - 1)·y}. Such a set is called a strided interval, i.e., an interval with an additional parameter, the stride, that determines the distance between consecutive elements, written as s[a, b], where s is the stride. In our case the strided interval would be y[0, (2^h - 1)·y]. This means that whenever we come across a True-terminal at a higher level, the result of our multiplication must be a BDD representing such a strided interval. In order to achieve good performance in our singleton algorithm, we need a way to construct a BDD representing such a strided interval efficiently.
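As a plain-value illustration of this notation (hypothetical names; the thesis needs the same set represented as a BDD rather than enumerated), a strided interval s[a, b] and the result for a True-terminal of height h could be written as:

// strided interval s[a, b]: the set {a, a + s, a + 2s, ..., b}; assumes a positive stride
case class StridedInterval(stride: Long, lo: Long, hi: Long) {
  def elements: Set[Long] = (lo to hi by stride).toSet
}

// recursive result for a True-terminal of height h multiplied by y: y[0, (2^h - 1) * y]
def trueTerminalResult(h: Int, y: Long): StridedInterval =
  StridedInterval(stride = y, lo = 0L, hi = ((1L << h) - 1) * y)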


More information

Static Program Analysis CS701

Static Program Analysis CS701 Static Program Analysis CS701 Thomas Reps [Based on notes taken by Aditya Venkataraman on Oct 6th, 2015] Abstract This lecture introduces the area of static program analysis. We introduce the topics to

More information

4 Fractional Dimension of Posets from Trees

4 Fractional Dimension of Posets from Trees 57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language

More information

Advanced Programming Methods. Introduction in program analysis

Advanced Programming Methods. Introduction in program analysis Advanced Programming Methods Introduction in program analysis What is Program Analysis? Very broad topic, but generally speaking, automated analysis of program behavior Program analysis is about developing

More information

Chapter S:II. II. Search Space Representation

Chapter S:II. II. Search Space Representation Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation

More information

Crit-bit Trees. Adam Langley (Version )

Crit-bit Trees. Adam Langley (Version ) CRITBIT CWEB OUTPUT 1 Crit-bit Trees Adam Langley (agl@imperialviolet.org) (Version 20080926) 1. Introduction This code is taken from Dan Bernstein s qhasm and implements a binary crit-bit (alsa known

More information

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 The data of the problem is of 2GB and the hard disk is of 1GB capacity, to solve this problem we should Use better data structures

More information

3.7 Denotational Semantics

3.7 Denotational Semantics 3.7 Denotational Semantics Denotational semantics, also known as fixed-point semantics, associates to each programming language construct a well-defined and rigorously understood mathematical object. These

More information

18.3 Deleting a key from a B-tree

18.3 Deleting a key from a B-tree 18.3 Deleting a key from a B-tree B-TREE-DELETE deletes the key from the subtree rooted at We design it to guarantee that whenever it calls itself recursively on a node, the number of keys in is at least

More information

COS 320. Compiling Techniques

COS 320. Compiling Techniques Topic 5: Types COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer 1 Types: potential benefits (I) 2 For programmers: help to eliminate common programming mistakes, particularly

More information

CS4215 Programming Language Implementation. Martin Henz

CS4215 Programming Language Implementation. Martin Henz CS4215 Programming Language Implementation Martin Henz Thursday 15 March, 2012 2 Chapter 11 impl: A Simple Imperative Language 11.1 Introduction So far, we considered only languages, in which an identifier

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR

DLD VIDYA SAGAR P. potharajuvidyasagar.wordpress.com. Vignana Bharathi Institute of Technology UNIT 1 DLD P VIDYA SAGAR UNIT I Digital Systems: Binary Numbers, Octal, Hexa Decimal and other base numbers, Number base conversions, complements, signed binary numbers, Floating point number representation, binary codes, error

More information

Unit 4: Formal Verification

Unit 4: Formal Verification Course contents Unit 4: Formal Verification Logic synthesis basics Binary-decision diagram (BDD) Verification Logic optimization Technology mapping Readings Chapter 11 Unit 4 1 Logic Synthesis & Verification

More information

What Every Programmer Should Know About Floating-Point Arithmetic

What Every Programmer Should Know About Floating-Point Arithmetic What Every Programmer Should Know About Floating-Point Arithmetic Last updated: October 15, 2015 Contents 1 Why don t my numbers add up? 3 2 Basic Answers 3 2.1 Why don t my numbers, like 0.1 + 0.2 add

More information

Solutions to Midterm 2 - Monday, July 11th, 2009

Solutions to Midterm 2 - Monday, July 11th, 2009 Solutions to Midterm - Monday, July 11th, 009 CPSC30, Summer009. Instructor: Dr. Lior Malka. (liorma@cs.ubc.ca) 1. Dynamic programming. Let A be a set of n integers A 1,..., A n such that 1 A i n for each

More information

9/24/ Hash functions

9/24/ Hash functions 11.3 Hash functions A good hash function satis es (approximately) the assumption of SUH: each key is equally likely to hash to any of the slots, independently of the other keys We typically have no way

More information

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 10 Lecture No. # 16 Machine-Independent Optimizations Welcome to the

More information

Conditional Elimination through Code Duplication

Conditional Elimination through Code Duplication Conditional Elimination through Code Duplication Joachim Breitner May 27, 2011 We propose an optimizing transformation which reduces program runtime at the expense of program size by eliminating conditional

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Chapter S:V. V. Formal Properties of A*

Chapter S:V. V. Formal Properties of A* Chapter S:V V. Formal Properties of A* Properties of Search Space Graphs Auxiliary Concepts Roadmap Completeness of A* Admissibility of A* Efficiency of A* Monotone Heuristic Functions S:V-1 Formal Properties

More information

15.4 Longest common subsequence

15.4 Longest common subsequence 15.4 Longest common subsequence Biological applications often need to compare the DNA of two (or more) different organisms A strand of DNA consists of a string of molecules called bases, where the possible

More information

6. Relational Algebra (Part II)

6. Relational Algebra (Part II) 6. Relational Algebra (Part II) 6.1. Introduction In the previous chapter, we introduced relational algebra as a fundamental model of relational database manipulation. In particular, we defined and discussed

More information

Basic Properties The Definition of Catalan Numbers

Basic Properties The Definition of Catalan Numbers 1 Basic Properties 1.1. The Definition of Catalan Numbers There are many equivalent ways to define Catalan numbers. In fact, the main focus of this monograph is the myriad combinatorial interpretations

More information

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.1

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.1 Programming Languages and Compilers Qualifying Examination Monday, September 19, 2016 Answer 4 of 6 questions.1 GENERAL INSTRUCTIONS 1. Answer each question in a separate book. 2. Indicate on the cover

More information

Worst-case running time for RANDOMIZED-SELECT

Worst-case running time for RANDOMIZED-SELECT Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case

More information

Lecture 2: Big-Step Semantics

Lecture 2: Big-Step Semantics Lecture 2: Big-Step Semantics 1 Representing Abstract Syntax These are examples of arithmetic expressions: 2 * 4 1 + 2 + 3 5 * 4 * 2 1 + 2 * 3 We all know how to evaluate these expressions in our heads.

More information

ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007

ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007 ALGORITHMS EXAMINATION Department of Computer Science New York University December 17, 2007 This examination is a three hour exam. All questions carry the same weight. Answer all of the following six questions.

More information

CPS122 Lecture: From Python to Java

CPS122 Lecture: From Python to Java Objectives: CPS122 Lecture: From Python to Java last revised January 7, 2013 1. To introduce the notion of a compiled language 2. To introduce the notions of data type and a statically typed language 3.

More information

Unit-II Programming and Problem Solving (BE1/4 CSE-2)

Unit-II Programming and Problem Solving (BE1/4 CSE-2) Unit-II Programming and Problem Solving (BE1/4 CSE-2) Problem Solving: Algorithm: It is a part of the plan for the computer program. An algorithm is an effective procedure for solving a problem in a finite

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

A Gentle Introduction to Program Analysis

A Gentle Introduction to Program Analysis A Gentle Introduction to Program Analysis Işıl Dillig University of Texas, Austin January 21, 2014 Programming Languages Mentoring Workshop 1 / 24 What is Program Analysis? Very broad topic, but generally

More information

CPS122 Lecture: From Python to Java last revised January 4, Objectives:

CPS122 Lecture: From Python to Java last revised January 4, Objectives: Objectives: CPS122 Lecture: From Python to Java last revised January 4, 2017 1. To introduce the notion of a compiled language 2. To introduce the notions of data type and a statically typed language 3.

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

Principles of Program Analysis: A Sampler of Approaches

Principles of Program Analysis: A Sampler of Approaches Principles of Program Analysis: A Sampler of Approaches Transparencies based on Chapter 1 of the book: Flemming Nielson, Hanne Riis Nielson and Chris Hankin: Principles of Program Analysis Springer Verlag

More information

Lecture 5: Suffix Trees

Lecture 5: Suffix Trees Longest Common Substring Problem Lecture 5: Suffix Trees Given a text T = GGAGCTTAGAACT and a string P = ATTCGCTTAGCCTA, how do we find the longest common substring between them? Here the longest common

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013 Principles of Program Analysis Lecture 1 Harry Xu Spring 2013 An Imperfect World Software has bugs The northeast blackout of 2003, affected 10 million people in Ontario and 45 million in eight U.S. states

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Problem with Scanning an Infix Expression

Problem with Scanning an Infix Expression Operator Notation Consider the infix expression (X Y) + (W U), with parentheses added to make the evaluation order perfectly obvious. This is an arithmetic expression written in standard form, called infix

More information

Job-shop scheduling with limited capacity buffers

Job-shop scheduling with limited capacity buffers Job-shop scheduling with limited capacity buffers Peter Brucker, Silvia Heitmann University of Osnabrück, Department of Mathematics/Informatics Albrechtstr. 28, D-49069 Osnabrück, Germany {peter,sheitman}@mathematik.uni-osnabrueck.de

More information

Compiler Construction 2016/2017 Loop Optimizations

Compiler Construction 2016/2017 Loop Optimizations Compiler Construction 2016/2017 Loop Optimizations Peter Thiemann January 16, 2017 Outline 1 Loops 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6 Loop Unrolling

More information

Crit-bit Trees. Adam Langley (Version )

Crit-bit Trees. Adam Langley (Version ) Crit-bit Trees Adam Langley (agl@imperialviolet.org) (Version 20080926) 1. Introduction This code is taken from Dan Bernstein s qhasm and implements a binary crit-bit (alsa known as PATRICA) tree for NUL

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS

DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS C H A P T E R 6 DIGITAL ARITHMETIC: OPERATIONS AND CIRCUITS OUTLINE 6- Binary Addition 6-2 Representing Signed Numbers 6-3 Addition in the 2 s- Complement System 6-4 Subtraction in the 2 s- Complement

More information

ABC basics (compilation from different articles)

ABC basics (compilation from different articles) 1. AIG construction 2. AIG optimization 3. Technology mapping ABC basics (compilation from different articles) 1. BACKGROUND An And-Inverter Graph (AIG) is a directed acyclic graph (DAG), in which a node

More information

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm

Computing intersections in a set of line segments: the Bentley-Ottmann algorithm Computing intersections in a set of line segments: the Bentley-Ottmann algorithm Michiel Smid October 14, 2003 1 Introduction In these notes, we introduce a powerful technique for solving geometric problems.

More information

SDD Advanced-User Manual Version 1.1

SDD Advanced-User Manual Version 1.1 SDD Advanced-User Manual Version 1.1 Arthur Choi and Adnan Darwiche Automated Reasoning Group Computer Science Department University of California, Los Angeles Email: sdd@cs.ucla.edu Download: http://reasoning.cs.ucla.edu/sdd

More information

Number Systems CHAPTER Positional Number Systems

Number Systems CHAPTER Positional Number Systems CHAPTER 2 Number Systems Inside computers, information is encoded as patterns of bits because it is easy to construct electronic circuits that exhibit the two alternative states, 0 and 1. The meaning of

More information

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53

Modal Logic: Implications for Design of a Language for Distributed Computation p.1/53 Modal Logic: Implications for Design of a Language for Distributed Computation Jonathan Moody (with Frank Pfenning) Department of Computer Science Carnegie Mellon University Modal Logic: Implications for

More information

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA Chapter 1 : BioMath: Transformation of Graphs Use the results in part (a) to identify the vertex of the parabola. c. Find a vertical line on your graph paper so that when you fold the paper, the left portion

More information

Lecture Notes: Widening Operators and Collecting Semantics

Lecture Notes: Widening Operators and Collecting Semantics Lecture Notes: Widening Operators and Collecting Semantics 15-819O: Program Analysis (Spring 2016) Claire Le Goues clegoues@cs.cmu.edu 1 A Collecting Semantics for Reaching Definitions The approach to

More information

Compiler Construction 2010/2011 Loop Optimizations

Compiler Construction 2010/2011 Loop Optimizations Compiler Construction 2010/2011 Loop Optimizations Peter Thiemann January 25, 2011 Outline 1 Loop Optimizations 2 Dominators 3 Loop-Invariant Computations 4 Induction Variables 5 Array-Bounds Checks 6

More information

Randomized Jumplists With Several Jump Pointers. Elisabeth Neumann

Randomized Jumplists With Several Jump Pointers. Elisabeth Neumann Randomized Jumplists With Several Jump Pointers Elisabeth Neumann Bachelor Thesis Randomized Jumplists With Several Jump Pointers submitted by Elisabeth Neumann 31. March 2015 TU Kaiserslautern Fachbereich

More information

Greedy Algorithms CHAPTER 16

Greedy Algorithms CHAPTER 16 CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Formal semantics of loosely typed languages. Joep Verkoelen Vincent Driessen

Formal semantics of loosely typed languages. Joep Verkoelen Vincent Driessen Formal semantics of loosely typed languages Joep Verkoelen Vincent Driessen June, 2004 ii Contents 1 Introduction 3 2 Syntax 5 2.1 Formalities.............................. 5 2.2 Example language LooselyWhile.................

More information

Lecture 24 Search in Graphs

Lecture 24 Search in Graphs Lecture 24 Search in Graphs 15-122: Principles of Imperative Computation (Fall 2018) Frank Pfenning, André Platzer, Rob Simmons, Penny Anderson, Iliano Cervesato In this lecture, we will discuss the question

More information

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = ( Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language

More information

Final Labs and Tutors

Final Labs and Tutors ICT106 Fundamentals of Computer Systems - Topic 2 REPRESENTATION AND STORAGE OF INFORMATION Reading: Linux Assembly Programming Language, Ch 2.4-2.9 and 3.6-3.8 Final Labs and Tutors Venue and time South

More information

CS 275 Automata and Formal Language Theory. First Problem of URMs. (a) Definition of the Turing Machine. III.3 (a) Definition of the Turing Machine

CS 275 Automata and Formal Language Theory. First Problem of URMs. (a) Definition of the Turing Machine. III.3 (a) Definition of the Turing Machine CS 275 Automata and Formal Language Theory Course Notes Part III: Limits of Computation Chapt. III.3: Turing Machines Anton Setzer http://www.cs.swan.ac.uk/ csetzer/lectures/ automataformallanguage/13/index.html

More information

Chapter 9. Software Testing

Chapter 9. Software Testing Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

We will give examples for each of the following commonly used algorithm design techniques:

We will give examples for each of the following commonly used algorithm design techniques: Review This set of notes provides a quick review about what should have been learned in the prerequisite courses. The review is helpful to those who have come from a different background; or to those who

More information

γ 2 γ 3 γ 1 R 2 (b) a bounded Yin set (a) an unbounded Yin set

γ 2 γ 3 γ 1 R 2 (b) a bounded Yin set (a) an unbounded Yin set γ 1 γ 3 γ γ 3 γ γ 1 R (a) an unbounded Yin set (b) a bounded Yin set Fig..1: Jordan curve representation of a connected Yin set M R. A shaded region represents M and the dashed curves its boundary M that

More information

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path.

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path. Chapter 3 Trees Section 3. Fundamental Properties of Trees Suppose your city is planning to construct a rapid rail system. They want to construct the most economical system possible that will meet the

More information

EDAA40 At home exercises 1

EDAA40 At home exercises 1 EDAA40 At home exercises 1 1. Given, with as always the natural numbers starting at 1, let us define the following sets (with iff ): Give the number of elements in these sets as follows: 1. 23 2. 6 3.

More information

Excerpt from: Stephen H. Unger, The Essence of Logic Circuits, Second Ed., Wiley, 1997

Excerpt from: Stephen H. Unger, The Essence of Logic Circuits, Second Ed., Wiley, 1997 Excerpt from: Stephen H. Unger, The Essence of Logic Circuits, Second Ed., Wiley, 1997 APPENDIX A.1 Number systems and codes Since ten-fingered humans are addicted to the decimal system, and since computers

More information

Generalized Network Flow Programming

Generalized Network Flow Programming Appendix C Page Generalized Network Flow Programming This chapter adapts the bounded variable primal simplex method to the generalized minimum cost flow problem. Generalized networks are far more useful

More information

Computer Science Technical Report

Computer Science Technical Report Computer Science Technical Report Feasibility of Stepwise Addition of Multitolerance to High Atomicity Programs Ali Ebnenasir and Sandeep S. Kulkarni Michigan Technological University Computer Science

More information

Semantics via Syntax. f (4) = if define f (x) =2 x + 55.

Semantics via Syntax. f (4) = if define f (x) =2 x + 55. 1 Semantics via Syntax The specification of a programming language starts with its syntax. As every programmer knows, the syntax of a language comes in the shape of a variant of a BNF (Backus-Naur Form)

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Introduction to Visual Basic and Visual C++ Arithmetic Expression. Arithmetic Expression. Using Arithmetic Expression. Lesson 4.

Introduction to Visual Basic and Visual C++ Arithmetic Expression. Arithmetic Expression. Using Arithmetic Expression. Lesson 4. Introduction to Visual Basic and Visual C++ Arithmetic Expression Lesson 4 Calculation I154-1-A A @ Peter Lo 2010 1 I154-1-A A @ Peter Lo 2010 2 Arithmetic Expression Using Arithmetic Expression Calculations

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis

Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute. Week 02 Module 06 Lecture - 14 Merge Sort: Analysis Design and Analysis of Algorithms Prof. Madhavan Mukund Chennai Mathematical Institute Week 02 Module 06 Lecture - 14 Merge Sort: Analysis So, we have seen how to use a divide and conquer strategy, we

More information

Model Checking I Binary Decision Diagrams

Model Checking I Binary Decision Diagrams /42 Model Checking I Binary Decision Diagrams Edmund M. Clarke, Jr. School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 2/42 Binary Decision Diagrams Ordered binary decision diagrams

More information

CONSECUTIVE INTEGERS AND THE COLLATZ CONJECTURE. Marcus Elia Department of Mathematics, SUNY Geneseo, Geneseo, NY

CONSECUTIVE INTEGERS AND THE COLLATZ CONJECTURE. Marcus Elia Department of Mathematics, SUNY Geneseo, Geneseo, NY CONSECUTIVE INTEGERS AND THE COLLATZ CONJECTURE Marcus Elia Department of Mathematics, SUNY Geneseo, Geneseo, NY mse1@geneseo.edu Amanda Tucker Department of Mathematics, University of Rochester, Rochester,

More information