Verifying C Programs: A VCC Tutorial

Working draft, version 0.2, May 8, 2012

Ernie Cohen, Mark A. Hillebrand, Stephan Tobies
European Microsoft Innovation Center
{ecohen,mahilleb,stobies}@microsoft.com

Michał Moskal, Wolfram Schulte
Microsoft Research Redmond
{micmo,schulte}@microsoft.com

Abstract

VCC is a verification environment for software written in C. VCC takes a program (annotated with function contracts, state assertions, and type invariants) and attempts to prove that these annotations are correct, i.e., that they hold for every possible program execution. The environment includes tools for monitoring proof attempts and constructing partial counterexample executions for failed proofs. VCC handles fine-grained concurrency and low-level C features, and has been used to verify the functional correctness of tens of thousands of lines of commercial concurrent system code. This tutorial describes how to use VCC to verify C code. It covers the annotation language, the verification methodology, and the use of VCC itself.

1. Introduction

This tutorial is an introduction to verifying C code with VCC. Our primary audience is programmers who want to write correct code. The only prerequisite is a working knowledge of C.

To use VCC, you first annotate your code to specify what your code does (e.g., that your sorting function sorts its input), and why it works (e.g., suitable invariants for your loops and data structures). VCC then tries to prove (mathematically) that your program meets these specifications. Unlike most program analyzers, VCC doesn't look for bugs, or analyze an abstraction of your program; if VCC certifies that your program is correct, then your program really is correct (footnote 1).
Footnote 1: In reality, this isn't necessarily true, for two reasons. First, VCC itself might have bugs; in practice, these are unlikely to cause you to accidentally verify an incorrect program, unless you find and intentionally exploit such a bug. Second, there are a few checks needed for soundness that haven't yet been added to VCC, such as checking that ghost code terminates; these issues are listed in section ??.

To check your program, VCC uses the deductive verification paradigm: it generates a number of mathematical statements (called verification conditions), the validity of which suffices to guarantee the program's correctness, and tries to prove these statements using an automatic theorem prover. If any of these proofs fail, VCC reflects these failures back to you in terms of the program itself (as opposed to the formulas seen by the theorem prover). Thus, you normally interact with VCC entirely at the level of code and program states; you can usually ignore the mathematical reasoning going on under the hood.

For example, if your program uses division somewhere, and VCC is unable to prove that the divisor is nonzero, it will report this to you as a (potential) program error at that point in the program. This doesn't mean that your program is necessarily incorrect; most of the time, especially when verifying code that is already well-tested, it is because you haven't provided enough information to allow VCC to deduce that the suspected error doesn't occur. (For example, you might have failed to specify that some function parameter is required to be nonzero.) Typically, you fix this error by strengthening your annotations. This might lead to other error reports, forcing you to add additional annotations, so verification is in practice an iterative process. Sometimes, this process will reveal a genuine programming error.
But even if it doesn't, you will have not only proved your code free from such errors, but you will have produced precise specifications for your code, a very useful form of documentation.

This tutorial covers the basics of the VCC annotation language. By the time you have finished working through it, you should be able to use VCC to verify some nontrivial programs. It doesn't cover the theoretical background of VCC [2], implementation details [1], or advanced topics; information on these can be found on the VCC homepage. The examples in this tutorial are currently distributed with the VCC sources (footnote 3).

You can use VCC either from the command line or from Visual Studio 2008/2010 (VS); the VS interface offers easy access to the different components of the VCC tool chain and is thus generally recommended. VCC can be downloaded from the VCC homepage; be sure to read the installation instructions, which provide important information about installation prerequisites and how to set up tool paths.

Footnote 3: Available from list/changesets: click Download on the right, get the zip file, and navigate to vcc/docs/tutorial/c.

2. Verifying Simple Programs

We begin by describing how VCC verifies simple programs: sequential programs without loops, function calls, or concurrency. This might seem to be a trivial subset of C, but in fact VCC reasons about more complex programs by reducing them to reasoning about simple programs.

2.1 Assertions

Let's begin with a simple example:

    #include <vcc.h>

    int main()
    {
      int x, y, z;
      if (x <= y)
        z = x;
      else z = y;
      _(assert z <= x)
      return 0;
    }

This program sets z to the minimum of x and y. In addition to the ordinary C code, this program includes an annotation, starting with _(, terminating with a closing parenthesis, with balanced parentheses inside. The first identifier after the opening parenthesis (in the program above it's assert) is called an annotation tag and identifies the type of annotation provided (and hence its function). (The tag plays this role only at the beginning of an annotation; elsewhere, it is treated as an ordinary program identifier.) Annotations are used only by VCC, and are not seen by the C compiler. When using the regular C compiler, the <vcc.h> header file defines:

    #define _(...) /* nothing */

VCC does not use this definition, and instead parses the inside of _(...) annotations.

An annotation of the form _(assert E), called an assertion, asks VCC to prove that E is guaranteed to hold (i.e., evaluate to a value other than 0) whenever control reaches the assertion (footnote 5). Thus, the line _(assert z <= x) says that when control reaches the assertion, z is no larger than x. If VCC successfully verifies a program, it promises that this will hold throughout every possible execution of the program, regardless of inputs, how concurrent threads are scheduled, etc. More generally, VCC might verify some, but not all, of your assertions; in this case, VCC promises that the first assertion to be violated will not be one of the verified ones.

It is instructive to compare _(assert E) with the macro assert(E) (defined in <assert.h>), which evaluates E at runtime and aborts execution if E doesn't hold. First, assert(E) requires runtime overhead (at least in builds where the check is made), whereas _(assert E) does not. Second, assert(E) will catch failure of the assertion only when it actually fails in an execution, whereas _(assert E) is guaranteed to catch the failure if it is possible in any execution.
Third, because _(assert E) is not actually executed, E can include unimplementable mathematical operations, such as quantification over infinite domains.

Footnote 5: This interpretation changes slightly if E refers to memory locations that might be concurrently modified by other threads; see §8.

To verify the function using VCC from the command line, save the source in a file called minimum.c and run VCC as follows:

    C:\Somewhere\VCC Tutorial> vcc.exe minimum.c
    Verification of main succeeded.
    C:\Somewhere\VCC Tutorial>

If instead you wish to use the VCC Visual Studio plugin, load the solution VCCTutorial.sln [TODO: make sure that VCC can verify code not in the project], locate the file with the example, and right-click on the program text. You should get options to verify the file or just this function (either will work).

If you right-click within a C source file, several VCC commands are made available, depending on what kind of construction IntelliSense thinks you are in the middle of. The choice of verifying the entire file is always available. If you click within the definition of a struct type, VCC will offer you the choice of checking admissibility for that type (a concept explained in §6.5). If you click within the body of a function, VCC should offer you the opportunity to verify just that function. However, IntelliSense often gets confused about the syntactic structure of VCC code, so it might not always present these context-dependent options. In that case, select the name of a function and then right-click; this will allow you to verify just that function.

VCC verifies this function successfully, which means that its assertions are indeed guaranteed to hold and that the program cannot crash (footnote 6). If VCC is unable to prove an assertion, it reports an error. Try changing the assertion in this program to something that isn't true and see what happens.
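As a sketch of such an experiment, the fragment below repeats the minimum computation on two parameters and pairs the static annotation with its runtime counterpart. The function name smaller() is hypothetical and not part of the tutorial sources, and the macro definition is only a stand-in for <vcc.h> so that the fragment compiles with an ordinary C compiler. The comments describe what one would expect VCC to conclude; they have not been checked by running VCC.

```c
#include <assert.h>

/* Stand-in for <vcc.h> under a regular C compiler:
   annotations expand to nothing. */
#define _(...) /* nothing */

/* Hypothetical helper: returns the smaller of x and y. */
int smaller(int x, int y)
{
  int z;
  if (x <= y)
    z = x;
  else
    z = y;
  /* Static claim: expected to hold on both branches. */
  _(assert z <= x && z <= y)
  /* A claim such as _(assert z < x) is false whenever x <= y,
     because then z == x; that is the kind of change to try. */
  assert(z <= x && z <= y); /* runtime check from <assert.h> */
  return z;
}
```

Calling smaller(3, 5) exercises only the first branch at runtime, whereas the annotation is a claim about every pair of arguments.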
(You might also want to try coming up with some complex assertion that is true, just to see whether VCC is smart enough to prove it.)

To understand how VCC works, and to use it successfully, it is useful to think in terms of what VCC knows at each control point of your program. In the current example, just before the first conditional, VCC knows nothing about the local variables, since they can initially hold any value. Just before the first assignment, VCC knows that x <= y (because execution followed that branch of the conditional), and after the assignment, VCC also knows that z == x, so it knows that z <= x. Similarly, in the else branch, VCC knows that y < x (because execution didn't follow the if branch), and after the assignment to z, it also knows that z == y, so it also knows z <= x. Since z <= x is known to hold at the end of each branch of the conditional, it is known to hold at the end of the conditional, so the assertion is known to hold when control reaches it. In general, VCC doesn't lose any information when reasoning about assignments and conditionals. However, we will see that VCC loses some information when reasoning about loops, necessitating additional annotations.

When we talk about what VCC knows, we mean what it knows in an ideal sense, where if it knows E, it also knows any logical consequence of E. In such a world, adding an assertion that succeeds would have no effect on whether later assertions succeed. VCC's ability to deduce consequences is indeed complete for many types of formulas (e.g., formulas that use only equalities, inequalities, addition, subtraction, multiplication by constants, and boolean connectives), but not for all formulas, so VCC will sometimes fail to prove an assertion, even though it knows enough to prove it. Conversely, an assertion that succeeds can sometimes cause later assertions that would otherwise fail to succeed, by drawing VCC's attention to a crucial fact it needs to prove the later assertion.
This is relatively rare, and typically involves nonlinear arithmetic (e.g., where variables are multiplied together), bitvector reasoning (§D), or quantifiers.

When VCC surprises you by failing to verify something that you think it should be able to verify, it is usually because it doesn't know something you think it should know. An assertion provides one way to check whether VCC knows what you think it knows.

Footnote 6: VCC currently doesn't check that your program doesn't run forever or run out of stack space, but future versions will, at least for sequential programs.

Exercises

1. Can the assertion at the end of the example function be made stronger? What is the strongest valid assertion one could write? Try verifying the program with your stronger assertion.
2. Write an assertion that says that the int x is the average of the ints y and z.
3. Modify the example program so that it sets x and y to values that differ by at most 1 and sum to z. Prove that your program does this correctly.
4. Write an assertion that says that the int x is a perfect square (i.e., the square of an integer).

5. Write an assertion that says that the int x occurs in the int array b[10].
6. Write an assertion that says that the int array b, of length N, contains no duplicates.
7. Write an assertion that says that all pairs of adjacent elements of the int array b of length N differ by at most 1.
8. Write an assertion that says that an array b of length N contains only perfect squares.

2.2 Assumptions

You can add to what VCC knows at a particular point with a second type of annotation, called an assumption. The assumption _(assume E) tells VCC to ignore the rest of an execution if E fails to hold (i.e., if E evaluates to 0). Reasoning-wise, the assumption simply adds E to what VCC knows for subsequent reasoning. For example:

    int x, y;
    _(assume x != 0)
    y = 100 / x;

Without the assumption, VCC would complain about possible division by zero. (VCC checks for division by zero because it would cause the program to crash.) Given the assumption, this error cannot happen. Since assumptions (like all annotations) are not seen by the compiler, assumption failure won't cause the program to stop, and subsequent assertions might be violated. To put it another way, if VCC verifies a program, it guarantees that in any prefix of an execution where all (user-provided) assumptions hold, all assertions will also hold. Thus, your verification goal should be to eliminate as many assumptions as possible (preferably all of them).

Although assumptions are generally to be avoided, they are nevertheless sometimes useful:

(i) In an incomplete verification, assumptions can be used to mark the knowledge that VCC is missing, and to coordinate further verification work (possibly performed by other people). If you follow a discipline of keeping your code in a state where the whole program verifies, the verification state can be judged by browsing the code (without having to run the verifier).
(ii) When debugging a failed verification, you can use assumptions to narrow down the failed verification to a more specific failure scenario, perhaps even to a complete counterexample.

(iii) Sometimes you want to assume something even though VCC can verify it, just to stop VCC from spending time proving it. For example, assuming \false allows VCC to easily prove subsequent assertions, thereby focusing its attention on other parts of the code. Temporarily adding assumptions is a common tactic when developing annotations for complex functions.

(iv) Sometimes you want to make assumptions about the operating environment of a program. For example, you might want to assume that a 64-bit counter doesn't overflow, but don't want to justify it formally because it depends on extra-logical assumptions (like the speed of the hardware or the lifetime of the software).

(v) Assumptions provide a useful tool in explaining how VCC reasons about your program. We'll see examples of this throughout this tutorial.

An assertion can also change what VCC knows after the assertion, if the assertion fails to verify: although VCC will report the failure as an error, it will assume the asserted fact holds afterward. For example, in the following, VCC will report an error only for the first assertion and not the second:

    int x;
    _(assert x == 1)
    _(assert x > 0)

Exercises

1. In the following program fragment, which assertions will fail?

    int x, y;
    _(assert x > 5)
    _(assert x > 3)
    _(assert x < 2)
    _(assert y < 3)

2. Is there any difference between

    _(assume p)
    _(assume q)

and

    _(assume q)
    _(assume p)

? What if the assumptions are replaced by assertions?
3. Is there any difference between

    _(assume p)
    _(assert q)

and

    _(assert (!p) || (q))

?

3. Function Contracts

Next we turn to the specification of functions.
We'll take the example from the previous section, and pull the computation of the minimum of two numbers out into a separate function:

    #include <vcc.h>

    int min(int a, int b)
    {
      if (a <= b)
        return a;
      else return b;
    }

    int main()
    {
      int x, y, z;
      z = min(x, y);
      _(assert z <= x)
      return 0;
    }

    Verification of min succeeded.
    Verification of main failed.
    testcase(15,12) : error VC9500: Assertion 'z <= x' did not verify.

(The listing above presents both the source code and the output of VCC, typeset in different fonts, and the actual file name of the example is replaced with testcase.) VCC failed to prove our assertion, even though it's easy to see that it always holds. This is because verification in VCC is modular: VCC doesn't look inside the body of a function (such as the definition of min()) when understanding the effect of a call to the function (such as the call from main()); all it knows about the effect of calling min() is that

the call satisfies the specification of min(). Since the correctness of main() clearly depends on what min() does, we need to specify min() in order to verify main().

The specification of a function is sometimes called its contract, because it gives obligations on both the function and its callers. It is provided by three types of annotations:

- A requirement on the caller (sometimes called the precondition of the function) takes the form _(requires E), where E is an expression. It says that callers of the function promise that E will hold on function entry.
- A requirement on the function (sometimes called a postcondition of the function) takes the form _(ensures E), where E is an expression. It says that the function promises that E holds just before control exits the function.
- The third type of contract annotation, a writes clause, is described in the next section. In this example, the lack of writes clauses says that min() has no side effects that are visible to its caller.

For example, we can provide a suitable specification for min() as follows:

    #include <vcc.h>

    int min(int a, int b)
      _(requires \true)
      _(ensures \result <= a && \result <= b)
    {
      if (a <= b)
        return a;
      else return b;
    }
    // ... definition of main() unchanged ...

    Verification of min succeeded.
    Verification of main succeeded.

(Note that the specification of the function comes after the header and before the function body; you can also put specifications on function declarations (e.g., in header files).) The precondition _(requires \true) of min() doesn't really say anything (since \true holds in every state), but is included to emphasize that the function can be called from any state and with arbitrary parameter values. The postcondition states that the value returned from min() is no bigger than either of the inputs. Note that \true and \result are spelled with a backslash to avoid name clashes with C identifiers (footnote 7).

VCC uses function specifications as follows.
When verifying the body of a function, VCC implicitly assumes each precondition of the function on function entry, and implicitly asserts each postcondition of the function (with \result bound to the return value and each parameter bound to its value on function entry) just before the function returns. For every call to the function, VCC replaces the call with an assertion of the preconditions of the function, sets the return value to an arbitrary value, and finally assumes each postcondition of the function. For example, VCC translates the program above roughly as follows:

    #include <vcc.h>

    int min(int a, int b)
    {
      int \result;
      // assume precondition of min(a,b)
      _(assume \true)
      if (a <= b)
        \result = a;
      else \result = b;
      // assert postcondition of min(a,b)
      _(assert \result <= a && \result <= b)
    }

    int main()
    {
      int \result;
      // assume precondition of main()
      _(assume \true)
      int x, y, z;
      // z = min(x,y);
      int _res; // placeholder for the result of min()
      // assert precondition of min(x,y)
      _(assert \true)
      // assume postcondition of min(x,y)
      _(assume _res <= x && _res <= y)
      z = _res; // store the result of min()
      _(assert z <= x)
      \result = 0;
      // assert postcondition of main()
      _(assert \true)
    }

Footnote 7: All VCC keywords start with a backslash; this contrasts with annotation tags (like requires), which are only used at the beginning of an annotation and therefore cannot be confused with C identifiers (and thus you are still free to have, e.g., a function called requires or assert).

Note that the assumptions above are harmless; that is, in a fully verified program they will never be violated, as each follows from the assertion that precedes it in an execution (footnote 8). For example, the assumption generated by a precondition could fail only if the assertion generated from that same precondition before it fails.

Why modular verification? Modular verification brings several benefits. First, it allows verification to more easily scale to large programs.
Second, by providing a precise interface between caller and callee, it allows you to modify the implementation of a function like min() without having to worry about breaking the verifications of functions that call it (as long as you don't change the specification of min()). This is especially important because these callers normally aren't in scope, and the person maintaining min() might not even know about them (e.g., if min() is in a library). Third, you can verify a function like main() even if the implementation of min() is unavailable (e.g., if it hasn't been written yet).

Footnote 8: A more detailed explanation of why this translation is sound is given in section ??.

Exercises

1. Try replacing the <= in the condition of min() with >. Before running VCC, can you guess which parts of the verification will fail?
2. What is the effect of giving a function the specification _(requires \false)? How does it affect verification of the function itself? What about its callers? Can you think of a good reason to use such a specification?
3. Can you see anything wrong with the above specification of min()? Can you give a simpler implementation than the one presented? Is this specification strong enough to be useful? If not, how might it be strengthened to make it more useful?
4. Specify a function that returns the (int) square root of its (int) argument. (You can try writing an implementation for the function, but won't be able to verify it until you've learned about loop invariants.)
5. Can you think of useful functions in a real program that might legitimately guarantee only the trivial postcondition _(ensures \true)?

3.1 Side Effects

[TODO: move writes clauses in here, use copy example]

3.2 Reading and Writing Memory

[TODO: add a note about stack variables]

When programming in C, it is important to distinguish two kinds of memory access. Sequential access, which is the default, is appropriate when interference from other threads (or the outside world) is not an issue, e.g., when accessing unshared memory. Sequential accesses can be safely optimized by the compiler by breaking them up into multiple operations, caching reads, reordering operations, and so on. Atomic access is required when the access might race with other threads, i.e., a write to memory that is concurrently read or written, or a read of memory that is concurrently written. Atomic access is typically indicated in C by accessing memory through a volatile type (or through atomic compiler intrinsics). We consider only sequential access for now; we consider atomic access in section 8.

To access a memory object, the object must reside at a valid memory address (footnote 9). (For example, on typical hardware, its virtual address must be suitably aligned, must be mapped to existing physical memory, and so on.) In addition, to safely access memory sequentially, the memory must not be concurrently written by other threads (including hardware and devices); in VCC, this condition is written \thread_local(p), where p is a pointer to the memory object. VCC asserts this before any sequential memory access to the object.

The three simple ways of specifying this are as follows: [TODO: There are some problems here. First, we have unnecessarily tangled up the classification of memory with the specification of functions by saying that they have to be in preconditions; they should instead be presented as assertions.
Second, as far as I can find, we never define thread locality anywhere, and define mutability only in passing; we should give at least intuitive meanings here. Third, saying that a pointer is considered thread-local when you specify that it is thread-local just sounds vacuously stupid. -e]

A pointer p is considered thread-local when you specify:

    _(requires \thread_local(p))

When reading *p (without additional annotations) you will need to prove \thread_local(p).

A pointer p is considered mutable when you specify:

    _(requires \mutable(p))

All mutable pointers are also thread-local. Writing via pointers other than p cannot make p non-mutable.

A pointer p is considered writable when you specify: [TODO: We should introduce writability as a predicate.]

    _(writes p)

Additionally, freshly allocated pointers are also writable. All writable pointers are also mutable. To write through p you will need to prove that the pointer is writable. Deallocating p requires that it is writable, and afterwards the memory is not even thread-local anymore.

Footnote 9: VCC actually uses a somewhat stronger validity condition because of its typed view of memory ??.

If VCC doesn't know why an object is thread-local, then it has a hard time proving that the object stays thread-local after an operation with side effects (e.g., a function call). Thus, in preconditions you will sometimes want to use \mutable(p) instead of \thread_local(p). The precise definitions of mutability and thread locality are given in §6.2, where we also describe another form of guaranteeing thread locality through so-called ownership domains.

The NULL pointer, pointers outside the bounds of arrays, memory protected by the operating system, and memory outside the address space are never thread-local (and thus also never mutable or writable).
Let's have a look at an example:

    void write(int *p)
      _(writes p)
    { *p = 42; }

    void write_wrong(int *p)
      _(requires \mutable(p))
    { *p = 42; }

    int read1(int *p)
      _(requires \thread_local(p))
    { return *p; }

    int read2(int *p)
      _(requires \mutable(p))
    { return *p; }

    int read_wrong(int *p)
    { return *p; }

    void test_them()
    {
      int a = 3, b = 7;
      read1(&a);
      _(assert a == 3 && b == 7) // OK
      read2(&a);
      _(assert a == 3 && b == 7) // OK
      write(&a);
      _(assert b == 7) // OK
      _(assert a == 3) // error
    }

    Verification of write succeeded.
    Verification of write_wrong failed.
    testcase(10,4) : error VC8507: Assertion 'p is writable' did not verify.
    Verification of read1 succeeded.
    Verification of read2 succeeded.
    Verification of read_wrong failed.
    testcase(21,11) : error VC8512: Assertion 'p is thread local' did not verify.
    Verification of test_them failed.
    testcase(32,12) : error VC9500: Assertion 'a == 3' did not verify.

The function write_wrong fails because p is only mutable, and not writable. In read_wrong, VCC complains that it does not know anything about p (it might even be NULL); in particular, it doesn't know that p is thread-local. read2 is fine because \mutable is stronger than \thread_local. Finally, in test_them the first three assertions succeed because anything not listed in the writes clause of the called function cannot change. The last assertion fails because the call write(&a) lists &a in its writes clause.
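The failed last assertion above reflects that write() promises nothing about the value it stores. As a sketch of how a postcondition changes this, the fragment below gives a reading function and a writing function an _(ensures ...) clause each, so that a caller learns something about the result. The names peek(), reset(), and demo() are hypothetical, the macro definition is only a stand-in for <vcc.h> so that the fragment compiles with an ordinary C compiler, and the annotations follow the forms used in this section but have not been checked by running VCC.

```c
/* Stand-in for <vcc.h> under a regular C compiler:
   annotations expand to nothing. */
#define _(...) /* nothing */

/* Reads *p only, so thread-locality of p is all it requires. */
int peek(int *p)
  _(requires \thread_local(p))
  _(ensures \result == *p)
{
  return *p;
}

/* Writes *p, so p must appear in a writes clause; the
   postcondition tells the caller which value was stored. */
void reset(int *p)
  _(writes p)
  _(ensures *p == 0)
{
  *p = 0;
}

/* Hypothetical caller combining both functions. */
int demo(void)
{
  int a = 3, b = 7;
  int seen = peek(&a);
  /* peek() has no writes clause, so a and b are unchanged. */
  _(assert seen == 3 && a == 3 && b == 7)
  reset(&a);
  /* Only a was listed as written; its new value comes from
     the postcondition of reset(). */
  _(assert a == 0 && b == 7)
  return seen + a + b;
}
```

Without the postcondition on reset(), the caller would be in the same position as test_them after write(&a): it would know that b is unchanged but nothing about a.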

Intuitively, the clause _(writes p, q) says that, of the memory objects that are thread-local to the caller before the call, the function is going to modify only the object pointed to by p and the object pointed to by q. In other words, it is roughly equivalent to a postcondition that ensures that all other objects thread-local to the caller prior to the call remain unchanged. VCC allows you to write multiple writes clauses, and implicitly combines them into a single set. If a function spec contains no writes clauses, it is equivalent to specifying a writes clause with an empty set of pointers.

Here is a simple example of a function that visibly reads and writes memory; it simply copies data from one location to another.

    #include <vcc.h>

    void copy(int *from, int *to)
      _(requires \thread_local(from))
      _(writes to)
      _(ensures *to == \old(*from))
    {
      *to = *from;
    }

    int z;

    void main()
      _(writes &z)
    {
      int x, y;
      copy(&x, &y);
      copy(&y, &z);
      _(assert x == y && y == z)
    }

    Verification of copy succeeded.
    Verification of main succeeded.

In the postcondition, the expression \old(E) returns the value the expression E had on function entry. Thus, our postcondition states that the new value of *to equals the value *from had on call entry. VCC translates the function call copy(&x,&y) approximately as follows:

    _(assert \thread_local(&x))
    _(assert \mutable(&y))
    // record the value of x
    int _old_x = x;
    // havoc the written variables
    havoc(y);
    // assume the postcondition
    _(assume y == _old_x)

3.3 Arrays

[TODO: should the inline expressions below be displayed like the writes clauses?]

Array accesses are a kind of pointer access. Thus, before allowing you to read an element of an array, VCC checks that it is thread-local. Usually you want to specify that all elements of an array are thread-local, which is done using the expression \thread_local_array(ar, sz). It is essentially syntactic sugar for \forall unsigned i; i < sz ==> \thread_local(&ar[i]). The annotation \mutable_array() is analogous.
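As a sketch of a function that only reads an array, the fragment below searches the first sz elements of ar recursively. The function contains() is a hypothetical example, the macro definition is only a stand-in for <vcc.h> so that the fragment compiles with an ordinary C compiler, and the contract has not been checked by running VCC. The postcondition is deliberately partial: it says what a result of 0 means, and nothing about the other case.

```c
/* Stand-in for <vcc.h> under a regular C compiler:
   annotations expand to nothing. */
#define _(...) /* nothing */

/* Hypothetical example: returns 1 if v occurs among the first
   sz elements of ar, and 0 otherwise. Every element read must
   be thread-local, hence the precondition. */
int contains(int *ar, unsigned sz, int v)
  _(requires \thread_local_array(ar, sz))
  _(ensures \result == 0 ==>
      (\forall unsigned i; i < sz ==> ar[i] != v))
{
  if (sz == 0)
    return 0;               /* empty range: nothing to find */
  if (ar[sz - 1] == v)
    return 1;               /* found at the last position */
  return contains(ar, sz - 1, v); /* search the shorter prefix */
}
```

Because the function has no writes clause, a caller may assume that the array contents are the same after the call as before it.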
To specify that an array is writable use:

    _(writes \array_range(ar, sz))

which is roughly syntactic sugar for:

    _(writes &ar[0], &ar[1], ..., &ar[sz - 1])

For example, the function below is a recursive implementation of the C standard library memcpy() function:

    void my_memcpy(unsigned char *dst,
                   unsigned char *src,
                   unsigned len)
      _(writes \array_range(dst, len))
      _(requires \thread_local_array(src, len))
      _(requires \arrays_disjoint(src, len, dst, len))
      _(ensures \forall unsigned i; i < len ==> dst[i] == \old(src[i]))
    {
      if (len > 0) {
        dst[len - 1] = src[len - 1];
        my_memcpy(dst, src, len - 1);
      }
    }

It requires that array src is thread-local, dst is writable, and they do not overlap. It ensures that, at all indices, dst has the value that src had on entry. The next section presents a more conventional implementation using a loop. [TODO: Should we talk about termination here?]

3.4 Logic functions

[TODO: These should probably be put into an appendix, or just left out altogether in favor of pure functions. It certainly could be delayed, right?]

Just as with programs, as specifications get more complex, it's good to structure them somewhat. One mechanism provided by VCC for that is logic functions. They essentially work as macros for pieces of specification, but prevent name capture and give better error messages. For example:

    _(logic bool sorted(int *arr, unsigned len) =
      \forall unsigned i, j; i <= j && j < len ==> arr[i] <= arr[j])

A partial spec for a sorting routine could look like the following (footnote 10):

    void sort(int *arr, unsigned len)
      _(writes \array_range(arr, len))
      _(ensures sorted(arr, len));

Logic functions are not executable, but they can be implemented:

    int check_sorted(int *arr, unsigned len)
      _(requires \thread_local_array(arr, len))
      _(ensures \result == sorted(arr, len))
    {
      if (len <= 1)
        return 1;
      if (!(arr[len - 2] <= arr[len - 1]))
        return 0;
      return check_sorted(arr, len - 1);
    }

A logic function and its C implementation can be combined into one using the _(pure) annotation. This is covered in §E.1.
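The array writes clause and a logic function can be combined in one contract. The fragment below is a sketch along the lines of my_memcpy(): the logic function all_equal() and the function fill() are hypothetical examples, the macro definition is only a stand-in for <vcc.h> so that the fragment compiles with an ordinary C compiler, and the contract has not been checked by running VCC.

```c
/* Stand-in for <vcc.h> under a regular C compiler:
   annotations expand to nothing. */
#define _(...) /* nothing */

/* Hypothetical logic function: the first len elements of arr
   all equal v. */
_(logic bool all_equal(int *arr, unsigned len, int v) =
    \forall unsigned i; i < len ==> arr[i] == v)

/* Hypothetical example: store v into the first len elements
   of arr, recursively. */
void fill(int *arr, unsigned len, int v)
  _(writes \array_range(arr, len))
  _(ensures all_equal(arr, len, v))
{
  if (len > 0) {
    arr[len - 1] = v;       /* set the last element first */
    /* The recursive call may write only the first len - 1
       elements, so arr[len - 1] keeps the value just stored. */
    fill(arr, len - 1, v);
  }
}
```

The order of the two statements mirrors my_memcpy(): the element outside the callee's writes clause is set before the call, so the caller can rely on it being unchanged afterwards.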
[TODO: This isn't right, is it? Logic functions really act as macros, e.g., they cannot be used in triggers.]

Footnote 10: We will take care of the output being a permutation of the input in §7.

1. Write and verify a program that checks whether two sorted arrays contain a common element.
2. Write and verify a program for binary search (a program that checks whether a sorted array contains a given value).
3. Write and verify a program that takes a 2-dimensional array of ints in which every row and column is sorted, and checks whether a given int occurs in the array.

Exercises

In the following exercises, all implementations should be recursive. You should return to these and write/verify iterative implementations after reading section ??.

1. Could the postcondition of copy have been written equivalently as the simpler *to == *from? If not, why not?
2. Specify and verify a program that takes two arrays and checks whether the arrays are equal (i.e., whether they contain the same sequence of elements).
3. Specify and verify a program that takes an array and checks whether it contains any duplicate elements. Verify a recursive implementation.
4. Specify and verify a program that checks whether a given array is a palindrome.
5. Specify and verify a program that checks whether two arrays contain a common element.
6. Specify and verify a function that swaps the values pointed to by two int pointers. Hint: use \old(...) in the postcondition.
7. Specify and verify a function that takes a text (stored in an array) and a string (stored in an array) and checks whether the string occurs within the text.
8. Extend the specification of sorting to guarantee that the sort is stable (i.e., specify that it shouldn't change the array if it is already sorted).
9. Extend the specification of sorting to guarantee that the sorting doesn't change the set of numbers occurring in the array (though it might change their multiplicities).
10. Write, without using the % operator, logic functions that specify (1) that one number (unsigned int) divides another evenly, (2) that a number is prime, (3) that two numbers are relatively prime, and (4) the greatest common divisor of two numbers.

[TODO: if this section remains after the functions section, move the exercises from there to here, since they require quantification.]

4. Arithmetic and Quantifiers

[TODO: move arithmetic stuff appendix in, as well as mathint and a mention of maps]
[TODO: add appendix section gathering together all annotations and extensions to C]

VCC provides a number of C extensions that can be used within VCC annotations (such as assertions):

The Boolean operator ==> denotes logical implication; formally, P ==> Q means ((!P) || (Q)), and is usually read as "P implies Q". Because ==> has lower precedence than the C operators, it is typically not necessary to parenthesize P or Q.

The expression \forall T v; E evaluates to 1 if the expression E evaluates to a nonzero value for every value v of type T. For example, the assertion

    _(assert x > 1 && \forall int i; 1 < i && i < x ==> x % i != 0)

checks that (int) x is a prime number. If b is an int array of size N, then

    _(assert \forall int i; \forall int j; 0 <= i && i <= j && j < N ==> b[i] <= b[j])

checks that b is sorted.

Similarly, the expression \exists T v; E evaluates to 1 if there is some value v of type T for which E evaluates to a nonzero value. For example, if b is an int array of size N, the assertion

    _(assert \exists int i; 0 <= i && i < N && b[i] == 0)

asserts that b contains a zero element. \forall and \exists are jointly referred to as quantifiers.

VCC also provides some mathematical types that cannot be used in ordinary C code (because they are too big to fit in memory); these include mathematical (unbounded) integers and (possibly infinite) maps. They are described in ??.

Expressions within VCC annotations are restricted in their use of functions: you can only use functions that are proved to be pure, i.e., free from side effects (§E.1).

5. Loop invariants

For the most part, what VCC knows at a control point can be computed from what it knew at the immediately preceding control points.
But when the control flow contains a loop, VCC faces a chicken-and-egg problem, since what it knows at the top of the loop (i.e., at the beginning of each loop iteration) depends not only on what it knew just before the loop, but also on what it knew just before it jumped back to the top of the loop from the loop body.

Rather than trying to guess what it should know at the top of a loop, VCC lets you tell it what it should know, by providing a loop invariant. To make sure that the loop invariant does indeed hold whenever control reaches the top of the loop, VCC asserts that the invariant holds wherever control jumps to the top of the loop: namely, on loop entry and at the end of the loop body.

Let's look at an example:

    #include <vcc.h>

    void divide(unsigned x, unsigned d, unsigned *q, unsigned *r)
      _(requires d > 0 && q != r)
      _(writes q, r)
      _(ensures x == d * (*q) + *r && *r < d)
    {
      unsigned lq = 0;
      unsigned lr = x;
      while (lr >= d)
        _(invariant x == d * lq + lr)
      {
        lq++;
        lr -= d;
      }
      *q = lq;
      *r = lr;
    }

    Verification of divide succeeded.

The divide() function computes the quotient and remainder of integer division of x by d using the classic division algorithm. The loop invariant says that we have a suitable answer, except with a remainder that is possibly too big. VCC translates this example roughly as follows:

    #include <vcc.h>

    void divide(unsigned x, unsigned d, unsigned *q, unsigned *r)
      _(writes q, r)
    {
      // assume the precondition
      _(assume d > 0 && q != r)
      unsigned lq = 0;
      unsigned lr = x;
      // check that the invariant holds on loop entry
      _(assert x == d * lq + lr)
      // start an arbitrary iteration
      // forget variables modified in the loop
      {
        unsigned _fresh_lq, _fresh_lr;
        lq = _fresh_lq; lr = _fresh_lr;
      }
      // assume that the loop invariant holds
      _(assume x == d * lq + lr)
      // jump out if the loop terminated
      if (!(lr >= d)) goto loopexit;
      // body of the loop
      {
        lq++;
        lr -= d;
      }
      // check that the loop preserves the invariant
      _(assert x == d * lq + lr)
      // end of the loop
      _(assume \false)
    loopexit:
      *q = lq;
      *r = lr;
      // assert postcondition
      _(assert x == d * (*q) + *r && *r < d)
    }

Note that this translation has removed all cycles from the control flow graph of the function (even though it has gotos); this means that VCC can use the rules of the previous sections to reason about the program. In VCC, all program reasoning is reduced to reasoning about acyclic chunks of code in this way.

Note that the invariant is asserted wherever control moves to the top of the loop (here, on entry to the loop and at the end of the loop body). On loop entry, VCC forgets the value of each variable modified in the loop (in this case just the local variables lq and lr) [11] and assumes the invariant (which places some constraints on these variables). VCC doesn't have to consider the actual jump from the end of the loop iteration back to the top of the loop (since it has already checked the loop invariant), so further consideration of that branch is cut off with _(assume \false). Each loop exit is translated into a goto that jumps to just beyond the loop (to loopexit). At this control point, we know that the loop invariant holds and that lr < d, which together imply that we have computed the quotient and remainder.
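The two places where the invariant is asserted can also be observed at run time. The following is a minimal sketch in plain C, not VCC input: divide_checked is a hypothetical name, and the run-time assert calls stand in for the checks that VCC discharges statically for all inputs.

```c
#include <assert.h>

/* Run-time rendering of divide(): the invariant x == d*lq + lr is checked
   on loop entry and again at the end of every loop body. */
void divide_checked(unsigned x, unsigned d, unsigned *q, unsigned *r)
{
    unsigned lq = 0;
    unsigned lr = x;

    assert(d > 0 && q != r);              /* precondition */
    assert(x == d * lq + lr);             /* invariant on loop entry */
    while (lr >= d) {
        lq++;
        lr -= d;
        assert(x == d * lq + lr);         /* invariant preserved by the body */
    }
    *q = lq;
    *r = lr;
    /* invariant plus the negated loop condition give the postcondition */
    assert(x == d * (*q) + *r && *r < d);
}
```

A run-time check like this only covers the inputs actually tried; the point of the loop invariant in VCC is that the same facts are proved once for every possible x and d.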
For another, more typical example of a loop, consider the following function that uses linear search to determine if a value occurs within an array:

    #include <vcc.h>
    #include <limits.h>

    unsigned lsearch(int elt, int *ar, unsigned sz)
      _(requires \thread_local_array(ar, sz))
      _(ensures \result != UINT_MAX ==> ar[\result] == elt)
      _(ensures \forall unsigned i; i < sz && i < \result ==> ar[i] != elt)
    {
      unsigned i;
      for (i = 0; i < sz; i++)
        _(invariant \forall unsigned j; j < i ==> ar[j] != elt)
      {
        if (ar[i] == elt) return i;
      }
      return UINT_MAX;
    }

    Verification of lsearch succeeded.

The postconditions say that the returned value is the minimal array index at which elt occurs (or UINT_MAX if it does not occur). The loop invariant says that elt does not occur in ar[0] ... ar[i - 1].

[11] Because of aliasing, it is not always obvious to VCC that a variable is not modified in the body of the loop. However, VCC can check it syntactically for a local variable if you never take the address of that variable.

5.1 Writes clauses for loops

Loops are in many ways similar to recursive functions. Invariants work as a combination of pre- and postconditions. Like functions, loops can also have writes clauses, and you provide them using exactly the same syntax as for functions. When you do not write any heap location in the loop (which has been the case in all examples so far), VCC automatically infers an empty writes clause. Otherwise, it takes the writes clause that is specified on the function. So by default, the loop is allowed to write everything that the function can. Let's see an example of such an implicit writes clause, a reinterpretation of my_memcpy() from ??.
    void my_memcpy(unsigned char *dst, unsigned char *src, unsigned len)
      _(writes \array_range(dst, len))
      _(requires \thread_local_array(src, len))
      _(requires \arrays_disjoint(src, len, dst, len))
      _(ensures \forall unsigned i; i < len ==> dst[i] == \old(src[i]))
    {
      unsigned k;
      for (k = 0; k < len; ++k)
        _(invariant \forall unsigned i; i < k ==> dst[i] == \old(src[i]))
      {
        dst[k] = src[k];
      }
    }

If the loop does not write everything the function can write, you will often want to provide explicit writes clauses. Here's a variation of memcpy() that clears the source buffer after copying it (maybe for security reasons).

    void memcpyandclr(unsigned char *dst, unsigned char *src, unsigned len)
      _(writes \array_range(src, len))
      _(writes \array_range(dst, len))
      _(requires \arrays_disjoint(src, len, dst, len))
      _(ensures \forall unsigned i; i < len ==> dst[i] == \old(src[i]))
      _(ensures \forall unsigned i; i < len ==> src[i] == 0)
    {
      unsigned k;
      for (k = 0; k < len; ++k)
        _(writes \array_range(dst, len))
        _(invariant \forall unsigned i; i < k ==> dst[i] == \old(src[i]))
      {
        dst[k] = src[k];
      }
      for (k = 0; k < len; ++k)
        _(writes \array_range(src, len))
        _(invariant \forall unsigned i; i < k ==> src[i] == 0)
      {
        src[k] = 0;
      }
    }

If the second loop did not provide a writes clause, we couldn't prove the first postcondition: VCC would think that the second loop could have overwritten dst.

[TODO: one sorting example is needed if we want people to do others; maybe we should use bubble or insertion sort though]

Equipped with that knowledge, we can proceed not only to checking whether an array is sorted, as we did in ??, but to actually sorting it. The function below implements the bozo-sort algorithm. The algorithm works by swapping two random elements in an array, checking if the resulting array is sorted, and repeating otherwise. We do not recommend using it in production code: it's not stable, and moreover has a fairly bad time complexity.

    _(logic bool sorted(int *buf, unsigned len) =
        \forall unsigned i, j; i < j && j < len ==> buf[i] <= buf[j])

    void bozo_sort(int *buf, unsigned len)
      _(writes \array_range(buf, len))
      _(ensures sorted(buf, len))
    {
      if (len == 0) return;
      for (;;)
        _(invariant \mutable_array(buf, len))
      {
        int tmp;
        unsigned i = random(len), j = random(len);
        tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
        for (i = 0; i < len - 1; ++i)
          _(invariant sorted(buf, i + 1))
        {
          if (buf[i] > buf[i + 1]) break;
        }
        if (i == len - 1) break;
      }
    }

The specification that we use is that the output of the sorting routine is sorted. Unfortunately, we do not say that it's actually a permutation of the input. We'll show how to do that in §7.2.

Exercises

Return to the verification exercises of section 3.4 and repeat them with iterative implementations.

6. Object invariants

Pre- and postconditions allow for associating consistency conditions with the code. However, it is often also possible to associate such consistency conditions with the data and to require all the code operating on that data to obey the conditions. As we will learn in §8, this is particularly important for data accessed concurrently by multiple threads, but even for sequential programs, enforcing consistency conditions on data reduces annotation clutter and allows for the introduction of abstraction boundaries.

In VCC, the mechanism for enforcing data consistency is object invariants, which are conditions associated with compound C types (structs and unions). The invariants of a type describe how proper objects of that type behave. In this and the following section, we consider only the static aspects of this behavior, namely what the consistent states of an object are. Dynamic aspects, i.e., how objects can change, are covered in §8.
For example, consider the following type definition of \0-terminated safe strings implemented with statically allocated arrays (we'll see dynamic allocation later).

    #define SSTR_MAXLEN 100

    typedef struct SafeString {
      unsigned len;
      char content[SSTR_MAXLEN + 1];
      _(invariant \this->len <= SSTR_MAXLEN)
      _(invariant content[len] == '\0')
    } SafeString;

The invariant of SafeString states that consistent SafeStrings have length not more than SSTR_MAXLEN and are \0-terminated. Within a type invariant, \this refers to (the address of) the current instance of the type (as in the first invariant), but fields can also be referred to directly (as in the second invariant).

Because memory in C is allocated without initialization, no nontrivial object invariant could be enforced to hold at all times (it would not hold right after allocation). Wrapped objects are ones for which the invariant holds and which the current thread directly owns (that is, they are not part of the representation of some higher-level object). After allocating an object, we would usually wrap it to make sure its invariant holds and to prepare it for later use:

    void sstr_init(struct SafeString *s)
      _(writes \span(s))
      _(ensures \wrapped(s))
    {
      s->len = 0;
      s->content[0] = '\0';
      _(wrap s)
    }

For a pointer p of structured type, \span(p) returns the set of pointers to members of p. Arrays of base types produce one pointer for each base type component, so in this example, \span(s) abbreviates the set

    { s, &s->len, &s->content[0], &s->content[1], ..., &s->content[SSTR_MAXLEN] }

Thus, the writes clause says that the function can write the fields of s. The postcondition says that the function returns with s wrapped, which also implies that the invariant of s holds; this invariant is checked when the object is wrapped. (You can see this check fail by commenting out any of the assignment statements.)
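The invariant that VCC checks at _(wrap s) can be mirrored by an ordinary predicate evaluated at run time. The following is a minimal sketch in plain C, not VCC input: sstr_invariant_holds is a hypothetical helper, and the assert call only approximates, for one execution, what VCC proves for all of them.

```c
#include <assert.h>

#define SSTR_MAXLEN 100

typedef struct SafeString {
    unsigned len;
    char content[SSTR_MAXLEN + 1];
} SafeString;

/* Hypothetical helper: run-time reading of the two type invariants.
   The length bound is tested first so that content[len] is never read
   out of bounds. */
int sstr_invariant_holds(const SafeString *s)
{
    return s->len <= SSTR_MAXLEN && s->content[s->len] == '\0';
}

void sstr_init(SafeString *s)
{
    s->len = 0;
    s->content[0] = '\0';
    /* VCC: _(wrap s) checks the invariant at this point */
    assert(sstr_invariant_holds(s));
}
```

Deleting either assignment in sstr_init can make the run-time check fail for some starting contents of *s, which corresponds to the failed wrap check described above.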
A function that modifies a wrapped object will first unwrap it, make the necessary updates, and wrap the object again (which causes another check of the object invariant). Unwrapping an object adds all of its members to the writes set of a function, so such a function has to report that it writes the object, but does not have to report writing the fields of the object.

    void sstr_append_char(struct SafeString *s, char c)
      _(requires \wrapped(s))
      _(requires s->len < SSTR_MAXLEN)
      _(ensures \wrapped(s))
      _(writes s)
    {
      _(unwrap s)
      s->content[s->len++] = c;
      s->content[s->len] = '\0';
      _(wrap s)
    }

Finally, a function that only reads an object need not unwrap it, and so will not list it in its writes clause. For example:

    int sstr_index_of(struct SafeString *s, char c)
      _(requires \wrapped(s))
      _(ensures \result >= 0 ==> s->content[\result] == c)
    {
      unsigned i;
      for (i = 0; i < s->len; ++i)
        if (s->content[i] == c) return (int)i;
      return -1;
    }

The following subsection explains this wrap/unwrap protocol in more detail.

6.1 Wrap/unwrap protocol

Because invariants do not always hold, in VCC one needs to state explicitly which objects are consistent, using a field \closed which is defined on every object. A closed object is one for which the \closed field is true, and an open object is one for which it is false. The invariants are required to hold (VCC enforces this) only for closed objects, but they can also hold for open objects. Newly allocated objects are open, and you need to make them open again before disposing of them.

In addition to the \closed field, each object has an owner field. The owner of p is p->\owner. This field is of pointer (object) type, but VCC provides objects of \thread type to represent threads of execution, so that threads can also own objects. The idea is that the owner of p should have some special rights to p that others do not. In particular, the owner of p can transfer ownership of p to another object (e.g., a thread can transfer ownership of p from itself to the memory allocator, in order to dispose of p).

When verifying the body of a function, VCC assumes that it is being executed by some particular thread. The \thread object representing it is referred to as \me.

(Some of) the rules of ownership and consistency are:

1. on every atomic step of the program, the invariants of all the closed objects have to hold,
2. only the owning thread can modify fields of an open object,
3. each thread owns itself, and
4. only threads can own open objects.

Thus, by the first two rules, VCC allows updates of objects in the following two situations:

1. the updated object is closed, the update is atomic, and the update preserves the invariant of the object, or
2. the updated object is open and the update is performed by the owning thread.

In the first case, to ensure that an update is atomic, VCC requires that the updated field has a volatile modifier. There is a lot to be said about atomic updates in VCC, and we shall do that in §8, but for now we're only considering sequentially accessed objects, with no volatile modifiers on fields.
For such objects we can assume that they do not change while they are closed, so the only way to change their fields is to first make them open, i.e., via method 2 above.

A thread needs to make the object open to update it. Because making it open counts as an update, the thread needs to own it first. This is performed by the unwrap operation, which translates to the following steps:

1. assert that the object is in the writes set,
2. assert that the object is wrapped (closed and owned by the current thread),
3. assume the invariant (as a consequence of rule 1, the invariant holds for every closed object),
4. set the \closed field to false, and
5. add the span of the object (i.e., all its fields) to the writes set.

The wrap operation does just the reverse:

1. assert that the object is mutable (open and owned by the current thread),
2. assert the invariant, and
3. set the \closed field to true (this implicitly prevents further writes to the fields of the object).

Let's then have a look at the definitions of \wrapped(...) and \mutable(...).

    logic bool \wrapped(\object o) = o->\closed && o->\owner == \me;
    logic bool \mutable(\object o) = !o->\closed && o->\owner == \me;

The definitions of \wrapped(...) and \mutable(...) use the \object type. It is much like void *, in the sense that it is a wildcard for any pointer type. However, unlike void *, it also carries the dynamic information about the type of the pointer. It can only be used in specifications.
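The wrap and unwrap steps above can be sketched as a toy run-time model. This is plain C, not VCC input, and every name in it (ModelString, model_wrap, model_unwrap, the integer thread id) is a hypothetical stand-in: in VCC, \closed and \owner are ghost fields that exist only during verification, and the writes-set steps (unwrap steps 1 and 5) have no counterpart here.

```c
#include <assert.h>
#include <stdbool.h>

#define SSTR_MAXLEN 100

typedef struct ModelString {
    unsigned len;
    char content[SSTR_MAXLEN + 1];
    bool closed;   /* stands in for the ghost field \closed */
    int owner;     /* hypothetical thread id standing in for \owner */
} ModelString;

bool model_invariant(const ModelString *s)
{
    return s->len <= SSTR_MAXLEN && s->content[s->len] == '\0';
}

bool model_wrapped(const ModelString *s, int me)
{
    return s->closed && s->owner == me;
}

bool model_mutable(const ModelString *s, int me)
{
    return !s->closed && s->owner == me;
}

void model_unwrap(ModelString *s, int me)
{
    assert(model_wrapped(s, me));   /* unwrap step 2 */
    assert(model_invariant(s));     /* step 3: VCC assumes this; we can only check it */
    s->closed = false;              /* step 4 */
}

void model_wrap(ModelString *s, int me)
{
    assert(model_mutable(s, me));   /* wrap step 1 */
    assert(model_invariant(s));     /* wrap step 2 */
    s->closed = true;               /* wrap step 3 */
}

void model_append_char(ModelString *s, int me, char c)
{
    assert(s->len < SSTR_MAXLEN);   /* precondition of sstr_append_char */
    model_unwrap(s, me);
    s->content[s->len++] = c;       /* invariant is briefly broken here */
    s->content[s->len] = '\0';      /* and restored before wrapping */
    model_wrap(s, me);
}
```

Between the two assignments in model_append_char the string is not \0-terminated at index len, which is harmless only because the object is open at that point.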
The assert/assume desugaring of the sstr_append_char() function looks as follows:

    void sstr_append_char(struct SafeString *s, char c)
      _(requires \wrapped(s))
      _(requires s->len < SSTR_MAXLEN)
      _(ensures \wrapped(s))
    {
      // _(unwrap s), steps 1-5
      _(assert \writable(s))
      _(assert \wrapped(s))
      _(assume s->len <= SSTR_MAXLEN && s->content[s->len] == '\0')
      _(ghost s->\closed = \false)
      _(assume \writable(\span(s)))

      s->content[s->len++] = c;
      s->content[s->len] = '\0';

      // _(wrap s), steps 1-3
      _(assert \mutable(s))
      _(assert s->len <= SSTR_MAXLEN && s->content[s->len] == '\0')
      _(ghost s->\closed = \true)
    }

6.2 Ownership trees

Objects often stand for abstractions that are implemented with more than just one physical object. As a simple example, consider our SafeString, changed to have a dynamically allocated buffer. The logical string object consists of the control object holding the length and the array of bytes holding the content. In real programs such abstractions become hierarchical, e.g., an address book might consist of a few hash tables, each of which consists of a control object, an array of buckets, and the attached linked lists.

    struct SafeString {
      unsigned capacity, len;
      char *content;
      _(invariant len < capacity)
      _(invariant content[len] == '\0')
      _(invariant \mine((char[capacity])content))
    };

In C, the type char[10] denotes an array with exactly 10 elements. VCC extends that notation to allow the type char[capacity], denoting an array with capacity elements (where capacity is a variable). Such types can only be used in casts. For example, (char[capacity])content means to take the pointer content and interpret it as an array of capacity elements of type char. This notation is used so that we can think of arrays as objects (of a special type). The other way to think about it is that content represents just one object of type char, whereas (char[capacity])content is an object representing the array.

The invariant of SafeString specifies that it owns the array object.
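For the dynamically allocated variant, the two data invariants (len < capacity and \0-termination) can again be mirrored at run time, while the ownership invariant has no run-time counterpart. The following is a minimal sketch in plain C, not VCC input; DynString, dynstr_init, dynstr_free, and dynstr_invariant_holds are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical plain-C counterpart of the dynamically allocated SafeString. */
struct DynString {
    unsigned capacity, len;
    char *content;
};

/* Run-time reading of the two data invariants. The ownership invariant
   (\mine) is a verification-only notion and is not modeled. */
int dynstr_invariant_holds(const struct DynString *s)
{
    return s->content != NULL
        && s->len < s->capacity
        && s->content[s->len] == '\0';
}

/* Returns 0 on success and -1 on failure. */
int dynstr_init(struct DynString *s, unsigned capacity)
{
    if (capacity == 0)
        return -1;               /* len < capacity needs capacity >= 1 */
    s->content = malloc(capacity);
    if (s->content == NULL)
        return -1;
    s->capacity = capacity;
    s->len = 0;
    s->content[0] = '\0';
    return 0;
}

void dynstr_free(struct DynString *s)
{
    free(s->content);
    s->content = NULL;
    s->capacity = 0;
    s->len = 0;
}
```

Note that len < capacity is strict, unlike len <= SSTR_MAXLEN in the static version, because the buffer here has exactly capacity elements and one of them must hold the terminator.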
The syntax \mine(o1, ..., on) is roughly equivalent (we'll get into details later) to:

    o1->\owner == \this && ... && on->\owner == \this

Conceptually there isn't much difference between having the char array embedded and owning a pointer to it. In particular, the functions operating on some s of type SafeString should still list only s
