Provably Correct Software - PDF Free Download

Provably Correct Software Max Schäfer Institute of Information Science/Academia Sinica September 17, 2007 1 / 48

The Need for Provably Correct Software BUT bugs are annoying, embarrassing, and cost gazillions of $/TWD/es every year there are many approaches to developing less buggy software traditionally: code review, testing etc. E. W. Dijkstra: Program testing can be used to show the presence of bugs, but never to show their absence. (EWD 249) we are interested in provably correct software, i.e. software 1 that has a precise (mathematical) specication 2 that provably (machine checkably) fullls it 2 / 48

How to get there in order to specify program behavior and reason about programs, we need a mathematical model of the language thus we either need to work in a language that is amenable to mathematical treatment (functional languages, interactive theorem provers) work harder to construct a model for a (simplication of a) real world language, then integrate specications and proofs with programs 3 / 48

Outline We will present the following systems: Isabelle/HOL: SML-inspired programming language, about which propositions can be proved (based on LF) Agda: practical dependently typed programming language (based on UTT) Coq: integration of dependently typed programming language and theorem prover (based on PCIC) Proof Carrying Code Why/Caduceus: verication of imperative programs Much of the material is based on lecture notes of the TYPES Summer School 2007 (see http://typessummerschool07.cs.unibo.it). 5 / 48

Outline 1 Interactive Theorem Provers Verication of Functional Programs Dependent Types and Inductive Families 2 Proof Carrying Code 3 Verication of Imperative Programs 6 / 48

Functional Programming in One Slide the interactive theorem provers we discuss all have an internal functional language functions dened in the language should behave like mathematical functions no state, no assignable variables details of program evaluation should matter as little as possible in strongly typed languages (which we exclusively consider here), datatypes are used to ensure that functions are only invoked with meaningful arguments new datatypes can be dened inductively For the moment, we use the language of Isabelle/HOL. 7 / 48

Isabelle Isabelle is a generic proof assistant developed by L. C. Paulson (Cambridge) and T. Nipkow (München) since the late '80s it is based on the logical framework approach with a lean metalogic in which object logics can be implemented inference trees of the object logic are represented as terms of the metalogic; correct application of the rules is ensured by type checking the terms of the metalogic most used object logic is Isabelle/HOL, which implements higher order logic from the Isabelle homepage: The main application is the formalization of mathematical proofs and in particular formal verication, which includes proving the correctness of computer hardware or software and proving properties of computer languages and protocols. Homepage: http://isabelle.in.tum.de/ 9 / 48

Inductive Datatypes datatype of booleans: datatype bool = False True this tells us: 1 False and True have type bool 2 everything that has type bool is either False or True 3 False and True are dierent datatype of natural numbers: datatype nat = Zero Suc nat this type is recursive; we have: 1 Zero is of type nat, Suc is of type nat->nat 2 elements of type nat are Zero, (Suc Zero), (Suc (Suc Zero)), etc.; but every one is either Zero or of the form (Suc x), where x itself is also of type nat 3 Zero is not equal to any (Suc x); if (Suc x) equals (Suc y), then x equals y for convenience, we can use numerals (0:=Zero, 1:=Suc Zero,... ) 11 / 48

Polymorphic Types datatype of polymorphic lists: datatype 'a list = Nil Cons 'a 'a list list itself is not a type; it needs to be instantiated with a concrete 'a; for example, bool list and nat list are (dierent) types elements of bool list are Nil, Cons true Nil, Cons false (Cons false Nil), etc. elements of nat list are Nil, Cons (Suc (Suc Zero)) (Cons (Suc Zero) Nil), etc. we can not form a list like Cons true (Cons Zero Nil) Nil is ambiguous, it could be either Nil::bool list or Nil::nat list; most of the time, the compiler can gure it out 13 / 48

Functions on Lists functions on inductive datatypes can be dened by pattern matching example: appending two lists consts app :: 'a list => 'a list => 'a list primrec app Nil ys = ys app (Cons x xs) ys = Cons x (app xs ys) reversing a list consts rev :: 'a list => 'a list primrec rev Nil = Nil rev (Cons x xs) = app (rev xs) (Cons x Nil) 15 / 48

How do we know that they are correct? we can now formulate and prove statements about the functions structural induction on lists: a property P about lists can be proved by showing that 1 P holds on Nil 2 if P holds on xs, then it holds on Cons x xs corresponding induction schemata are automatically derived for every inductive datatype for example: we can prove that reversing a list twice yields the original list in Isabelle: theorem rev_rev: rev (rev xs) = xs 17 / 48

A Word about Termination it is very hard to reason about non-terminating functions e.g., if we could dene f(x) = f(x) + 1, then 0 = 0 + f(x) f(x) = 0 + f(x) + 1 f(x) = 1 hence, in Isabelle (and most other proof assistants) only terminating functions can be dened two ways to achieve this: 1 only use restricted recursion schemata (like primitive recursion) 2 provide explicit termination proofs both are possible in Isabelle thus, Isabelle (like most theorem provers) is not Turing complete! 19 / 48

Who are we trusting? the proofs are done directly on the source code, no need for translation to pseudo code we need not trust our code or our understanding of it the proofs are checked by Isabelle; we need to trust the Isabelle kernel we do not need to trust the people writing tactics! Slogan Make the amount of code that needs to be trusted as small as possible. 20 / 48

Integrating Proofs and Programs programs and proofs about them should not be separated look at the type of app in Isabelle: app :: 'a list => 'a list => 'a list it guarantees that, when given two lists, it will return a list this is not strong enough to convince us of its correctness we would like to know that if we have lists xs and ys of length m and n, then app xs ys is a list of length m + n for any 0 i < m, xs[i]=(app xs ys)[i] for any 0 i < n, ys[i]=(app xs ys)[m+i] we need a stronger type system, in which types can depend on data one language that makes this possible is Agda 22 / 48

Agda Agda is a theorem prover/programming language developed at Chalmers rst version was written by Catarina Coquand in the '90s Agda2 is a complete reimplementation, mostly by Ulf Norell it is based on Luo's Universal Type Theory, an extension of the Calculus of Constructions most important concepts are dependent types, universes, and inductive families syntax is similar to Haskell with sophisticated pattern matching Homepage: http://www.cs.chalmers.se/~ulfn/agda/ 24 / 48

Inductive Families the type of sized lists in Agda: data list {A : Set} : nat -> Set where [] : list A 0 _::_ : {n : nat} -> A -> list {A} n -> list {A} (S n) we now have true :: false :: [] : list {bool} 2 (observe inx notation of ::) safe head function: head : {A : Set} {n : nat} -> list {A} (S n) -> A head (x :: _) = x note that Agda's pattern matching mechanism gures out that the list argument cannot be empty! 26 / 48

The Append Function we can dene the append function _++_ to immediately show the eect on lengths: _++_ : {A : Set} {m n : nat} -> list {A} m -> list {A} n -> list {A} (m + n) [] ++ ys = ys (x :: xs) ++ ys = x :: (xs ++ ys) in order to prove that it does what it should, we need a function to index lists, preferably with a syntax like l[s zero] can you implement the following function? _[_] : {A : Set} {n : nat} -> list A n -> nat -> A 28 / 48

Safe Indexing we need to ensure that the index is smaller than n solution: dene, for every n : nat the type below n of natural numbers smaller than it data below : nat -> Set where bzero : {n : nat} -> below (S n) bsuc : {n : nat} -> below n -> below (S n) (sadly we cannot reuse the usual notation for natural numbers) now we can dene _[_] : {A : Set} {n : nat} -> list A n -> below n -> A and proceed to prove our implementation of _++_ correct 30 / 48

Function Denitions in Agda function denitions look similar to Haskell, always done by pattern matching functions have to be explicitly annotated with their type in a dependently typed setting it is not generally possible to infer types without annotations type checking is also quite hard; sometimes unexpected results: for Agda, the types list {A} (x+y) and list {A} (y+x) are dierent, although they have the same inhabitants function denitions for which Agda cannot ensure termination are still accepted, but marked by the editor 32 / 48

Advanced Dependent Types dependent types in conjunction with universes are extremely powerful; very few primitive concepts are needed example: denition of dependent sum type data Σ {A : Set} (P : A -> Set) : exist : {x : A} -> P x -> Σ P Set where an inhabitant of this type is a pair t, M, where t : A and M : P t; for example, Σ list is a type for lists of any length seen from a logical perspective, this is an implementation of the (constructive) existential quantier 34 / 48

Advanced Dependent Types (cont.) example: identity type data _==_ {A : Set} : A -> A -> Set where refl : (x : A) -> x == x the only way to obtain an element of this type is through refl, hence if we have an element of s == t, s and t must in fact be equal Agda's pattern matching can exploit this fact: subst : {A : Set} (C : A -> Set) (x y : A) -> x == y -> C x -> C y subst C.x.x (refl x) cx = cx 36 / 48

Comparison: Agda vs. Isabelle/HOL dierent goals: Isabelle/HOL is mainly a proof assistant, Agda is mainly a programming language Agda does not have lemmas, tactics, etc. (it can still be used as a proof assistant, however) Agda has type universes (like Set), which Isabelle/HOL lacks underlying concepts are quite similar 38 / 48

Throwing Everything Together: Coq Coq unies the programming language approach and the theorem prover approach started as an implementation of a type checker for the pure Calculus of Constructions of Coquand and Huet in the early '80s recent versions are based on the Predicative Calculus of Inductive Constructions it can be used to formulate and prove mathematical results similar to Isabelle inductive families and matching like in Agda are also available (not quite so sophisticated) tactic language similar to Isabelle/HOL, with user-denable tactics large library of predened datatypes, functions, and results about them functions written in Coq can be extracted to OCaml, Haskell, or Scheme for ecient execution (has been done for fairly large programs: Compcert project) Homepage: http://coq.inria.fr 40 / 48

Writing Certied Programs in Coq dening divisibility in Coq: Definition divides (d m:nat) := exists k, m = k*d. greatest common divisor: Definition is_gcd (m n d:nat) := divides d m /\ divides d n /\ (forall d', divides d' m -> divides d' n -> d' <= d). we want a function like the following: Definition gcd (m n:nat) : {d:nat is_gcd m n d}. later, we can extract from it an OCaml function gcd : nat -> nat -> nat without the explicit proofs 42 / 48

Comparison: Agda, Coq the technical underpinnings are very similar (at least from a user's perspective) Agda is missing Coq's theorem prover features, it does not have as large a codebase as Coq but this also means it has less historical ballast... the real selling point for Coq is program extraction 43 / 48

Outline 1 Interactive Theorem Provers Verication of Functional Programs Dependent Types and Inductive Families 2 Proof Carrying Code 3 Verication of Imperative Programs 44 / 48

Proof Carrying Code See the slides by David Pichardie and Benjamin Gregoire on the summer school website. 45 / 48

Outline 1 Interactive Theorem Provers Verication of Functional Programs Dependent Types and Inductive Families 2 Proof Carrying Code 3 Verication of Imperative Programs 46 / 48

Why/Caduceus Why is a tool for verifying imperative programs based on a simple ML-like imperative language (also called Why), which is annotated with conditions and invariants formulated in wp-style the Why compiler generates verication conditions which can be solved by an automatic prover or using an interactive proof environment (like Coq or Isabelle/HOL) Caduceus translates annotated C programs into Why for Java, there is a similar tool called Krakatoa Focus is on verication of real world programs with as much automation as possible. 47 / 48

Further Explanation and Examples For further explanations and examples see the slides by Jean-Christophe Filliâtre on the summer school website. 48 / 48