Refinements to techniques for verifying shape analysis invariants in Coq

Similar documents
Refinements to techniques for verifying shape analysis invariants in Coq

Using the Coq theorem prover to verify complex data structure invariants

Proving Properties on Programs From the Coq Tutorial at ITP 2015

Theorem proving. PVS theorem prover. Hoare style verification PVS. More on embeddings. What if. Abhik Roychoudhury CS 6214

CS152: Programming Languages. Lecture 2 Syntax. Dan Grossman Spring 2011

Chapter 1. Introduction

c constructor P, Q terms used as propositions G, H hypotheses scope identifier for a notation scope M, module identifiers t, u arbitrary terms

Coq quick reference. Category Example Description. Inductive type with instances defined by constructors, including y of type Y. Inductive X : Univ :=

Main Goal. Language-independent program verification framework. Derive program properties from operational semantics

Inductive Definitions, continued

Finite Model Generation for Isabelle/HOL Using a SAT Solver

VS 3 : SMT Solvers for Program Verification

Software verification using proof assistants

Hoare Logic and Model Checking

An experiment with variable binding, denotational semantics, and logical relations in Coq. Adam Chlipala University of California, Berkeley

Hoare logic. A proof system for separation logic. Introduction. Separation logic

Induction and Semantics in Dafny

Hoare Logic and Model Checking. A proof system for Separation logic. Introduction. Separation Logic

CSolve: Verifying C With Liquid Types

Verification Condition Generation via Theorem Proving

Equations: a tool for dependent pattern-matching

3.7 Denotational Semantics

Improving Coq Propositional Reasoning Using a Lazy CNF Conversion

In Our Last Exciting Episode

Specification, Verification, and Interactive Proof

Hoare logic. WHILE p, a language with pointers. Introduction. Syntax of WHILE p. Lecture 5: Introduction to separation logic

These notes are intended exclusively for the personal usage of the students of CS352 at Cal Poly Pomona. Any other usage is prohibited without

Hoare logic. Lecture 5: Introduction to separation logic. Jean Pichon-Pharabod University of Cambridge. CST Part II 2017/18

Tree Interpolation in Vampire

Programming Languages Third Edition

A Simple Supercompiler Formally Verified in Coq

A proof-producing CSP solver: A proof supplement

Lecture Notes on Program Equivalence

Analysis of dependent types in Coq through the deletion of the largest node of a binary search tree

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Chapter 11 p. 1/38

Expr_annotated.v. Expr_annotated.v. Printed by Zach Tatlock

Adam Chlipala University of California, Berkeley ICFP 2006

An Extensible Programming Language for Verified Systems Software. Adam Chlipala MIT CSAIL WG 2.16 meeting, 2012

Formally Certified Satisfiability Solving

The Rule of Constancy(Derived Frame Rule)

CMSC 330: Organization of Programming Languages

Translation validation: from Simulink to C

Types and Type Inference

Asserting Memory Shape using Linear Logic

Motivation was to facilitate development of systems software, especially OS development.

Foundations of Computer Science Spring Mathematical Preliminaries

Types and Type Inference

Z Notation. June 21, 2018

Inferring Invariants in Separation Logic for Imperative List-processing Programs

Handout 9: Imperative Programs and State

Part II. Hoare Logic and Program Verification. Why specify programs? Specification and Verification. Code Verification. Why verify programs?

The Prototype Verification System PVS

Introduction to dependent types in Coq

Reasoning About Imperative Programs. COS 441 Slides 10

Coq projects for type theory 2018

Deductive Methods, Bounded Model Checking

CS 456 (Fall 2018) Scribe Notes: 2

Formal semantics of loosely typed languages. Joep Verkoelen Vincent Driessen

Lectures 20, 21: Axiomatic Semantics

Modular dependent induction in Coq, Mendler-style. Paolo Torrini

1 Problem Description

Towards Customizability of a Symbolic-Execution-Based Program Verifier

Type Theory meets Effects. Greg Morrisett

Separating Shape Graphs

Formalization of Incremental Simplex Algorithm by Stepwise Refinement

A CRASH COURSE IN SEMANTICS

Abstract Interpretation

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

Programming Languages Fall 2013

Functional Programming. Pure Functional Programming

Lecture Notes on Memory Layout

CHAPTER 10 GRAPHS AND TREES. Alessandro Artale UniBZ - artale/

Verified Characteristic Formulae for CakeML. Armaël Guéneau, Magnus O. Myreen, Ramana Kumar, Michael Norrish April 27, 2017

Integration of SMT Solvers with ITPs There and Back Again

Lecture Notes on Induction and Recursion

From Event-B Models to Dafny Code Contracts

Short Notes of CS201

Mechanised Separation Algebra

Verifying Data Structure Invariants in Device Drivers

CS201 - Introduction to Programming Glossary By

Verasco: a Formally Verified C Static Analyzer

Linear Logic, Heap-shape Patterns and Imperative Programming

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

A concrete memory model for CompCert

Binary Tree Node Relationships. Binary Trees. Quick Application: Expression Trees. Traversals

Binary Trees. For example: Jargon: General Binary Trees. root node. level: internal node. edge. leaf node. Data Structures & File Management

A Refinement Framework for Monadic Programs in Isabelle/HOL

An Introduction to Heap Analysis. Pietro Ferrara. Chair of Programming Methodology ETH Zurich, Switzerland

CSCI 171 Chapter Outlines

Lecture 3: Recursion; Structural Induction

X-KIF New Knowledge Modeling Language

Paths, Flowers and Vertex Cover

CIS 500 Software Foundations. Midterm I. (Standard and advanced versions together) October 1, 2013 Answer key

Chapter 4: Trees. 4.2 For node B :

Formal Systems II: Applications

COS 320. Compiling Techniques

G Programming Languages - Fall 2012

Hierarchical Shape Abstraction of Dynamic Structures in Static Blocks

Semantics. There is no single widely acceptable notation or formalism for describing semantics Operational Semantics

Object Ownership in Program Verification

Transcription:

Refinements to techniques for verifying shape analysis invariants in Coq Kenneth Roe and Scott Smith The Johns Hopkins University Abstract. We describe the PEDANTIC framework for verifying the correctness of C-like programs using Coq. PEDANTIC is designed to prove invariants over complex dynamic data structures such as interreferencing trees and linked lists. The PEDANTIC tactic library has been constructed to allow program verifications to be done with reasonably compact proofs. We introduce a couple of important innovations. First, we introduce constructs to allow for elegant reasoning about cross referencing relationships between recursive data structures on the heap. Second we have designed our framework to be extensible in spite of its use of a deep embedding of the logic into Coq. Our data structure for representing state assertions is parameterized by functions which give semantic interpretations to the constructs. This allows us to extend our system with new predicates and functions without changing the underlying Coq data structures that define the core PEDANTIC logic. 1 Introduction This paper presents first steps towards a framework for verifying programs in imparative languages such as C that use complex heap based data structures. Our extensible PEDANTIC (Proof Engine for Deductive Automation using Nondeterministic Traversal of Instruction Code) framework is implemented in Coq and is based on ideas from many other separation logic frameworks such as [2, 1, 3]. The framework implements a logic language based on separation logic for describing program invariants, and a set of tactics to propagate these assertions through the program. While we use Coq as our underlying prover as do the aforecited works, we make some different choices in terms of how the logic is structured. We discuss four areas of development: (1) use of a deep model; (2) use of pointers in the functional representation of a data structure; (3) introduction of a merge tactic to merge together branches at the end of an if-then-else construct; and (4) use of a crunch tactic to perform low level theorem proving tasks. Deep model Unlike the frameworks of [2, 1, 3], PEDANTIC uses a deep embedding. This means an explicit Coq data structure is defined for the assertion language grammar, whereas the cited frameworks use higher-order abstract syntax (HOAS). The trade-offs between the two approaches are well-known, but we briefly review them in the context of our goals. We use a deep embedding because Coq s Galina language functions can then be used to manipulate the assertions and define some of the tactics. While Ltac can be used in the shallow model to create arbitrary tactics, Ltac requires a proof to be constructed even

if there is a simpler algorithm of computing the outcome of the tactic. With the deep model, one can do proofs of the Galina functions themselves which eliminates the need to do this work in the program verification. There is the disadvantage that a deep model is less flexible. In the above systems, one can add complex specifications by simply defining new functions and properties as part of the proof. To address this shortcoming, we have developed a novel approach in which we parameterized the data structures for representing the state, allowing users to add new definitions by redefining semantic functions. While this is more complex than what needs to be done with a shallow model, we believe a preprocessor could be written to automate the additional steps. The deep model also means some functionality embedded in Coq needs to be reimplemented. For example, Some arithmetic and logical simplifications need to be re-implemented whereas in a shallow embedding they come for free since Coq s logical reasoning tactics can automatically apply. There is no getting around this weakness but we believe the reward will be worth the price. Note that once the deep embedding syntax is unfolded to its underlying meaning as defined in Coq, the difference between the two approaches has vanished. Embedding pointers in a functional representation The Btree example in [3] illustrates how embedding pointers in a functional representation can be used to simplify the expression of invariants. We expand on this technique by showing how it can be used to represent cross referencing relationships from one data structure to another. We also designed our logic to supports the expression of data structure invariants in semi-independent layers. First, a shape invariant is used to describe the recursive data structure. Then, separate predicates can be used to describe additional constraints, giving state assertions more modularity. Merge At the end of an if-then-else block, our forward propagation tactics will have produced two distinct state assertions. As these assertions are derived from the same starting point, they will be similar but not the same. We have developed a merge tactic which works by first pairing off components in the states that are the same, and then proceeding to merge other components through more complex operations. In this process it may be necessary to invoke fold tactics to align recursive data structures to facilitate the merge. We also use this pairing technique for proving assertion entailment properties. Crunch Finally, we found it useful to create a series of crunch tactics to automate many simple operations. For example, if one sees a hypothesis of the form H: None = Some, the tactic inversion H can be immediately applied to solve the goal. We have rolled up a set of these rules into a single large Ltac match statement. 1.1 A program example PEDANTIC excels at reasoning about cross-structure invariants, and we give a small program example here which contains such invariants that PEDANTIC can verify. Figure 1 shows a C-like program which performs a traversal of a tree, given in the parameter r to the function build pre order and places the result

struct list { l 11 t = NULL; struct list *n; l 12 } else { struct tree *t; l 13 struct list *tmp = i->n; int data; l 14 t = i->t; }; l 15 free(l); l 16 i = tmp; struct tree { l 17 } struct tree *l, *r; l 18 } else if (t->r==null) { int value; l 19 t = t->l; }; l 20 } else if (t->l==null) { struct list *p; l 21 t = t->r; void build pre order(struct tree *r) { l 22 } else { l 1 struct list *i = NULL, *n, *x; l 23 n = i; l 2 struct tree *t = r; l 24 i = malloc( l 3 p = NULL; sizeof(struct list)); l 4 while (t) { l 25 i->n = n; l 5 n = p; l 26 x = t->r; l 6 p = malloc(sizeof(struct list)); l 27 i->t = x; l 7 p->t = t; l 28 t = t->l; l 8 p->n = n; l 29 } l 9 if (t->l==null && t->r==null) { l 30 } l 10 if (i==null) { l 31 } Fig. 1. Example program which builds a linked list of all nodes in a tree in a linked list p. This function contains a pointer t which walks the tree. As the tree is walked, elements are added to the p list. Our Coq proof verifies a number of important properties of the data structures in this program, including: (1) the program maintains two well formed linked lists, the heads of which are pointed to by n and p; (2) the program maintains a well formed tree pointed to by r; (3) t always points to an element in the tree rooted at r; (4) the two lists and the tree are disjoint in memory; (5) no other heap memory is allocated; and, (6) the t field of every element in both list structures points to an element in the tree. Invariant 6, in particular is a cross-structure invariant which PEDANTIC is highly suited to verify. 2 Separation Invariants In this section we describe the internal structure of PEDANTIC in more detail. Shallow embeddings such as Bedrock[2] allow one to create arbitrary recursive data structure invariants by defining a Coq function that can be used in a separation logic based assertion. With a deep embedding where the logic is defined as a data structure (rather than a Coq expression), it is non-trivial to make the framework extensible. To address this problem, we make PEDANTIC an explicitly parametric logic consisting of a generic core logic which can be specialized in several dimensions for particular program structures. So, in PEDANTIC we define a general abstract state logic data structure, which can be instantiated to a particular state instance depending on the particular kinds of inductive program invariants that are needed. The abstract syntax for states is shown in Figure 2. absexp is the type of generic assertion expressions, and absstate is the type of generic state assertions. These types

Inductive absexp {ev} {eq : ev -> ev -> bool} { f : id -> list (@Value ev) -> (@Value ev) } : Type := AbsConstVal : (@Value ev) -> absexp AbsVar : id -> absexp AbsQVar : absvar -> absexp AbsFun : id -> list absexp -> absexp. Inductive absstate {ev} {eq : ev -> ev -> bool} { f : id -> list (@Value ev) -> (@Value ev) } { t : id -> list (@Value ev) -> heap -> Prop }... := AbsExists : (@absexp ev eq f) -> @absstate ev eq f t ac -> absstate AbsAll : (@absexp ev eq f) -> @absstate ev eq f t ac -> absstate AbsCompose : @absstate ev eq f t ac -> @absstate ev eq f t ac -> absstate AbsEmpty : absstate AbsLeaf : id -> (list (@absexp ev eq f)) -> absstate... Fig. 2. Separation expression and assertion data structures, from file AbsState.v (e, h) [P ] iff P e 0 and dom(h) = (e, h) l v iff h( l e) = v e x N, x l e x dom(h) (e, h) TREE(r, v, n, f) iff ( r e = 0 h = ) h 0,..., h n+m, x 0,..., x n 1, v 0,..., v m. r e 0 (e, h 0) ( r e + 0) x 0... (e, h n 1) ( r e + n 1) x n 1 (e, h n) TREE(h( r +f 0), v f0, s, f)... (e, h n+m 1) TREE(h( r +f m 1), v fm 1, s, f) combine(h, h 0,..., h n+m 1) v = [ r e, v 0,..., v n 1] i < n.i f implies v i = x i where combine(h, h 0,..., h n) iff i, j n.dom(h i) dom(h j) = i dom(h). j.i dom(h j) h(i) = h j(i) Fig. 3. Semantics for the separation predicates of the basicstate instantiation of absstate. (e, h) represents a concrete state. (e, h) s means that (e, h) satisfies the assertion s. e Id Nat represents assignments for program variables and h = Nat optionnat represents the values on the heap. The notation exp e represents evaluation of the expression exp with the set of variables in the environment e. are parametric in function parameters f/t/... which are provided to define an instance of the abstract semantics. The f parameter in absexp is used by the absfun clause to define particular arithmetic operators as needed. The t parameter in AbsState is used by the AbsLeaf clause, and allows arbitrary new state assertions to be plugged into the abstract state assertion grammar. We use the abbreviation a b for AbsCompose a b. We use de Brujin indices for bound variables, so AbsAll/AbsExists do not specify the names of the variable they introduce. We use sugared quantifier notation with variables v 0, v 1,... below for readability. For the particular example of this paper, the absexp and absstate datatypes are instantiated to the corresponding basiceval and basicstate datatypes by providing explicit f and t functions. We instantiate with an f that defines standard operations on integers; parameter t is instantiated to provide

three important leaf predicates: AbsPredicate, AbsCell and AbsTree. AbsLeaf AbsPredicate P, denoted [P ], indicates an arbitrary predicate on the heap, AbsLeaf AbsCell l v, abbreviated l v, asserts that cell l has contents v, and AbsLeaf AbsTree, abbreviated TREE, is used for characterizing recursive data structures on the heap, either lists or trees. By defining a fixed grammar for tree structure assertions we can in tandem define generalized fold and unfold tactics which will work over different recursive data structures in different derivations that are defined with AbsTree. The semantics for these three t extensions are given in Figure 3; the semantics of the absstate core seperation logic is standard and is elided. The basicstate instance is itself further extensible, a user could create a userstate and usereval to integrate additional functions. These functions must support all the functionality of basicstate and basiceval; to verify this holds, there is a supportsfunctionality definition that needs to be instantiated and proven. 2.1 An example state assertion 14 0,0 12 14,16 v0 (R = 10) v1 (I = 30) v2 (P = 40) 10 12,18 16 0,0 18 20,0 20 0,0 30 32,10 32 0,12 40 42,14 42 44,12 44 0,10 AbsExists v 0.AbsExists v 1.AbsExists v 2. TREE(R, v 0, 2, [0,1]) TREE(I, v 1, 2, [0]) TREE(P, v 2, 2, [0]) AbsAll v 3 TreeRecords(v 1 ). [nth(find(v 1, v 3 ), 2) intree v 0 ] AbsAll v 3 TreeRecords(v 2 ). [nth(find(v 2, v 3 ), 2) intree v 0 ] [T = 0 T intree v 0 ] Fig. 4. A snapshot of a possible heap state of the program from Figure 1, and the general PEDANTIC assertion describing the state invariant for the loop over all executions. To better understand state assertions we present a sample heap snapshot of our example program in Figure 4. This example shows the heap memory after two loop iterations have been completed. To the right of the trees we give the general state invariant for the loop as we formalize it in Coq. The three TREE assertions characterize the tree and the two lists of the heap. The four parameters of TREE are: (1) the root of the tree; (2) a variable holding an equivalent functional representation (discussed below); (3) the word size of a record; (4) and, a list of offsets for all the child node pointers. Here the tree needs two such offsets, [0,1] and the lists only need one, [0]. The functional representations v 0 /v 1 /v 2 defined by TREE characterize both the recursive list/tree structure and additionally embed information on the pointer structure, similar to methods used in [3]. For the snapshot from the

Figure, v 0 would for example be the list [10,([12,[14,[0],[0]],[16,[0],[0]]]),([18,[20,[0],[0]],[0]])] Getting back to the invariant assertion, following the TREE assertions, the two AbsAll clauses assert for each list that the pointers in the second field of each node in the list point into the tree. TreeRecords is a function that returns the list of record pointers given a functional representation. For example, for the representation above, TreeRecords returns [10,12,14,16,18,20]. x intree y is shorthand for x TreeRecords(y), and find takes an address in a tree and a pointer to a tree, and returns the subtree rooted at that address. For example, find([30,[32,[0],12],10],32) will return [32,[0],12]. Finally, the last line states that either T is 0 or that T points to an element in T. 2.2 The Coq Proofs The Coq files associated with this paper include file TreeTraversal.v which contains a commented set of steps of the proof that describe the underlying approach to verification in PEDANTIC. 3 Conclusion This paper has covered the basic ideas in the PEDANTIC framework, a tool for verifying the correctness of C programs. We have demonstrated how dynamic data structure invariants including cross referenced dependencies can be expressed and reasoned about. There are many other features for which there is not enough space in the paper to discuss. We currently have an implementation of the framework that contains all of the basic data types and tactics. The proof of the example program invariant of Figure 4 is complete, but correctness proofs of most auxiliary Lemmas still need to be completed and are for now admitted. Once verification of the Lemmas is complete, we plan to apply PEDANTIC to a DPLL based SAT solver. Acknowledgements The authors would like to thank Greg Malecha for his feedback on this paper. References 1. Jesper Bengtson, Jonas Braband Jensen, and Lars Birkedal. Charge! - a framework for higher-order separation logic in Coq. In Third International Conference, ITP, 2012. 2. Adam Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In 32nd Programming Language Design and Implementation (PLDI), 2011. 3. Gregory Malecha and Greg Morrisett. Mechanized verification with sharing. In ICTAC, 2010. 4. Nicolas Marti and Reynald Affeldt. A certified verifier for a fragment of separation logic. Information and Media Technologies, 4(2):304 316, 2009.