Automatic Proof and Disproof in Isabelle/HOL

Similar documents
Isabelle/HOL:Selected Features and Recent Improvements

Interactive Theorem Proving in Higher-Order Logics

Integration of SMT Solvers with ITPs There and Back Again

Isabelle Tutorial: System, HOL and Proofs

Property-Based Testing for Coq. Cătălin Hrițcu

Overview. A Compact Introduction to Isabelle/HOL. Tobias Nipkow. System Architecture. Overview of Isabelle/HOL

Functional Programming with Isabelle/HOL

Automated Discovery of Conditional Lemmas in Hipster!

Theorem Proving Principles, Techniques, Applications Recursion

Lost in translation. Leonardo de Moura Microsoft Research. how easy problems become hard due to bad encodings. Vampire Workshop 2015

Formalization of Incremental Simplex Algorithm by Stepwise Refinement

Provably Correct Software

Type- & Example-Driven Program Synthesis. Steve Zdancewic WG 2.8, August 2014

λ calculus is inconsistent

Programs and Proofs in Isabelle/HOL

Automatic Proofs and Refutations for Higher-Order Logic

Programming and Proving in

Theory Exploration on Infinite Structures. Master s thesis in Computer Science Algorithms, Languages, and Logic SÓLRÚN HALLA EINARSDÓTTIR

The design of a programming language for provably correct programs: success and failure

HOL DEFINING HIGHER ORDER LOGIC LAST TIME ON HOL CONTENT. Slide 3. Slide 1. Slide 4. Slide 2 WHAT IS HIGHER ORDER LOGIC? 2 LAST TIME ON HOL 1

Finite Model Generation for Isabelle/HOL Using a SAT Solver

COMP 4161 NICTA Advanced Course. Advanced Topics in Software Verification. Toby Murray, June Andronick, Gerwin Klein

Deductive Program Verification with Why3, Past and Future

From Types to Sets in Isabelle/HOL

locales ISAR IS BASED ON CONTEXTS CONTENT Slide 3 Slide 1 proof - fix x assume Ass: A. x and Ass are visible Slide 2 Slide 4 inside this context

Coq with Classes. Matthieu Sozeau. Journées PPS 2011 September 5th 2011 Trouville, France. Project Team πr 2 INRIA Paris

Why3 where programs meet provers

SAT Modulo Bounded Checking

A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality

Programming and Proving in Isabelle/HOL

Inductive datatypes in HOL. lessons learned in Formal-Logic Engineering

Programming and Proving in Isabelle/HOL

Introduction. Preliminaries. Original IC3. Tree-IC3. IC3 on Control Flow Automata. Conclusion

Relational Analysis of (Co)inductive Predicates, (Co)algebraic Datatypes, and (Co)recursive Functions

Concrete Semantics. A Proof Assistant Approach. Tobias Nipkow Fakultät für Informatik Technische Universität München

The Isabelle/Isar Reference Manual

type classes & locales

Automated Theorem Proving in a First-Order Logic with First Class Boolean Sort

Preuves Interactives et Applications

Coq, a formal proof development environment combining logic and programming. Hugo Herbelin

Turning inductive into equational specifications

Testing. Wouter Swierstra and Alejandro Serrano. Advanced functional programming - Lecture 2. [Faculty of Science Information and Computing Sciences]

COMP4161: Advanced Topics in Software Verification. fun. Gerwin Klein, June Andronick, Ramana Kumar S2/2016. data61.csiro.au

Parametricity of Inductive Predicates

An LCF-Style Interface between HOL and First-Order Logic

Verification of an LCF-Style First-Order Prover with Equality

A Decision Procedure for (Co)datatypes in SMT Solvers. Andrew Reynolds Jasmin Christian Blanchette IJCAI sister conference track, July 12, 2016

Programming and Proving in Isabelle/HOL

COMP 4161 Data61 Advanced Course. Advanced Topics in Software Verification. Gerwin Klein, June Andronick, Christine Rizkallah, Miki Tanaka

DPLL(Γ+T): a new style of reasoning for program checking

Paral-ITP Front-End Technologies

FAKULTÄT FÜR INFORMATIK. Synthesis of Robust, Semi-Intelligble Isar Proofs from ATP Proofs

Deductive Program Verification with Why3

Inductive data types

Improving Coq Propositional Reasoning Using a Lazy CNF Conversion

How Efficient Can Fully Verified Functional Programs Be - A Case Study of Graph Traversal Algorithms

First-Class Type Classes

Organisatorials. About us. Binary Search (java.util.arrays) When Tue 9:00 10:30 Thu 9:00 10:30. COMP 4161 NICTA Advanced Course

Programming Languages Lecture 14: Sum, Product, Recursive Types

Computing Fundamentals 2 Introduction to CafeOBJ

Automated Theorem Proving and Proof Checking

Basic Foundations of Isabelle/HOL

PRocH: Proof Reconstruction for HOL Light

The Isar Proof Language in 2016

Functional Programming and Modeling

A CRASH COURSE IN SEMANTICS

Some aspects of Unix file-system security

Induction in Coq. Nate Foster Spring 2018

Verified Characteristic Formulae for CakeML. Armaël Guéneau, Magnus O. Myreen, Ramana Kumar, Michael Norrish April 27, 2017

10 Years of Partiality and General Recursion in Type Theory

Counterexample Generation for Higher-Order Logic Using Functional and Logic Programming

Logic - CM0845 Introduction to Haskell

A Verified Simple Prover for First-Order Logic

Introduction to dependent types in Coq

Nitpick. Isabelle. Picking Nits. A User s Guide to Nitpick for Isabelle/HOL. Jasmin Christian Blanchette

Numerical Computations and Formal Methods

A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality

A Refinement Framework for Monadic Programs in Isabelle/HOL

A Verified Compiler from Isabelle/HOL to CakeML

Satisfiability Modulo Bounded Checking

Why. an intermediate language for deductive program verification

CS152: Programming Languages. Lecture 11 STLC Extensions and Related Topics. Dan Grossman Spring 2011

Lecture 5 - Axiomatic semantics

Isabelle s meta-logic. p.1

First Order Logic in Practice 1 First Order Logic in Practice John Harrison University of Cambridge Background: int

TIP: Tools for Inductive Provers

Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2.

Deductive Methods, Bounded Model Checking

Logik für Informatiker Logic for computer scientists

SMT Solvers for Verification and Synthesis. Andrew Reynolds VTSA Summer School August 1 and 3, 2017

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

Universes. Universes for Data. Peter Morris. University of Nottingham. November 12, 2009

Introduction to Axiomatic Semantics (1/2)

Encoding Monomorphic and Polymorphic Types

A Simpl Shortest Path Checker Verification

Static and User-Extensible Proof Checking. Antonis Stampoulis Zhong Shao Yale University POPL 2012

Encoding Monomorphic and Polymorphic Types

More SPASS with Isabelle

Document-oriented Prover Interaction with Isabelle/PIDE

Simple proofs of simple programs in Why3

Transcription:

Automatic Proof and Disproof in Isabelle/HOL Jasmin Blanchette, Lukas Bulwahn, Tobias Nipkow Fakultät für Informatik TU München

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

A tale of two worlds FOL f (s, t) HOL f s t, f s, λx.t Otter (1987) Isabelle (1986) They did not talk to each other because they spoke different languages. This is the tale of how these two worlds began to understand and boost each other.

Isabelle is an interactive theorem prover that has always embraced automation but without sacrificing soundness: All proofs must ultimately go through the Isabelle kernel This is the LCF principle (Robin Milner).

Two decades of Isabelle development 1990s Basic proof automation Our own proof search in ML: simplifier, automatic provers, arithmetic 2000s Embrace external tools Let them do the proof search, but don t trust them: ATPs (FOL provers, Sons of Otter ) SMT solvers SAT solvers Programming languages

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

Simplifier N. First and higher-order equations (λ) Conditional equations Contextual simplification Special solvers (eg orderings) Arithmetic Case splitting (triggered by if and case) Large library of default equations Isabelle s workhorse

The power of Isabelle s internal automated proof methods relies on large sets of default rules that are user-extensible ([simp]) and tuned over time.

Tableaux prover Paulson Based on leant A P (Beckert & Posegga ) Generic User-extensible by intro and elim rules Proof search in ML, proof checking via Isabelle kernel Works well for pure logic and set theory Does not know anything about equality

Isabelle Demo

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

Sledgehammer Paulson et al. Connects Isabelle with ATPs and SMT solvers E, SPASS, Vampire, CVC3, Yices, Z3,... One-click invocation: Users don t need to select facts... or ensure the problem is first-order Exploits local parallelism, remote servers

Sledgehammer: Demo

Sledgehammer: Architecture Sledgehammer Relevance filter Relevance filter ATP translation SMT tr. SMT translation E SPASS Vampire Z3 CVC3 Yices Metis proof Metis proof Metis proof Metis or SMT proof Metis or SMT proof Metis or SMT proof

Sledgehammer: Fact selection Meng & Paulson Provers perform poorly given 1000s of facts A lightweight, symbol-based filter greatly improves the success rate Number of facts is optimized for each prover

Sledgehammer: Translation Meng & Paulson Bl., Böhme & Smallbone Source: higher-order, polymorphism + type classes Target: first-order, untyped/simply-typed 1 Firstorderize SK combinators, λ-lifting Explicit application operator 2 Encode types Monomorphize... or encode polymorphism

Sledgehammer: Reconstruction Paulson & Susanto Böhme & Weber Four approaches (the 4 Rs): A. Re-find using Metis B. Rerun external prover C. Recheck stored proof D. Recast into Isar proof

A. Re-find using Metis lemma length (tl xs) length xs by (metis append Nil2 append eq conv conj drop eq Nil drop tl tl.simps(1)) Usually fast and reliable Metis sometimes too slow (5% loss on avg)

B. Rerun external prover lemma length (tl xs) length xs by (smt append Nil2 append eq conv conj drop eq Nil drop tl tl.simps(1)) Reinvokes the SMT solver each time!

C. Recheck stored proof lemma length (tl xs) length xs by (smt append Nil2 append eq conv conj drop eq Nil drop tl tl.simps(1)) Fast No need for SMT solver for replay Fragile

D. Recast into Isar proof lemma length (tl xs) length xs proof have tl [] = [] by (metis tl.simps(1)) hence u. xs @ u = xs tl u = [] by (metis append Nil2) hence tl (drop (length xs) xs) = [] by (metis append eq conv conj) hence drop (length xs) (tl xs) = [] by (metis drop tl) thus length (tl xs) length xs by (metis drop eq Nil) qed Fast, self-explanatory Experimental, bulky

Sledgehammer: Judgment Day Böhme & N. Bl., Böhme & Paulson 1240 goals arising in 7 older theories Arrow, FFT, FTA, Hoare, Jinja, NS, SN In 2010: E, SPASS, Vampire (5 to 120 s) ESV 5 s V 120 s! In 2011: Also E-SInE, CVC3, Yices, Z3 (30 s) Z3 > V!

2010

2010 3 ATPs x 30s 54% 46%

2010 3 ATPs x 30s 3 ATPs x 30 s nontrivial goals 54% 46% 66% 34%

2010 3 ATPs x 30s 3 ATPs x 30 s nontrivial goals 54% 46% 66% 34% 2011 (4 ATPs + 3 SMTs) x 30s (4 ATPs + 3 SMTs) x 30s nontrivial goals 39% 61% 54% 46%

Old way: Sledgehammer & Teaching Paulson Low-level tactics + lemma libraries New way: Isar + Sledgehammer + simp etc. lemma blah sorry proof have blah 0 sorryby (metis foo bar) hence blah 1 sorryby metis hence blah 2 sorryby auto thus blah sorryby (metis baz) qed sorry

Sledgehammer: Success story Guttman, Struth & Weber Developed large Isabelle/HOL repository of algebras for modeling imperative programs (Kleene Algebra, Hoare logic,..., 1000 lemmas) Intricate refinement and termination theorems Surprise: Sledgehammer and Z3 automate algebraic proofs at textbook level! The integration of ATP, SMT, and Nitpick is for our purposes very very helpful. G. Struth

Theorem proving and testing Testing can show only the presence of errors, but not their absence. (Dijkstra) Testing cannot prove theorems, but it can refute conjectures! Two facts of life: 95% of all conjectured theorems are wrong. Theorem proving is an expensive debugging technique. Theorem provers need counterexample finders!

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

Quickcheck Berghofer & N. Bul. Adds lightweight validation by testing Motivated by Haskell s QuickCheck Employs Isabelle s code generator Quick response time No-click invocation: automatic after parsing a proposition

Quickcheck: Demo

Quickcheck Berghofer & N. Bul. Covers different testing approaches Random and exhaustive testing Smart test data generators Narrowing-based testing Creates test data generators automatically

Test generators for datatypes Fast iteration over the large number of tests using continuation-passing-style programming: For datatype α list = Nil Cons α (α list) we create a test function for property P: test αlist P = P Nil andalso test α (λx. test αlist (λxs.p (Cons x xs)))

Test generators for predicates Testing propositions with preconditions distinct xs = distinct (remove1 x xs) Problem: Exhaustive testing creates useless test data Solution: Use precondition s definition for smarter generator

Test generators for predicates From the definition: distinct Nil = True distinct (Cons x xs) = (x / xs distinct xs) we create a test function for property P: test-distinct αlist P = P Nil andalso test α (λx. test-distinct αlist (λxs. if x / xs then P (Cons x xs) else True)) Non-distinct lists are never generated

Test generators for predicates Construct generators using data flow analysis: 1 Transform predicates to system of horn clauses x / xs = distinct xs = distinct (Cons x xs) 2 Perform data flow analysis: which variables can be computed, which variables must be generated? 3 Synthesize test data generator

Narrowing-based testing Symbolic execution with demand-driven refinement: Test cases can contain variables If execution cannot proceed, variables are instantiated, again by symbolic terms Pays off if large search spaces can be discarded distinct (Cons 1 (Cons 1 x)) is false for every x No further instantiations for x

Implementations of narrowing Programming language with native narrowing currently still too slow Lazy execution with outer refinement loop results in many recomputations, but fast

Limitations Quickcheck only checks executable specifications: No equality on functions with infinite domain No axiomatic specifications

1 Introduction 2 Isabelle s Standard Proof Methods 3 Sledgehammer 4 Quickcheck: Counterexamples by Testing 5 Nitpick: Counterexamples by SAT Solving

Nitpick Bl. & N. Finite model finder Based on SAT via Kodkod (Alloy s backend) Soundly approximates infinite types

Nitpick: Demo

Nitpick: Architecture Alloy Nitpick Kodkod HOL FOL SAT Solver prop. logic

Nitpick: Basic translation For fixed finite cardinalities (1, 2, 3,..., 10) First-order: τ 1 τ n bool A 1 A n τ 1 τ n τ A 1 A n A + constraint Higher-order args of type σ τ A A }{{} σ times

Nitpick: Datatypes Soundly approximated by finite sets (3-valued logic) Efficient axiomatization: Subterm-closed substructures (Kuncak & Jackson) Examples nat: {0, Suc 0, Suc (Suc 0)} α list: {[], [a 1 ], [a 2 ], [a 2, a 1 ]} Motto: Let the SAT solver spin! (and trust Kodkod s symmetry breaking)

Nitpick: Inductive predicates p is the least solution to p = F (p) for some F Naive idea: Take p = F (p) as p s specification! Unsound in general, but: Sound if p is well-founded Sound for negative occurrences of p Otherwise: Unroll! (cf. Biere, Cimatti, Clarke & Zhu) p 0 = (λx. False) p i+1 = F (p i )

Nitpick: Success stories Algebraic methods (Guttman, Struth & Weber) C++ memory model (Bl., Weber, Batty, Owens & Sarkar) Soundness bugs in TPS and LEO-II Typical fan mail: Last night I got stuck on a goal I was sure was a theorem. After 5 10 minutes I gave Nitpick a try, and within a few secs it had found a splendid counterexample despite the mess of locales and type classes in the context!

AT S Conclusion Isabelle increasingly relies on external tools

Conclusion Isabelle increasingly relies on external tools Many benefits to everybody! To Isabelle users: More proofs for free Quick feedback To external tool users: Foundational approach... within powerful logic To tool developers:

Wish list to tool authors Fast (< 30 s) Scalable Expressive logic Nice proofs/certificates Standard formats (in & out) Easy installation (Linux, Mac, Win) [ Sound, complete ] [ Multicore-aware ]