A Simple Supercompiler Formally Verified in Coq

Similar documents
A Simple Supercompiler Formally Verified in Coq

MoreIntro.v. MoreIntro.v. Printed by Zach Tatlock. Oct 07, 16 18:11 Page 1/10. Oct 07, 16 18:11 Page 2/10. Monday October 10, 2016 lec02/moreintro.

Equations: a tool for dependent pattern-matching

MoreIntro_annotated.v. MoreIntro_annotated.v. Printed by Zach Tatlock. Oct 04, 16 21:55 Page 1/10

Inductive data types

Denotational Semantics. Domain Theory

c constructor P, Q terms used as propositions G, H hypotheses scope identifier for a notation scope M, module identifiers t, u arbitrary terms

Coq quick reference. Category Example Description. Inductive type with instances defined by constructors, including y of type Y. Inductive X : Univ :=

COP4020 Programming Languages. Functional Programming Prof. Robert van Engelen

Meta-programming with Names and Necessity p.1

A Small Interpreted Language

CIS 500: Software Foundations

Programming Languages Lecture 14: Sum, Product, Recursive Types

Inductive Definitions, continued

Inductive data types (I)

Programming with (co-)inductive types in Coq

An experiment with variable binding, denotational semantics, and logical relations in Coq. Adam Chlipala University of California, Berkeley

Lecture 14: Recursive Types

CIS 500: Software Foundations

3 Pairs and Lists. 3.1 Formal vs. Informal Proofs

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Chapter 11 p. 1/38

Coq. LASER 2011 Summerschool Elba Island, Italy. Christine Paulin-Mohring

Introduction to dependent types in Coq

CIS 500 Software Foundations. Midterm I. (Standard and advanced versions together) October 1, 2013 Answer key

COMP 382: Reasoning about algorithms

Coq projects for type theory 2018

Modular dependent induction in Coq, Mendler-style. Paolo Torrini

An Explicit-Continuation Metacircular Evaluator

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p.

Functional Languages. Hwansoo Han

CS 314 Principles of Programming Languages

LECTURE 16. Functional Programming

Formal Systems and their Applications

Induction in Coq. Nate Foster Spring 2018

Coq Summer School, Session 2 : Basic programming with numbers and lists. Pierre Letouzey

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

Flat (Draft) Pasqualino Titto Assini 27 th of May 2016

Metaprogramming assignment 3

4 Programming with Types

On Agda JAIST/AIST WS CVS/AIST Yoshiki Kinoshita, Yoriyuki Yamagata. Agenda

Intrinsically Typed Reflection of a Gallina Subset Supporting Dependent Types for Non-structural Recursion of Coq

CIS 500: Software Foundations

Preuves Interactives et Applications

A Coq tutorial for confirmed Proof system users

Calculus of Inductive Constructions

Inductive datatypes in HOL. lessons learned in Formal-Logic Engineering

Proving Properties on Programs From the Coq Tutorial at ITP 2015

Infinite Objects and Proofs 1

Combining Programming with Theorem Proving

Programming Languages Lecture 15: Recursive Types & Subtyping

A CRASH COURSE IN SEMANTICS

Homework 3 COSE212, Fall 2018

Embedding logics in Dedukti

CSE341: Programming Languages Lecture 17 Implementing Languages Including Closures. Dan Grossman Autumn 2018

15 212: Principles of Programming. Some Notes on Continuations

Verification in Coq. Prof. Clarkson Fall Today s music: Check Yo Self by Ice Cube

Introduction to Co-Induction in Coq

From natural numbers to the lambda calculus

CMSC 330: Organization of Programming Languages

Pierce Ch. 3, 8, 11, 15. Type Systems

L04_in_class.v. L04_in_class.v. Printed by Zach Tatlock

Documentation for LISP in BASIC

6.184 Lecture 4. Interpretation. Tweaked by Ben Vandiver Compiled by Mike Phillips Original material by Eric Grimson

1 The language χ. Models of Computation

Heq: a Coq library for Heterogeneous Equality

Theorem Proving Principles, Techniques, Applications Recursion

From Math to Machine A formal derivation of an executable Krivine Machine Wouter Swierstra Brouwer Seminar

Verified Characteristic Formulae for CakeML. Armaël Guéneau, Magnus O. Myreen, Ramana Kumar, Michael Norrish April 27, 2017

SOFTWARE VERIFICATION AND COMPUTER PROOF (lesson 1) Enrico Tassi Inria Sophia-Antipolis

Autosubst Manual. May 20, 2016

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Section 17.2

Integration of SMT Solvers with ITPs There and Back Again

Formalization of Array Combinators and their Fusion Rules in Coq

Analysis of dependent types in Coq through the deletion of the largest node of a binary search tree

Ornaments in ML. Thomas Williams, Didier Rémy. April 18, Inria - Gallium

CSE-505: Programming Languages. Lecture 20.5 Recursive Types. Zach Tatlock 2016

Lesson 4 Typed Arithmetic Typed Lambda Calculus

From the λ-calculus to Functional Programming Drew McDermott Posted

Ideas over terms generalization in Coq

Name: Entry: Gp: 1. CSL 102: Introduction to Computer Science. Minor 2 Mon 21 Mar 2005 VI 301(Gps 1-6)& VI 401(Gps 7-8) 9:30-10:30 Max Marks 40

Adam Chlipala University of California, Berkeley ICFP 2006

Expr_annotated.v. Expr_annotated.v. Printed by Zach Tatlock

CSE413: Programming Languages and Implementation Racket structs Implementing languages with interpreters Implementing closures

SML A F unctional Functional Language Language Lecture 19

Vector Clocks in Coq

Module 8: Local and functional abstraction

Pretty-Big-Step Semantics

Lambda Calculus. Type Systems, Lectures 3. Jevgeni Kabanov Tartu,

CS-XXX: Graduate Programming Languages. Lecture 17 Recursive Types. Dan Grossman 2012

Formally Certified Satisfiability Solving

Introduction to OCaml

CS 314 Principles of Programming Languages. Lecture 16

Lecture Notes on Data Representation

3.4 Deduction and Evaluation: Tools Conditional-Equational Logic

The Calculator CS571. Abstract syntax of correct button push sequences. The Button Layout. Notes 16 Denotational Semantics of a Simple Calculator

λ calculus is inconsistent

CS 456 (Fall 2018) Scribe Notes: 2

COMP80 Lambda Calculus Programming Languages Slides Courtesy of Prof. Sam Guyer Tufts University Computer Science History Big ideas Examples:

Programming in Standard ML: Continued

Transcription:

A Simple Supercompiler Formally Verified in Coq IGE+XAO Balkan 4 July 2010 / META 2010

Outline 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

. Supercompiler? Formal verification? Important for non-experimental supercompilers Fresh look over supercompilation process In Coq? A matter of taste Non-critical (very few Coq-specific features used) Simple? Toy language...... over a toy data domain (simple binary trees). Cut supercompilation into smaller pieces...... with modular proofs of correctness. But: less powerful supercompiler

. Supercompiler? Formal verification? Important for non-experimental supercompilers Fresh look over supercompilation process In Coq? A matter of taste Non-critical (very few Coq-specific features used) Simple? Toy language...... over a toy data domain (simple binary trees). Cut supercompilation into smaller pieces...... with modular proofs of correctness. But: less powerful supercompiler

. Supercompiler? Formal verification? Important for non-experimental supercompilers Fresh look over supercompilation process In Coq? A matter of taste Non-critical (very few Coq-specific features used) Simple? Toy language...... over a toy data domain (simple binary trees). Cut supercompilation into smaller pieces...... with modular proofs of correctness. But: less powerful supercompiler

. Supercompiler? Formal verification? Important for non-experimental supercompilers Fresh look over supercompilation process In Coq? A matter of taste Non-critical (very few Coq-specific features used) Simple? Toy language...... over a toy data domain (simple binary trees). Cut supercompilation into smaller pieces...... with modular proofs of correctness. But: less powerful supercompiler

. Supercompiler? Formal verification? Important for non-experimental supercompilers Fresh look over supercompilation process In Coq? A matter of taste Non-critical (very few Coq-specific features used) Simple? Toy language...... over a toy data domain (simple binary trees). Cut supercompilation into smaller pieces...... with modular proofs of correctness. But: less powerful supercompiler

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

(classical). Classical Organization of Supercompilation!

(this work). Simple normalization ( deforestation unfolding) - normconv Example term := IfNil Id Id (Tl # Hd). Eval compute in (ntrm2trm (normconv (term $ term))). = IfNil Id (IfNil Id Id (Tl # Hd)) (Hd # Tl) : Trm Theorem normconvpreserveseval: forall (t: Trm) (v: Val), evalnt (normconv t) v = evalt t v. Propagation of test outcomes inside if-branches - norm Eval compute in (ntrm2trm (norm (term $ term))). = IfNil Id Nil (Hd # Tl) : Trm Theorem normpreserveseval: forall t v, evalnt (norm t) v = evalt t v. Single-step loop unrolling - unrolltoinit Ensuring termination - firstembedded Multi-step loop unrolling - sscp

(this work). Simple normalization ( deforestation unfolding) - normconv Example term := IfNil Id Id (Tl # Hd). Eval compute in (ntrm2trm (normconv (term $ term))). = IfNil Id (IfNil Id Id (Tl # Hd)) (Hd # Tl) : Trm Theorem normconvpreserveseval: forall (t: Trm) (v: Val), evalnt (normconv t) v = evalt t v. Propagation of test outcomes inside if-branches - norm Eval compute in (ntrm2trm (norm (term $ term))). = IfNil Id Nil (Hd # Tl) : Trm Theorem normpreserveseval: forall t v, evalnt (norm t) v = evalt t v. Single-step loop unrolling - unrolltoinit Ensuring termination - firstembedded Multi-step loop unrolling - sscp

(this work). Simple normalization ( deforestation unfolding) - normconv Example term := IfNil Id Id (Tl # Hd). Eval compute in (ntrm2trm (normconv (term $ term))). = IfNil Id (IfNil Id Id (Tl # Hd)) (Hd # Tl) : Trm Theorem normconvpreserveseval: forall (t: Trm) (v: Val), evalnt (normconv t) v = evalt t v. Propagation of test outcomes inside if-branches - norm Eval compute in (ntrm2trm (norm (term $ term))). = IfNil Id Nil (Hd # Tl) : Trm Theorem normpreserveseval: forall t v, evalnt (norm t) v = evalt t v. Single-step loop unrolling - unrolltoinit Ensuring termination - firstembedded Multi-step loop unrolling - sscp

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

. Coq: Proof assistant based on CoC + inductive types Simplified point of view: Total(!) functional programming language Inductive datatypes (Inductive... :=.......) Pattern matching (match... with... =>...... end) Structural recursion (top-level - Fixpoint, local - fix) lambda-functions (fun... =>...) Interactive proofs in intuitionistic logic The usual logical quantifiers/connectives forall, exists, ->, /\, \/, ~, <-> Interactive tactics for proofs by induction, rewriting, etc. Not used: dependent types(!), co-induction, classical logic

Coq Examples. Inductive datatypes Inductive nat: Set := O : nat S : nat -> nat. Definitions by structural recursion Fixpoint power (n m : nat) {struct m} : nat := match m with 0 => 1 S m1 => n * power n m1 end. Eval compute in (power 2 5). = 32 : nat Partial evaluation Eval cbv beta iota delta -[mult] in (fun n => power n 3). = fun n : nat => n * (n * (n * 1)) :nat -> nat

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

Expression Sublanguage Syntax. Data domain: simple binary trees (S-expressions with 1 atom) Inductive Val: Set := VNil: Val VCons: Val -> Val -> Val VBottom: Val. Expression language: tree constructors and selectors, identity, function composition, if-expressions Inductive Selector: Set := HD TL. Inductive Trm: Set := Nil: Trm Cons: Trm -> Trm -> Trm Sel: Selector -> Trm Id: Trm Cmp: Trm -> Trm -> Trm IfNil: Trm -> Trm -> Trm -> Trm Bottom. Infix "$" := Cmp (at level 60, right associativity). Notation Hd := (Sel HD). Notation Tl := (Sel TL). Infix "#" := Cons (at level 62, right associativity).

Expression Sublanguage Semantics. Definition evalsel (sel: Selector) (v: Val) : Val := match v with VCons v1 v2 => match sel with HD => v1 TL => v2 end _ => VBottom end. Fixpoint evalt (t: Trm) (v: Val) {struct t} : Val := match t with Nil => VNil Bottom => VBottom Cons t1 t2 => VCons (evalt t1 v) (evalt t2 v) Sel sel => evalsel sel v Id => v Cmp t1 t2 => evalt t1 (evalt t2 v) IfNil t1 t2 t3 => match evalt t1 v with VNil => evalt t2 v VCons => evalt t3 v VBottom => VBottom end end.

Simple Normalization. Based on simplifications like: Cmp Hd (Cons x y) x IfNil (IfNil x y z) u v IfNil x (IfNil y u v) (IfNil z u v) Produces terms in normal form: Inductive NTrm: Set := NNil: NTrm NCons: NTrm -> NTrm -> NTrm NSelCmp: list Selector -> NTrm NIfNil: list Selector -> NTrm -> NTrm -> NTrm NBottom: NTrm.... which can injected back into full-blown terms: Fixpoint ntrm2trm (nt: NTrm) : Trm :=... Definition evalnt nt v := evalt (ntrm2trm nt) v. Structurally-recursive implementation: Fixpoint normconv (t: Trm) : NTrm :=...

Simple Normalization Correctness. Tricky point - no full function composition in normal forms, yet we can still compose them: Definition normncmp : NTrm -> NTrm -> NTrm :=... Lemma normncmppreserveseval: forall nt1 nt2 v, evalnt (normncmp nt1 nt2) v = evalnt nt1 (evalnt nt2 v). With the help of some other (simpler) lemmas like: Lemma normselsncmppreservesevalt: forall sels nt v, evalt (ntrm2trm (normselsncmp sels nt)) v = evalsels sels (evalt (ntrm2trm nt) v). Lemma normncmpifif: forall sels1 sels2 nt1_1 nt1_2 nt2_1 nt2_2, let nt1 := NIfNil sels1 nt1_1 nt1_2 in normncmp nt1 (NIfNil sels2 nt2_1 nt2_2) = NIfNil sels2 (normncmp nt1 nt2_1) (normncmp nt1 nt2_2).... we can establish correctness of simple normalization: Theorem normconvpreserveseval: forall t v, evalnt (normconv t) v = evalt t v.

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

Poor-man Explicit Substitutions. Primitives for pairing and function composition give us: Variable-free programming Simple form of explicit substitutions Example: IfNil x1 x2 x3 has 3 free variables. Pack them into an input tree: x1 # x2 # x3 Replace the original expression with: IfNil Hd (Hd $ Tl) (Tl $ Tl) Computing object-level representations of substitutions: replaceat (pos: list Selector) (t tr: NTrm): NTrm Now, we can represent and apply the substitution of Nil for x2 in the above expression: let nt1 := normconv (IfNil Hd (Hd $ Tl) (Tl $ Tl)) in let nt2 := normconv Nil in let x2p := TL::HD::nil in ntrm2trm (normncmp nt1 (replaceat x2p (normconv Id) nt2))) = IfNil Hd Nil (Tl $ Tl)

. Internalize propagation of if-condition outcome: Definition setnilat (sels: list Selector): NTrm := replaceat sels (NSelCmp nil) NNil. Definition setconsat (sels: list Selector) : NTrm := replaceat sels (NSelCmp nil) (NCons (NSelCmp (sels ++ HD::nil)) (NSelCmp (sels ++ TL::nil))). Fixpoint propagateifcond (nt: NTrm) {struct nt} : NTrm :=... NIfNil sels nt1 nt2 => let nt1a := propagateifcond nt1 in let nt2a := propagateifcond nt2 in let nt1b := normncmp nt1a (setnilat sels) in let nt2b := normncmp nt2a (setconsat sels) in NIfNil sels nt1b nt2b...

Propagation of Test Outcomes Correctness. Test outcome propagation on top of simple normalization Easier to give structurally recursive (total) definition Easier to prove correctness on top of normalization correctness proof Theorem propagateifcondpreserveseval: forall nt v, evalnt (propagateifcond nt) v = evalnt nt v. Definition norm (t: Trm) := propagateifcond (normconv t). Theorem normpreserveseval: forall t v, evalnt (norm t) v = evalt t v.

Outline Introduction 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

SWhile Language. Expression language - not Turing-complete Embed in simple imperative language ( SWhile ) with: while-loops single (implicit) variable Inductive SWhileStmt: Set := Assign: Trm -> SWhileStmt Seq: SWhileStmt -> SWhileStmt -> SWhileStmt While: Trm -> SWhileStmt -> SWhileStmt. Infix ";" := Seq (at level 65, right associativity). Notation " VAR <- e" := (Assign e) (at level 64). Notation " WHILE cond DO body DONE ":=(While cond body) Further simplification - single while-loop (analog to Kleene normal forms in recursion theory) VAR <- initexp knf; WHILE condexp knf DO VAR VAR <- finalexp knf <- bodyexp knf DONE;

SWhile Language Semantics. SWhile semantics in Coq? inductive relations (elegant, non-executable) or, a folk trick: replace a partial function f : X -> Y with a total function f : nat -> X -> option Y, where f d x = Some y -> f x = y ((f x) is defined) f d x = None means (f x) cannot be computed in stack depth d (f is structurally recursive on d) Total quasi-interpreter for single-loop programs: Definition evalknf (d: nat) (knf: KNFProg) (v: Val) : option Val :=...

Loop Unrolling Analog to call unfolding in SWhile : loop unrolling Only a simple form of (single-step) unrolling considered replace: VAR <- initexp knf; WHILE condexp knf DO VAR VAR <- finalexp knf with: <- bodyexp knf DONE; VAR <- ntrm2trm (norm (IfNil (condexp knf) Id (bodyexp knf) $ initexp knf)); WHILE condexp knf DO VAR <- bodyexp knf DONE; VAR <- finalexp knf Process tree replaced by a stream of repeated unrollings

Final Supercompiler, Correctness. Whistle the usual one: homeomorphic embedding No need for folding, generalization in this (over-)simplified setting Final supercompiler Definition sscp... (n : nat) (knf : KNFProg) : option KNFProg :=... Correctness: a) Totality (using Kruskal s Tree Theorem as an axiom) Theorem sscp_total: forall b knf, exists n, exists knf1, sscp b n knf = Some knf1.... b) Preservation of semantics Theorem sscp_correct: forall b knf knf1 n v1 v2, stricttrm (condexp knf) -> sscp b n knf = Some knf1 -> ((exists d1, evalknf d1 knf v1 = Some v2) <-> (exists d2, evalknf d2 knf1 v1 = Some v2)).

Example of Supercompilation. Consider the usual Lisp-like encoding of lists and booleans as S-expressions (WFalse := Nil, WTrue := Nil # Nil, etc.) A program to check if the input list contains WFalse: VAR <- Id # WFalse; {VAR = input # output} WHILE Hd DO VAR <- IfNil (Hd $ Hd) (Nil # WTrue) (Tl $ Hd # Tl) DONE; VAR <- Tl {VAR = output} Its specialized version non-empty input list prepended with its negated head: Definition listhaswfalse_knf_negduphd := let negate x := IfNil x WTrue WFalse in modifyknfinput listhaswfalse_knf (IfNil Id Id (negate Hd # Id)).

Example of Supercompilation (cont.) Result of supercompiling the specialized version: VAR <- IfNil Id (Nil # WFalse) (IfNil Hd (Nil # WTrue) (Nil # WTrue)); WHILE Hd DO VAR <- IfNil (Hd $ Hd) (Nil # WTrue) (Tl $ Hd # Tl) DONE; VAR <- Tl... and with superfluous IfNil removed further by hand: VAR <- IfNil Id (Nil # WFalse) (Nil # WTrue); WHILE Hd DO VAR <- IfNil (Hd $ Hd) (Nil # WTrue) (Tl $ Hd # Tl) DONE; VAR <- Tl Loop still here but a simple static post-processing could remove it

Example of Supercompilation (end) We can define a downsized version of the supercompiler, without information propagation: sscp Its result on the example: VAR <- IfNil Id (IfNil Id (IfNil Id Id (IfNil Hd (Nil # Nil) Nil # Id) # Nil) (IfNil Id (IfNil Hd (Nil # Nil # Nil) (IfNil Id Tl Id # Nil)) (IfNil Hd (IfNil Id Tl Id # Nil) (Nil # Nil # Nil)))) (IfNil Id (IfNil Hd (Nil # Nil # Nil) (IfNil Id Tl Id # Nil)) (IfNil Hd (IfNil Id Tl Id # Nil) (Nil # Nil # Nil))); WHILE Hd DO VAR <- IfNil (Hd $ Hd) (Nil # Nil # Nil) (Tl $ Hd # Tl) DONE; VAR <- Tl

Outline Introduction Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation Expression Language Tests, Extensional Equivalence Normalization can simplify test generation Inductive NTrm: Set := NNil: NTrm NCons: NTrm -> NTrm -> NTrm NSelCmp: list Selector -> NTrm NIfNil: list Selector -> NTrm -> NTrm -> NTrm NBottom: NTrm. Idea: expressions can extract information from input tree only through selector compositions max. length of selector compositions = N Expression cannot look deeper than N inside input tree Trees of depth N should suffice as tests Finite tests sets extensional equivalence decidable

Outline Introduction Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

More Realistic Language Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation More powerful forms of loop unrolling? Add function calls to expression language Inductive Trm: Set :=... Ref: FunRef -> Trm. It becomes Turing-complete Still possible to: Isolate simple normalization, and information propagation Implement them by structural recursion Complications: How to specify semantics in Coq? Normal forms - slightly more complicated We need folding and generalization now Termination proof of full supercompiler with generalization more complicated(?)

Outline Introduction Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation 1 Introduction 2 3 Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation

Test Generation, Extensional Equivalence More Realistic Language Use Information Propagation in Isolation Use Information Propagation in Isolation Apply (positive) information propagation in cases where we do need the full power of supercompilation (with its complications, like whistle, etc.) In systems like Coq itself; example: Fixpoint listhasfalse (l: list bool) : bool := match l with nil => false false::_ => true true::l1 => listhasfalse l1 end. Goal forall b l, listhasfalse (b::negb b::l) = true. compute. fold listhasfalse.... forall (b : bool) (l : list bool), (if b then if if b then false else true then listhasfalse l else true else true) = true Strengthen stream fusion?...

First formal verification of a supercompiler. Helped by a more fine-grained decomposition of the supercompilation process. Structurally recursive deforestation and information propagation, with separate proofs. Simple form of explicit substitutions also helpful. Outlook Extend to more realistic languages, more powerful transformations. Applications to test generation, compiler optimizations. Some day: self-verifiable supercompiler