Efficient Mergesort. Christian Sternagel. October 11, 2017

Abstract

We provide a formalization of the mergesort algorithm as used in GHC's Data.List module, proving correctness and stability. Furthermore, experimental data suggests that generated (Haskell-)code for this algorithm is much faster than for previous algorithms available in the Isabelle distribution.

theory Efficient-Sort
imports HOL-Library.Multiset
begin

A high-level overview of this formalization as well as some experimental data is to be found in [1].

1 Chaining Lists by Predicates

Make sure that some binary predicate P is satisfied between every two consecutive elements of a list. We call such a list a chain in the following.

inductive linked :: ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ bool
  for P :: 'a ⇒ 'a ⇒ bool
where
  Nil[iff]: linked P []
| singleton[iff]: linked P [x]
| many: P x y ⟹ linked P (y#ys) ⟹ linked P (x#y#ys)

declare eqTrueI[OF Nil, code] eqTrueI[OF singleton, code]

lemma linked-many-eq[simp, code]:
  linked P (x#y#zs) ⟷ P x y ∧ linked P (y#zs)
  by (blast intro: linked.many elim: linked.cases)

Take the longest prefix of a list that forms a chain.

fun take-chain :: 'a ⇒ ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ 'a list where
  take-chain a P [] = []
| take-chain a P (x#xs) = (if P a x then x # take-chain x P xs else [])

Drop the longest prefix of a list that forms a chain.

fun drop-chain :: 'a ⇒ ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ 'a list where
  drop-chain a P [] = []
| drop-chain a P (x#xs) = (if P a x then drop-chain x P xs else x#xs)

lemma take-chain-drop-chain-id[simp]:
  take-chain a P xs @ drop-chain a P xs = xs
  by (induct xs arbitrary: a) simp-all

lemma linked-take-chain:
  linked P (x # take-chain x P xs)
  by (induct xs arbitrary: x) simp-all

lemma linked-rev-take-chain-append:
  linked P (x#ys) ⟹ linked P (rev (take-chain x (λx y. P y x) xs) @ x#ys)
  by (induct xs arbitrary: x ys) simp-all

lemma linked-rev-take-chain:
  linked P (rev (take-chain x (λx y. P y x) xs) @ [x])
  using linked-rev-take-chain-append[of P x [] xs] by simp

lemma linked-append:
  linked P (xs@ys) ⟷ linked P xs ∧ linked P ys ∧ (if xs ≠ [] ∧ ys ≠ [] then P (last xs) (hd ys) else True)
  (is ?lhs = ?rhs)
proof
  assume ?lhs thus ?rhs
  proof (induct xs)
    case (Cons x xs) thus ?case by (cases xs, simp-all) (cases ys, auto)
  qed simp
next
  assume ?rhs thus ?lhs
  proof (induct xs)
    case (Cons x xs) thus ?case by (cases ys, auto) (cases xs, auto)
  qed simp
qed

lemma length-drop-chain[termination-simp]:
  length (drop-chain b P xs) ≤ length xs
  (is ?P b xs)
proof (induct xs arbitrary: b rule: length-induct)
  fix xs :: 'a list and b
  assume IH: ∀ys. length ys < length xs ⟶ (∀x. ?P x ys)
  show ?P b xs
  proof (cases xs)
    case (Cons y ys)
    with IH[rule-format, of ys y] show ?thesis by simp
  qed simp
qed

lemma take-chain-map[simp]:
  take-chain (f x) P (map f xs) = map f (take-chain x (λx y. P (f x) (f y)) xs)
  by (induct xs arbitrary: x) simp-all

1.1 Sorted is a Special Case of Linked

lemma (in linorder) linked-le-sorted-conv[simp]:
  linked (op ≤) xs = sorted xs
proof
  assume sorted xs thus linked (op ≤) xs
  proof (induct xs rule: sorted.induct)
    case (Cons xs x) thus ?case by (cases xs) simp-all
  qed simp
qed (induct xs rule: linked.induct, simp-all)

lemma (in linorder) linked-less-imp-sorted:
  linked (op <) xs ⟹ sorted xs
  by (induct xs rule: linked.induct) simp-all

abbreviation (in linorder) (input) lt :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  lt key x y ≡ key x < key y

abbreviation (in linorder) (input) le :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  le key x y ≡ key x ≤ key y

abbreviation (in linorder) (input) gt :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  gt key x y ≡ key x > key y

abbreviation (in linorder) (input) ge :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  ge key x y ≡ key x ≥ key y

lemma (in linorder) sorted-take-chain-le[simp]:
  sorted (key x # map key (take-chain x (le key) xs))
  using linked-take-chain[of op ≤, of key x map key xs] by simp

lemma (in linorder) sorted-rev-take-chain-gt-append:
  assumes linked (op <) (key x # map key ys)
  shows sorted (map key (rev (take-chain x (gt key) xs)) @ key x # map key ys)
  using linked-less-imp-sorted[OF linked-rev-take-chain-append[OF assms, of map key xs]]
  by (simp add: rev-map)

lemma mset-take-chain-drop-chain[simp]:
  mset (take-chain x P xs) + mset (drop-chain x P xs) = mset xs
  by (induct xs arbitrary: x) (simp-all add: ac-simps)

lemma mset-drop-chain-take-chain[simp]:
  mset (drop-chain x P xs) + mset (take-chain x P xs) = mset xs
  by (induct xs arbitrary: x) (simp-all add: ac-simps)

2 GHC Version of Mergesort

In the following we show that the mergesort implementation used in GHC (see http://haskell.org/ghc/docs/7.0-latest/html/libraries/base-4.3.1.0/src/Data-List.html#sort) is a correct and stable sorting algorithm. Furthermore, experimental data suggests that generated code for this implementation is much more efficient than for the implementation provided by Multiset.

context linorder
begin

Split a list into chunks of ascending and descending parts, where descending parts are reversed. Hence, the result is a list of sorted lists.

fun sequences :: ('b ⇒ 'a) ⇒ 'b list ⇒ 'b list list
  and asc :: ('b ⇒ 'a) ⇒ 'b ⇒ ('b list ⇒ 'b list) ⇒ 'b list ⇒ 'b list list
  and desc :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b list ⇒ 'b list ⇒ 'b list list
where
  sequences key (a#b#xs) = (if key a > key b then desc key b [a] xs else asc key b (op # a) xs)
| sequences key xs = [xs]
| asc key a f (b#bs) = (if ¬ key a > key b then asc key b (f ∘ op # a) bs else f [a] # sequences key (b#bs))
| asc key a f bs = f [a] # sequences key bs
| desc key a as (b#bs) = (if key a > key b then desc key b (a#as) bs else (a#as) # sequences key (b#bs))
| desc key a as bs = (a#as) # sequences key bs

fun merge :: ('b ⇒ 'a) ⇒ 'b list ⇒ 'b list ⇒ 'b list where
  merge key (a#as) (b#bs) = (if key a > key b then b # merge key (a#as) bs else a # merge key as (b#bs))
| merge key [] bs = bs
| merge key as [] = as

fun merge-pairs :: ('b ⇒ 'a) ⇒ 'b list list ⇒ 'b list list where
  merge-pairs key (a#b#xs) = merge key a b # merge-pairs key xs
| merge-pairs key xs = xs

lemma merge-nil2[simp]:
  merge key as [] = as
  by (cases as) simp-all
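To make the run-splitting step concrete, here is a minimal Python sketch of sequences, written as a loop rather than as three mutually recursive functions. It is an illustration under assumptions, not code generated from the theory: the name sequences and the key argument mirror the definition above, while the loop structure and the use of Python lists are choices made here. Like the catch-all equation sequences key xs = [xs], it appends whatever is left (possibly the empty list) as a final chunk.

```python
def sequences(key, xs):
    """Split xs into sorted runs; strictly descending runs are reversed.

    Mirrors the equations of sequences/asc/desc: a run starting with
    key(a) > key(b) is extended while strictly descending, any other run
    is extended while not descending.
    """
    runs = []
    i, n = 0, len(xs)
    while n - i >= 2:
        a, b = xs[i], xs[i + 1]
        run = [a, b]
        j = i + 2
        if key(a) > key(b):
            # desc: extend while the last element is strictly greater
            while j < n and key(run[-1]) > key(xs[j]):
                run.append(xs[j])
                j += 1
            run.reverse()  # desc builds its run in reverse order
        else:
            # asc: extend while the last element is not greater
            while j < n and not key(run[-1]) > key(xs[j]):
                run.append(xs[j])
                j += 1
        runs.append(run)
        i = j
    # catch-all equation: sequences key xs = [xs] for lists shorter than 2
    runs.append(list(xs[i:]))
    return runs


print(sequences(lambda x: x, [3, 2, 1, 4, 5, 0]))
```

Only strictly descending runs are reversed. Reversing a run that contains equal keys would swap their order, so treating equal keys as ascending is what keeps the later stability argument possible.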

lemma length-merge[simp]:
  length (merge key xs ys) = length xs + length ys
  by (induct xs ys rule: merge.induct) simp-all

lemma merge-pairs-length[termination-simp]:
  length (merge-pairs key xs) ≤ length xs
  by (induct xs rule: merge-pairs.induct) simp-all

fun merge-all :: ('b ⇒ 'a) ⇒ 'b list list ⇒ 'b list where
  merge-all key [] = []
| merge-all key [x] = x
| merge-all key xs = merge-all key (merge-pairs key xs)

lemma mset-merge[simp]:
  mset (merge key xs ys) = mset xs + mset ys
  by (induct xs ys rule: merge.induct) (simp-all add: ac-simps)

lemma set-merge[simp]:
  set (merge key xs ys) = set xs ∪ set ys
  unfolding set-mset-mset[symmetric] by simp

lemma mset-concat-merge-pairs[simp]:
  mset (concat (merge-pairs key xs)) = mset (concat xs)
  by (induct xs rule: merge-pairs.induct) (auto simp: ac-simps)

lemma set-concat-merge-pairs[simp]:
  set (concat (merge-pairs key xs)) = set (concat xs)
  unfolding set-mset-mset[symmetric] by simp

lemma mset-merge-all[simp]:
  mset (merge-all key xs) = mset (concat xs)
  by (induct xs rule: merge-all.induct) (simp-all add: ac-simps)

lemma set-merge-all[simp]:
  set (merge-all key xs) = set (concat xs)
  unfolding set-mset-mset[symmetric] by simp

lemma sorted-merge[simp]:
  assumes sorted (map key xs) and sorted (map key ys)
  shows sorted (map key (merge key xs ys))
  using assms by (induct xs ys rule: merge.induct) (auto simp: sorted-Cons)

lemma sorted-merge-pairs[simp]:
  assumes ∀x∈set xs. sorted (map key x)
  shows ∀x∈set (merge-pairs key xs). sorted (map key x)
  using assms by (induct xs rule: merge-pairs.induct) simp-all

lemma sorted-merge-all:
  assumes ∀x∈set xs. sorted (map key x)
  shows sorted (map key (merge-all key xs))
  using assms by (induct xs rule: merge-all.induct) simp-all

lemma desc-take-chain-drop-chain-conv[simp]:
  desc key a bs xs = (rev (take-chain a (gt key) xs) @ a # bs) # sequences key (drop-chain a (gt key) xs)
proof (induct xs arbitrary: a bs)
  case (Cons x xs) thus ?case by (cases key a < key x) simp-all
qed simp

lemma asc-take-chain-drop-chain-conv-append:
  assumes ⋀xs ys. f (xs@ys) = f xs @ ys
  shows asc key a (f ∘ op @ as) xs = (f as @ a # take-chain a (le key) xs) # sequences key (drop-chain a (le key) xs)
  using assms
proof (induct xs arbitrary: as a)
  case (Cons x xs)
  show ?case
  proof (cases le key a x)
    case False
    with Cons show ?thesis by auto
  next
    case True
    with Cons(1)[of x as@[a]] and Cons(2) show ?thesis by (simp add: o-def)
  qed
qed simp

lemma asc-take-chain-drop-chain-conv[simp]:
  asc key b (op # a) xs = (a # b # take-chain b (le key) xs) # sequences key (drop-chain b (le key) xs)
proof -
  let ?f = op # a
  have ⋀xs ys. (op # a) (xs@ys) = (op # a) xs @ ys by simp
  from asc-take-chain-drop-chain-conv-append[of ?f key b [] xs, OF this]
  show ?thesis by (simp add: o-def)
qed

lemma sequences-induct[case-names Nil singleton many]:
  assumes ⋀key. P key []
    and ⋀key x. P key [x]
    and ⋀key a b xs.
      (le key a b ⟹ P key (drop-chain b (le key) xs)) ⟹
      (¬ le key a b ⟹ P key (drop-chain b (gt key) xs)) ⟹
      P key (a#b#xs)
  shows P key xs
  using assms by (induction-schema) (pat-completeness, lexicographic-order)
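The conversion lemmas above express asc and desc through take-chain and drop-chain from Section 1. The following Python sketch shows what these two functions compute and how the first descending run arises from them. The function names follow the theory; the iterative formulation, the helper linked, and the sample values are assumptions made for illustration only.

```python
def take_chain(a, p, xs):
    """Longest prefix of xs such that p holds between consecutive
    elements, starting the chain from a."""
    out = []
    for x in xs:
        if not p(a, x):
            break
        out.append(x)
        a = x
    return out


def drop_chain(a, p, xs):
    """What remains of xs after removing the prefix found by take_chain."""
    for i, x in enumerate(xs):
        if not p(a, x):
            return list(xs[i:])
        a = x
    return []


def linked(p, xs):
    """True if p holds between every two consecutive elements of xs."""
    return all(p(x, y) for x, y in zip(xs, xs[1:]))


gt = lambda x, y: x > y
rest = [4, 2, 3, 1]

# Chain starting at 5 with the strict "greater than" predicate.
taken = take_chain(5, gt, rest)
dropped = drop_chain(5, gt, rest)
print(taken, dropped)

# Shape of the first chunk in desc-take-chain-drop-chain-conv with
# a = 5 and bs = [6]: rev (take-chain a (gt key) xs) @ a # bs
first_run = list(reversed(taken)) + [5, 6]
print(first_run)
```

Here take_chain and drop_chain always split the input at the same position, which is the content of take-chain-drop-chain-id, and the reversed descending chain followed by the already collected elements forms a sorted run.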

lemma sorted-sequences:
  ∀x∈set (sequences key xs). sorted (map key x)
proof (induct key xs rule: sequences-induct)
  case (many key a b xs)
  thus ?case using sorted-rev-take-chain-gt-append[of key b [a] xs]
    by (cases le key a b) auto
qed simp-all

lemma mset-sequences[simp]:
  mset (concat (sequences key xs)) = mset xs
  by (induct key xs rule: sequences-induct) (simp-all add: ac-simps)

lemma filter-by-key-drop-chain-gt[simp]:
  assumes key b ≤ key a
  shows [y←drop-chain b (gt key) xs. key a = key y] = [y←xs. key a = key y]
  using assms by (induct xs arbitrary: b) auto

lemma filter-by-key-take-chain-gt[simp]:
  assumes key b ≤ key a
  shows [y←take-chain b (gt key) xs. key a = key y] = []
  using assms by (induct xs arbitrary: b) auto

lemma filter-take-chain-drop-chain[simp]:
  filter P (take-chain x Q xs) @ filter P (drop-chain x Q xs) = filter P xs
  by (simp add: filter-append[symmetric])

lemma filter-by-key-rev-take-chain-gt-conv[simp]:
  [y←rev (take-chain b (gt key) xs). key x = key y] = [y←take-chain b (gt key) xs. key x = key y]
  by (induct xs arbitrary: b) auto

lemma filter-by-key-sequences[simp]:
  [y←concat (sequences key xs). key x = key y] = [y←xs. key x = key y]
  (is ?P)
  by (induct key xs rule: sequences-induct) auto

lemma merge-simp[simp]:
  assumes sorted (map key xs)
  shows merge key xs (y#ys) = takeWhile (ge key y) xs @ y # merge key (dropWhile (ge key y) xs) ys
  using assms by (induct xs arbitrary: y ys) (auto simp: sorted-Cons)

lemma sorted-map-dropWhile[simp]:
  assumes sorted (map key xs)
  shows sorted (map key (dropWhile (ge key y) xs))
  using sorted-dropWhile[OF assms] by (simp add: dropWhile-map o-def)

lemma sorted-merge-induct[consumes 1, case-names Nil IH]:
  assumes sorted (map key xs)
    and ⋀xs. P xs []
    and ⋀xs y ys. sorted (map key xs) ⟹ P (dropWhile (ge key y) xs) ys ⟹ P xs (y#ys)
  shows P xs ys
  using assms(2-) assms(1)
  by (induction-schema) (case-tac ys, simp-all, lexicographic-order)

lemma filter-by-key-dropWhile[simp]:
  assumes sorted (map key xs)
  shows [y←dropWhile (λx. key x ≤ key z) xs. key z = key y] = []
  (is [y←dropWhile ?P xs. key z = key y] = [])
  using assms
proof (induct xs rule: rev-induct)
  case Nil thus ?case by simp
next
  case (snoc x xs)
  hence IH: [y←dropWhile ?P xs. key z = key y] = []
    by (auto simp: sorted-append)
  show ?case
  proof (cases ∀z∈set xs. ?P z)
    case True
    show ?thesis
      using dropWhile-append2[of xs ?P [x]] and True by simp
  next
    case False
    then obtain a where a: a ∈ set xs ¬ ?P a by auto
    show ?thesis
      unfolding dropWhile-append1[of a xs ?P, OF a]
      using snoc and False by (auto simp: IH sorted-append)
  qed
qed

lemma filter-by-key-takeWhile[simp]:
  assumes sorted (map key xs)
  shows [y←takeWhile (λx. key x ≤ key z) xs. key z = key y] = [y←xs. key z = key y]
  (is [y←takeWhile ?P xs. key z = key y] = -)
  using assms
proof (induct xs rule: rev-induct)
  case Nil thus ?case by simp
next
  case (snoc x xs)
  hence IH: [y←takeWhile ?P xs. key z = key y] = [y←xs. key z = key y]
    by (auto simp: sorted-append)
  show ?case
  proof (cases ∀z∈set xs. ?P z)
    case True
    show ?thesis
      using takeWhile-append2[of xs ?P [x]] and True by simp
  next
    case False
    then obtain a where a: a ∈ set xs ¬ ?P a by auto
    show ?thesis
      unfolding takeWhile-append1[of a xs ?P, OF a]
      using snoc and False by (auto simp: IH sorted-append)
  qed
qed

lemma filter-takeWhile-dropWhile-id[simp]:
  filter P (takeWhile Q xs) @ filter P (dropWhile Q xs) = filter P xs
  by (simp add: filter-append[symmetric])

lemma filter-by-key-merge-is-append[simp]:
  assumes sorted (map key xs)
  shows [y←merge key xs ys. key x = key y] = [y←xs. key x = key y] @ [y←ys. key x = key y]
  using assms by (induct xs ys rule: sorted-merge-induct) auto

lemma filter-by-key-merge-pairs[simp]:
  assumes ∀xs∈set xss. sorted (map key xs)
  shows [y←concat (merge-pairs key xss). key x = key y] = [y←concat xss. key x = key y]
  using assms by (induct xss rule: merge-pairs.induct) simp-all

lemma filter-by-key-merge-all[simp]:
  assumes ∀xs∈set xss. sorted (map key xs)
  shows [y←merge-all key xss. key x = key y] = [y←concat xss. key x = key y]
  using assms by (induct xss rule: merge-all.induct) simp-all

lemma filter-by-key-merge-all-sequences[simp]:
  [x←merge-all key (sequences key xs). key y = key x] = [x←xs. key y = key x]
  using sorted-sequences[of key xs] by simp

lemma sort-key-merge-all-sequences:
  sort-key key = merge-all key ∘ sequences key
  by (intro ext properties-for-sort-key) (simp-all add: sorted-merge-all[OF sorted-sequences])

Replace existing code equations for sort-key by merge-all key ∘ sequences key.

declare sort-key-merge-all-sequences[code]

end

hide-const lt le gt ge
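The filter-by-key lemmas state stability: for every key value, the elements carrying that key appear in the result in the same order as in the input. A small Python sketch of merge, merge-pairs and merge-all can be used to observe this on sample data. It is a hedged illustration, not the generated code: the function names follow the theory, the loops replace recursion, and singleton runs stand in for the output of sequences so that the block stays self-contained (any list of sorted runs satisfies the assumption of filter-by-key-merge-all).

```python
def merge(key, xs, ys):
    """Merge two key-sorted lists; on equal keys the element of xs goes
    first, matching the test key a > key b in the definition of merge."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if key(xs[i]) > key(ys[j]):
            out.append(ys[j])
            j += 1
        else:
            out.append(xs[i])
            i += 1
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out


def merge_pairs(key, xss):
    """Merge neighbouring runs pairwise; an unpaired last run is kept."""
    out = [merge(key, xss[k], xss[k + 1]) for k in range(0, len(xss) - 1, 2)]
    if len(xss) % 2 == 1:
        out.append(xss[-1])
    return out


def merge_all(key, xss):
    """Repeat merge_pairs until a single run is left."""
    if not xss:
        return []
    while len(xss) > 1:
        xss = merge_pairs(key, xss)
    return xss[0]


# Hypothetical sample data: (key, tag) pairs with repeated keys.
records = [(2, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (2, "f")]
first = lambda r: r[0]

# Assumption: singleton runs replace sequences key xs in this sketch.
result = merge_all(first, [[r] for r in records])
print(result)

# Stability in the form of the filter-by-key lemmas.
for k in {first(r) for r in records}:
    assert [r for r in result if first(r) == k] == [r for r in records if first(r) == k]
```

Because merge only takes from the right-hand list when its key is strictly smaller, and merge_pairs always combines neighbouring runs in their original order, elements with equal keys never overtake each other.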

end

References

[1] Christian Sternagel. Proof pearl - a mechanized proof of GHC's mergesort. Journal of Automated Reasoning, 2012. doi:10.1007/s10817-012-9260-7.