Efficient Mergesort. Christian Sternagel. August 28, 2014

Abstract

We provide a formalization of the mergesort algorithm as used in GHC's Data.List module, proving correctness and stability. Furthermore, experimental data suggests that generated (Haskell-)code for this algorithm is much faster than for previous algorithms available in the Isabelle distribution.

theory Efficient-Sort
imports ~~/src/HOL/Library/Multiset
begin

A high-level overview of this formalization as well as some experimental data is to be found in [1].

1 Chaining Lists by Predicates

Make sure that some binary predicate P is satisfied between every two consecutive elements of a list. We call such a list a chain in the following.

inductive linked :: ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ bool
  for P :: 'a ⇒ 'a ⇒ bool
where
  Nil[iff]: linked P []
| singleton[iff]: linked P [x]
| many: P x y ⟹ linked P (y#ys) ⟹ linked P (x#y#ys)

declare eqTrueI[OF Nil, code] eqTrueI[OF singleton, code]

lemma linked-many-eq[simp, code]:
  linked P (x#y#zs) ⟷ P x y ∧ linked P (y#zs)

Take the longest prefix of a list that forms a chain.

fun take-chain :: 'a ⇒ ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ 'a list where
  take-chain a P [] = []
| take-chain a P (x#xs) = (if P a x then x # take-chain x P xs else [])

Drop the longest prefix of a list that forms a chain.

fun drop-chain :: 'a ⇒ ('a ⇒ 'a ⇒ bool) ⇒ 'a list ⇒ 'a list where
  drop-chain a P [] = []
| drop-chain a P (x#xs) = (if P a x then drop-chain x P xs else x#xs)

lemma take-chain-drop-chain-id[simp]:
  take-chain a P xs @ drop-chain a P xs = xs

lemma linked-take-chain:
  linked P (x # take-chain x P xs)

lemma linked-rev-take-chain-append:
  linked P (x#ys) ⟹ linked P (rev (take-chain x (λx y. P y x) xs) @ x#ys)

lemma linked-rev-take-chain:
  linked P (rev (take-chain x (λx y. P y x) xs) @ [x])

lemma linked-append:
  linked P (xs@ys) ⟷ linked P xs ∧ linked P ys ∧ (if xs ≠ [] ∧ ys ≠ [] then P (last xs) (hd ys) else True)
  (is ?lhs = ?rhs)

lemma length-drop-chain[termination-simp]:
  length (drop-chain b P xs) ≤ length xs
  (is ?P b xs)

lemma take-chain-map[simp]:
  take-chain (f x) P (map f xs) = map f (take-chain x (λx y. P (f x) (f y)) xs)

1.1 Sorted is a Special Case of Linked

lemma (in linorder) linked-le-sorted-conv[simp]:
  linked (op ≤) xs = sorted xs

lemma (in linorder) linked-less-imp-sorted:
  linked (op <) xs ⟹ sorted xs

abbreviation (in linorder) (input) lt :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  lt key x y ≡ key x < key y

abbreviation (in linorder) (input) le :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  le key x y ≡ key x ≤ key y

abbreviation (in linorder) (input) gt :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  gt key x y ≡ key x > key y

abbreviation (in linorder) (input) ge :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b ⇒ bool where
  ge key x y ≡ key x ≥ key y

lemma (in linorder) sorted-take-chain-le[simp]:
  sorted (key x # map key (take-chain x (le key) xs))

lemma (in linorder) sorted-rev-take-chain-gt-append:
  assumes linked (op <) (key x # map key ys)
  shows sorted (map key (rev (take-chain x (gt key) xs)) @ key x # map key ys)

lemma multiset-of-take-chain-drop-chain[simp]:
  multiset-of (take-chain x P xs) + multiset-of (drop-chain x P xs) = multiset-of xs

lemma multiset-of-drop-chain-take-chain[simp]:
  multiset-of (drop-chain x P xs) + multiset-of (take-chain x P xs) = multiset-of xs

2 GHC Version of Mergesort

In the following we show that the mergesort implementation used in GHC (see http://haskell.org/ghc/docs/7.0-latest/html/libraries/base-4.3.1.0/src/Data-List.html#sort) is a correct and stable sorting algorithm. Furthermore, experimental data suggests that generated code for this implementation is much more efficient than for the implementation provided by Multiset.

context linorder
begin

Split a list into chunks of ascending and descending parts, where descending parts are reversed. Hence, the result is a list of sorted lists.

fun sequences :: ('b ⇒ 'a) ⇒ 'b list ⇒ 'b list list
  and asc :: ('b ⇒ 'a) ⇒ 'b ⇒ ('b list ⇒ 'b list) ⇒ 'b list ⇒ 'b list list
  and desc :: ('b ⇒ 'a) ⇒ 'b ⇒ 'b list ⇒ 'b list ⇒ 'b list list
where
  sequences key (a#b#xs) = (if key a > key b then desc key b [a] xs else asc key b (op # a) xs)
| sequences key xs = [xs]
| asc key a f (b#bs) = (if ¬ key a > key b then asc key b (f ∘ op # a) bs else f [a] # sequences key (b#bs))
| asc key a f bs = f [a] # sequences key bs
| desc key a as (b#bs) = (if key a > key b then desc key b (a#as) bs else (a#as) # sequences key (b#bs))
| desc key a as bs = (a#as) # sequences key bs

fun merge :: ('b ⇒ 'a) ⇒ 'b list ⇒ 'b list ⇒ 'b list where
  merge key (a#as) (b#bs) = (if key a > key b then b # merge key (a#as) bs else a # merge key as (b#bs))
| merge key [] bs = bs
| merge key as [] = as

fun merge-pairs :: ('b ⇒ 'a) ⇒ 'b list list ⇒ 'b list list where
  merge-pairs key (a#b#xs) = merge key a b # merge-pairs key xs
| merge-pairs key xs = xs

lemma merge-nil2[simp]:
  merge key as [] = as

lemma length-merge[simp]:
  length (merge key xs ys) = length xs + length ys

lemma merge-pairs-length[termination-simp]:
  length (merge-pairs key xs) ≤ length xs

fun merge-all :: ('b ⇒ 'a) ⇒ 'b list list ⇒ 'b list where
  merge-all key [] = []
| merge-all key [x] = x
| merge-all key xs = merge-all key (merge-pairs key xs)

lemma multiset-of-merge[simp]:
  multiset-of (merge key xs ys) = multiset-of xs + multiset-of ys

lemma set-merge[simp]:
  set (merge key xs ys) = set xs ∪ set ys
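To make the definitions above easier to experiment with, here is a minimal Haskell transliteration of sequences, asc, desc, merge, merge-pairs and merge-all. It is a sketch written for illustration, not the code produced by Isabelle's code generator and not a copy of GHC's source; the camel-case names and the wrapper sortKey are assumptions of this sketch.

```haskell
-- Haskell sketch of the functions defined above (illustrative names).

-- Split a list into sorted runs; descending runs are reversed on the fly.
sequences :: Ord k => (b -> k) -> [b] -> [[b]]
sequences key (a : b : xs)
  | key a > key b = desc key b [a] xs
  | otherwise     = asc key b (a :) xs
sequences _ xs = [xs]

-- Collect an ascending run; f is a difference list holding the run so far.
asc :: Ord k => (b -> k) -> b -> ([b] -> [b]) -> [b] -> [[b]]
asc key a f (b : bs)
  | not (key a > key b) = asc key b (f . (a :)) bs
asc key a f bs = f [a] : sequences key bs

-- Collect a strictly descending run; the accumulator is already reversed.
desc :: Ord k => (b -> k) -> b -> [b] -> [b] -> [[b]]
desc key a acc (b : bs)
  | key a > key b = desc key b (a : acc) bs
desc key a acc bs = (a : acc) : sequences key bs

-- Merge two sorted lists; on equal keys the left element goes first.
merge :: Ord k => (b -> k) -> [b] -> [b] -> [b]
merge key l@(a : l') r@(b : r')
  | key a > key b = b : merge key l r'
  | otherwise     = a : merge key l' r
merge _ [] r = r
merge _ l [] = l

-- Merge neighbouring runs pairwise.
mergePairs :: Ord k => (b -> k) -> [[b]] -> [[b]]
mergePairs key (a : b : xs) = merge key a b : mergePairs key xs
mergePairs _ xs = xs

-- Repeat pairwise merging until a single run is left.
mergeAll :: Ord k => (b -> k) -> [[b]] -> [b]
mergeAll _ [] = []
mergeAll _ [x] = x
mergeAll key xs = mergeAll key (mergePairs key xs)

-- Hypothetical wrapper: the composition of the two phases.
sortKey :: Ord k => (b -> k) -> [b] -> [b]
sortKey key = mergeAll key . sequences key
```

For instance, sequences id [3,2,1,4,5,2] yields the runs [[1,2,3],[4,5],[2]], which mergeAll id then combines into [1,2,2,3,4,5]. Because merge takes from the left list on equal keys, elements with equal keys keep their relative order.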

lemma multiset-of-concat-merge-pairs[simp]:
  multiset-of (concat (merge-pairs key xs)) = multiset-of (concat xs)

lemma set-concat-merge-pairs[simp]:
  set (concat (merge-pairs key xs)) = set (concat xs)

lemma multiset-of-merge-all[simp]:
  multiset-of (merge-all key xs) = multiset-of (concat xs)

lemma set-merge-all[simp]:
  set (merge-all key xs) = set (concat xs)

lemma sorted-merge[simp]:
  assumes sorted (map key xs)
    and sorted (map key ys)
  shows sorted (map key (merge key xs ys))

lemma sorted-merge-pairs[simp]:
  assumes ∀x∈set xs. sorted (map key x)
  shows ∀x∈set (merge-pairs key xs). sorted (map key x)

lemma sorted-merge-all:
  assumes ∀x∈set xs. sorted (map key x)
  shows sorted (map key (merge-all key xs))

lemma desc-take-chain-drop-chain-conv[simp]:
  desc key a bs xs = (rev (take-chain a (gt key) xs) @ a # bs) # sequences key (drop-chain a (gt key) xs)

lemma asc-take-chain-drop-chain-conv-append:
  assumes ⋀xs ys. f (xs@ys) = f xs @ ys
  shows asc key a (f ∘ op @ as) xs = (f as @ a # take-chain a (le key) xs) # sequences key (drop-chain a (le key) xs)

lemma asc-take-chain-drop-chain-conv[simp]:
  asc key b (op # a) xs = (a # b # take-chain b (le key) xs) # sequences key (drop-chain b (le key) xs)
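The conversion lemmas above describe each run in terms of take-chain and drop-chain. As a rough illustration, the following Haskell sketch transliterates those two functions and computes the first run that desc would produce according to desc-take-chain-drop-chain-conv. The names takeChain, dropChain and firstDescRun are assumptions of this sketch and do not occur in the theory.

```haskell
-- Longest prefix of the list that continues a chain starting at a,
-- i.e. p holds between each element and its predecessor.
takeChain :: a -> (a -> a -> Bool) -> [a] -> [a]
takeChain _ _ [] = []
takeChain a p (x : xs)
  | p a x     = x : takeChain x p xs
  | otherwise = []

-- What remains after removing that prefix.
dropChain :: a -> (a -> a -> Bool) -> [a] -> [a]
dropChain _ _ [] = []
dropChain a p (x : xs)
  | p a x     = dropChain x p xs
  | otherwise = x : xs

-- First run of "desc key a bs xs" as stated by the conversion lemma:
-- the strictly descending prefix, reversed, followed by a and bs.
-- (Hypothetical helper, named here for illustration only.)
firstDescRun :: Ord k => (b -> k) -> b -> [b] -> [b] -> [b]
firstDescRun key a bs xs = reverse (takeChain a gt xs) ++ a : bs
  where
    gt x y = key x > key y
```

With a = 2, bs = [3] and xs = [1,4,5,2], takeChain 2 (>) xs is [1] and dropChain 2 (>) xs is [4,5,2], so the first run is [1,2,3] and the remaining input is [4,5,2].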

lemma sequences-induct[case-names Nil singleton many]:
  assumes ⋀key. P key []
    and ⋀key x. P key [x]
    and ⋀key a b xs.
      (le key a b ⟹ P key (drop-chain b (le key) xs)) ⟹
      (¬ le key a b ⟹ P key (drop-chain b (gt key) xs)) ⟹
      P key (a#b#xs)
  shows P key xs

lemma sorted-sequences:
  ∀x∈set (sequences key xs). sorted (map key x)

lemma multiset-of-sequences[simp]:
  multiset-of (concat (sequences key xs)) = multiset-of xs

lemma filter-by-key-drop-chain-gt[simp]:
  assumes key b ≤ key a
  shows [y←drop-chain b (gt key) xs. key a = key y] = [y←xs. key a = key y]

lemma filter-by-key-take-chain-gt[simp]:
  assumes key b ≤ key a
  shows [y←take-chain b (gt key) xs. key a = key y] = []

lemma filter-take-chain-drop-chain[simp]:
  filter P (take-chain x Q xs) @ filter P (drop-chain x Q xs) = filter P xs

lemma filter-by-key-rev-take-chain-gt-conv[simp]:
  [y←rev (take-chain b (gt key) xs). key x = key y] = [y←take-chain b (gt key) xs. key x = key y]

lemma filter-by-key-sequences[simp]:
  [y←concat (sequences key xs). key x = key y] = [y←xs. key x = key y]
  (is ?P)

lemma merge-simp[simp]:
  assumes sorted (map key xs)
  shows merge key xs (y#ys) = takeWhile (ge key y) xs @ y # merge key (dropWhile (ge key y) xs) ys

lemma sorted-map-dropwhile[simp]:
  assumes sorted (map key xs)
  shows sorted (map key (dropWhile (ge key y) xs))

lemma sorted-merge-induct[consumes 1, case-names Nil IH]:
  assumes sorted (map key xs)
    and ⋀xs. P xs []
    and ⋀xs y ys. sorted (map key xs) ⟹ P (dropWhile (ge key y) xs) ys ⟹ P xs (y#ys)
  shows P xs ys

lemma filter-by-key-dropwhile[simp]:
  assumes sorted (map key xs)
  shows [y←dropWhile (λx. key x ≤ key z) xs. key z = key y] = []
  (is [y←dropWhile ?P xs. key z = key y] = [])

lemma filter-by-key-takewhile[simp]:
  assumes sorted (map key xs)
  shows [y←takeWhile (λx. key x ≤ key z) xs. key z = key y] = [y←xs. key z = key y]
  (is [y←takeWhile ?P xs. key z = key y] = -)

lemma filter-takewhile-dropwhile-id[simp]:
  filter P (takeWhile Q xs) @ filter P (dropWhile Q xs) = filter P xs

lemma filter-by-key-merge-is-append[simp]:
  assumes sorted (map key xs)
  shows [y←merge key xs ys. key x = key y] = [y←xs. key x = key y] @ [y←ys. key x = key y]

lemma filter-by-key-merge-pairs[simp]:
  assumes ∀xs∈set xss. sorted (map key xs)
  shows [y←concat (merge-pairs key xss). key x = key y] = [y←concat xss. key x = key y]

lemma filter-by-key-merge-all[simp]:
  assumes ∀xs∈set xss. sorted (map key xs)
  shows [y←merge-all key xss. key x = key y] = [y←concat xss. key x = key y]

lemma filter-by-key-merge-all-sequences[simp]:
  [x←merge-all key (sequences key xs). key y = key x] = [x←xs. key y = key x]

lemma sort-key-merge-all-sequences:
  sort-key key = merge-all key ∘ sequences key

Replace existing code equations for sort-key by merge-all key ∘ sequences key.

declare sort-key-merge-all-sequences[code]

end

end

References

[1] Christian Sternagel. Proof pearl - a mechanized proof of GHC's mergesort. Journal of Automated Reasoning, 2012. doi:10.1007/s10817-012-9260-7.
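As a closing illustration of the stability statement (lemma filter-by-key-merge-all-sequences: filtering the sorted list by a key gives the same result as filtering the input), the property can be checked on concrete inputs against GHC's own Data.List.sortBy. This is a sketch for experimentation and no substitute for the proof; the helper names withKey, isSortedOn and stableOn are assumptions introduced here.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- All elements with the given key, in order of appearance.
withKey :: Eq k => (b -> k) -> k -> [b] -> [b]
withKey key k xs = [y | y <- xs, key y == k]

-- Are the keys of the list in non-decreasing order?
isSortedOn :: Ord k => (b -> k) -> [b] -> Bool
isSortedOn key xs = and (zipWith (<=) ks (drop 1 ks))
  where
    ks = map key xs

-- Stability on one input: for every key occurring in xs, the elements
-- carrying that key appear in the same order before and after sorting.
stableOn :: Ord k => (b -> k) -> [b] -> Bool
stableOn key xs =
  all (\k -> withKey key k sorted == withKey key k xs) (map key xs)
  where
    sorted = sortBy (comparing key) xs
```

Evaluating stableOn fst on a list of pairs with repeated first components, together with isSortedOn fst on the sorted result, exercises both halves of the correctness statement on that input.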