Mergesort again. 1. Split the list into two equal parts


Quicksort

Mergesort again. 1. Split the list into two equal parts: 5 3 9 2 8 7 3 2 1 4 becomes 5 3 9 2 8 and 7 3 2 1 4.

Mergesort again. 2. Recursively mergesort the two parts: 5 3 9 2 8 becomes 2 3 5 8 9, and 7 3 2 1 4 becomes 1 2 3 4 7.

Mergesort again. 3. Merge the two sorted lists together: 2 3 5 8 9 and 1 2 3 4 7 become 1 2 2 3 3 4 5 7 8 9.
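The three steps can be sketched directly in Java. This is a minimal illustration rather than a tuned implementation: the class name `MergeSortSketch` is made up for this example, and the method returns a new array instead of sorting in place.

```java
import java.util.Arrays;

// Hypothetical class name; a sketch of the three mergesort steps.
class MergeSortSketch {
    // Returns a new sorted array; the input array is left untouched.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a.clone();
        int mid = a.length / 2;
        // 1. Split the list into two (nearly) equal parts.
        int[] left = Arrays.copyOfRange(a, 0, mid);
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        // 2. Recursively mergesort the two parts.
        // 3. Merge the two sorted lists together.
        return merge(mergeSort(left), mergeSort(right));
    }

    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // "<=" takes from the left part on ties, which keeps equal elements in order.
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```

Calling `MergeSortSketch.mergeSort` on the array 5 3 9 2 8 7 3 2 1 4 from the slides yields 1 2 2 3 3 4 5 7 8 9. The temporary arrays are exactly the extra memory that motivates quicksort.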

Quicksort. Mergesort is great... except that it's not in-place, so it needs to allocate memory, and it has a high constant factor. Quicksort: let's do divide-and-conquer sorting, but do it in-place.

Quicksort. Pick an element from the array, called the pivot. Partition the array: first come all the elements smaller than the pivot, then the pivot, then all the elements greater than the pivot. Partitioning has complexity O(n). Recursively quicksort the two partitions.

Quicksort. Take the array 5 3 9 2 8 7 3 2 1 4 and say the pivot is 5. Partition the array into: all elements less than 5, then 5, then all elements greater than 5. This gives 3 3 2 2 1 4 (less than the pivot), then 5, then 9 8 7 (greater than the pivot).

Quicksort. Now recursively quicksort the two partitions! Quicksorting 3 3 2 2 1 4 and 9 8 7 gives 1 2 2 3 3 4 5 7 8 9.

Pseudocode.

    // call as sort(a, 0, a.length-1);
    void sort(int[] a, int low, int high) {
        if (low >= high) return;
        // assume that partition returns the index where the pivot now is
        int pivot = partition(a, low, high);
        sort(a, low, pivot-1);
        sort(a, pivot+1, high);
    }

Common optimisation: switch to insertion sort when the input array is small.

Quicksort's performance. Mergesort is fast because it splits the array into two equal halves. Quicksort just gives you two halves of whatever size! So does it still work fast?

Complexity of quicksort. In the best case, partitioning splits an array of size n into two halves of size n/2.

Complexity of quicksort. The recursive calls will split these arrays into four arrays of size n/4.

Recursion tree: log n levels (subarrays of size n, n/2, n/4, n/8, ...), O(n) time per level. Total time is O(n log n)!

Complexity of quicksort. But that's the best case! In the worst case, everything is greater than the pivot (say). The recursive call has size n-1, which in turn recurses with size n-2, etc. Amount of time spent in partitioning: n + (n-1) + (n-2) + ... + 1 = O(n^2).

Recursion tree: n levels (subarrays of size n, n-1, n-2, n-3, ...), O(n) time per level. Total time is O(n^2)!
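The partitioning sum has a closed form, which makes the quadratic bound explicit. The identity below is the standard arithmetic series, not something stated on the slides:

```latex
\[
  n + (n-1) + (n-2) + \dots + 1
  \;=\; \sum_{k=1}^{n} k
  \;=\; \frac{n(n+1)}{2}
  \;=\; O(n^2)
\]
```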

Worst cases. When we simply use the first element as the pivot, we get this worst case for sorted arrays and for reverse-sorted arrays. The best pivot to use is the median value of the array, but in practice it's too expensive to compute... Most important decision in quicksort: what to use as the pivot.

Complexity of quicksort. You don't need to split the array into exactly equal parts; it's enough to have some balance. E.g. a 10%/90% split still gives O(n log n) runtime.
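One way to see why a lopsided but proportional split is enough is to write down the recurrence. The sketch below assumes, purely for illustration, that every partition is exactly 10%/90% and that partitioning n elements costs cn for some constant c:

```latex
\[
  T(n) \;=\; T\!\left(\tfrac{n}{10}\right) + T\!\left(\tfrac{9n}{10}\right) + cn
\]
\[
  \text{depth} \;\le\; \log_{10/9} n + 1,
  \qquad
  \text{work per level} \;\le\; cn
  \quad\Longrightarrow\quad
  T(n) \;\le\; cn\,(\log_{10/9} n + 1) \;=\; O(n \log n)
\]
```

The base of the logarithm only changes the constant factor, which is why any fixed proportion gives the same asymptotic bound.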

Partitioning algorithm. 1. Pick a pivot (here 5): 5 3 9 2 8 7 3 2 1 4.

Partitioning algorithm. 2. Set two indexes, low and high. Here low starts just after the pivot (at 3) and high starts at the last element (at 4). Idea: everything to the left of low is less than the pivot, and everything to the right of high is greater than the pivot.

Partitioning algorithm. 3. Move low right until you find something greater than or equal to the pivot: while (a[low] < pivot) low++; Here low stops at 9. Then move high left until you find something less than or equal to the pivot: while (a[high] > pivot) high--; Here high stays at 4.

Partitioning algorithm. 4. Swap them! swap(a[low], a[high]); gives 5 3 4 2 8 7 3 2 1 9.

Partitioning algorithm. 5. Advance low and high (low++; high--;) and repeat, using while (a[low] < pivot) low++; and while (a[high] > pivot) high--; as before. Now low moves right to 8 and high stays at 1; swapping gives 5 3 4 2 1 7 3 2 8 9. After advancing again, low stops at 7 and high at 2; swapping gives 5 3 4 2 1 2 3 7 8 9. After advancing once more, low and high both point at the same 3; low then moves right to 7 while high stays at 3.

Partitioning algorithm. 6. When low and high have crossed, we are finished! But the pivot is in the wrong place: 5 3 4 2 1 2 3 7 8 9, with high at 3 and low at 7.

Partitioning algorithm. 7. Last step: swap the pivot with a[high], giving 3 3 4 2 1 2 5 7 8 9.
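The partitioning steps can be put together as follows. This is a minimal sketch, assuming the first element of the range is the pivot as in the walkthrough. The class name `QuickSortSketch`, the `swap` helper and the extra `low <= high` bounds checks are additions for this example and do not appear on the slides.

```java
// Hypothetical class name; a sketch of the partitioning steps.
class QuickSortSketch {
    // call as sort(a, 0, a.length - 1)
    static void sort(int[] a, int low, int high) {
        if (low >= high) return;
        int pivot = partition(a, low, high);
        sort(a, low, pivot - 1);
        sort(a, pivot + 1, high);
    }

    // Partitions a[first..last] around a[first]; returns the pivot's final index.
    static int partition(int[] a, int first, int last) {
        int pivot = a[first];
        int low = first + 1;
        int high = last;
        while (true) {
            // Move low right until it finds something >= pivot.
            while (low <= high && a[low] < pivot) low++;
            // Move high left until it finds something <= pivot.
            while (low <= high && a[high] > pivot) high--;
            // Stop once the indexes have met or crossed.
            if (low >= high) break;
            swap(a, low, high);
            low++;
            high--;
        }
        // Last step: put the pivot between the two partitions.
        swap(a, first, high);
        return high;
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

On 5 3 9 2 8 7 3 2 1 4, `partition(a, 0, 9)` returns 6 and leaves the array as 3 3 4 2 1 2 5 7 8 9, matching the walkthrough. Both scans stop on elements equal to the pivot, which is what keeps the partitions balanced when there are many duplicates.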

Details. 1. What to do if we want to use a different element (not the first) for the pivot? Swap the pivot with the first element before starting partitioning!

Details. 2. What happens if the array contains many duplicates? Notice that we only advance low as long as a[low] < pivot. If a[low] == pivot we stop, and the same goes for a[high]. If the array contains just one element over and over again, low and high will advance at the same rate. Hence we get equal-sized partitions.

Pivot. Which pivot should we pick? First element: gives O(n^2) behaviour for already-sorted lists. Median-of-three: pick the first, middle and last elements of the array and use the median of those three. Pick the pivot at random: gives O(n log n) expected (probabilistic) complexity.
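The two better pivot strategies might look like this in Java. This is a sketch only: the class and method names are invented for this example, and `movePivotToFront` applies the "swap the pivot with the first element" trick so that a partition routine using a[low] as its pivot can be reused unchanged.

```java
import java.util.Random;

// Hypothetical helper class; names are illustrative, not from the slides.
class PivotChoiceSketch {
    private static final Random RANDOM = new Random();

    // Median-of-three: index of the median among a[low], a[mid], a[high].
    static int medianOfThreeIndex(int[] a, int low, int high) {
        int mid = low + (high - low) / 2;
        int x = a[low], y = a[mid], z = a[high];
        if ((x <= y && y <= z) || (z <= y && y <= x)) return mid;
        if ((y <= x && x <= z) || (z <= x && x <= y)) return low;
        return high;
    }

    // Random pivot: an index chosen uniformly from low..high (inclusive).
    static int randomIndex(int low, int high) {
        return low + RANDOM.nextInt(high - low + 1);
    }

    // Swap the chosen pivot into position low before partitioning.
    static void movePivotToFront(int[] a, int low, int pivotIndex) {
        int tmp = a[low];
        a[low] = a[pivotIndex];
        a[pivotIndex] = tmp;
    }
}
```

For an already-sorted array such as 1 2 3 4 5, `medianOfThreeIndex` picks the middle element, so the split is balanced instead of degenerate.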

Quicksort. Typically the fastest sorting algorithm... but very sensitive to details! Must choose a good pivot to avoid the O(n^2) case. Must take care with duplicates. Switch to insertion sort for small arrays to get better constant factors. If you do all that right, you get an in-place sorting algorithm, with low constant factors and O(n log n) complexity.

Stable sorting. When sorting complex objects, e.g. where each element contains various information about a person, the ordering may only take part of the data into account (via Comparable, Comparator, Ord). Then it's sometimes important that objects that are deemed equal by the ordering should appear in the same order as they did in the original list. A sorting algorithm that does not change the order of equal elements is called stable.

Stable sorting. Let's say that we want to sort [(5, 'a'), (3, 'd'), (2, 'f'), (3, 'b')] and that the ordering of the pairs is defined to be the natural ordering of the first component. Unstable sorting might result in [(2, 'f'), (3, 'b'), (3, 'd'), (5, 'a')]. Stable sorting always gives [(2, 'f'), (3, 'd'), (3, 'b'), (5, 'a')]. Insertion sort is stable (provided that the insert inequality check is the right one, so that equal elements are not swapped).
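The remark about the inequality check can be made concrete with a small insertion sort over pairs. This is a sketch under assumptions: the `Pair` type and class name are invented for the example, and only the first component takes part in the ordering.

```java
// Hypothetical class and pair type, for illustration only.
class StableInsertionSortSketch {
    // Ordered by key only; the label is extra data carried along.
    static final class Pair {
        final int key;
        final char label;

        Pair(int key, char label) {
            this.key = key;
            this.label = label;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + label + ")";
        }
    }

    static void insertionSort(Pair[] a) {
        for (int i = 1; i < a.length; i++) {
            Pair item = a[i];
            int j = i;
            // Strict ">": an element never moves past one with an equal key,
            // so equal elements keep their original order.
            // Using ">=" here would make the sort unstable.
            while (j > 0 && a[j - 1].key > item.key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = item;
        }
    }
}
```

Sorting the pairs (5, a), (3, d), (2, f), (3, b) with this method keeps (3, d) ahead of (3, b), as a stable sort must.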

Merge sort vs quicksort. Quicksort: in-place; O(n log n), but O(n^2) if you are not careful; works on arrays only (random access); not stable. Mergesort, by comparison: not in-place; O(n log n); only requires sequential access to the list, which makes it good in functional programming; stable.

Sorting. Why is sorting important? Because sorted data is much easier to deal with! Searching: use binary instead of linear search. Finding duplicates: takes linear instead of quadratic time. Etc. Most sorting algorithms are based on comparisons: compare elements, is one bigger than the other? If not, do something about it! Advantage: they can work on all sorts of data. Disadvantage: specialised algorithms for e.g. sorting lists of integers can be faster.
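Both benefits are easy to demonstrate with the standard library. A small sketch follows; the class and method names are made up, while `Arrays.sort` and `Arrays.binarySearch` are the standard `java.util` methods.

```java
import java.util.Arrays;

// Hypothetical class name; shows why sorted data is easier to work with.
class SortedDataSketch {
    // After sorting, duplicates are adjacent, so one linear pass finds them.
    static boolean hasDuplicates(int[] a) {
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] == sorted[i - 1]) return true;
        }
        return false;
    }

    // Binary search needs sorted input and takes O(log n) comparisons.
    static boolean containsSorted(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }
}
```

The duplicate check still pays for the sort, so its total cost is O(n log n); the linear scan is only the part that comes after the data is sorted.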

Real-world sorting

Sorting algorithms so far.

                  Worst case    Average case   Best case
Insertion sort    O(n^2)        O(n^2)         O(n)
Quicksort         O(n^2)        O(n log n)     O(n log n)
Mergesort         O(n log n)    O(n log n)     O(n log n)

No clear winner... the best algorithms combine ideas from several.

Introsort. Quicksort: fast in practice, but O(n^2) worst case. Introsort: start with quicksort. If the recursion depth gets too big, switch to heapsort, which is O(n log n). Plus standard quicksort optimisations: choose the pivot via median-of-three, and switch to insertion sort for small arrays. Used by e.g. C++ STL, .NET, ...
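The combination described above can be sketched as follows. The cutoff of 16 elements and the depth limit of roughly 2 log2 n are assumed values chosen for illustration, and the class name is made up; this is not the code of any particular library.

```java
// Hypothetical class; SMALL and the depth limit are assumed values.
class IntrosortSketch {
    static final int SMALL = 16; // assumed insertion-sort cutoff

    static void sort(int[] a) {
        // Allow about 2*log2(n) levels of quicksort before falling back.
        int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, depthLimit);
    }

    static void sort(int[] a, int low, int high, int depth) {
        if (high - low < SMALL) { insertionSort(a, low, high); return; }
        if (depth == 0) { heapSort(a, low, high); return; }
        medianOfThreeToFront(a, low, high);
        int p = partition(a, low, high);
        sort(a, low, p - 1, depth - 1);
        sort(a, p + 1, high, depth - 1);
    }

    static void insertionSort(int[] a, int low, int high) {
        for (int i = low + 1; i <= high; i++) {
            int item = a[i];
            int j = i;
            while (j > low && a[j - 1] > item) { a[j] = a[j - 1]; j--; }
            a[j] = item;
        }
    }

    // Heapsort on a[low..high], using a max-heap stored in that range.
    static void heapSort(int[] a, int low, int high) {
        int n = high - low + 1;
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, low, i, n);
        for (int end = n - 1; end > 0; end--) {
            swap(a, low, low + end);
            siftDown(a, low, 0, end);
        }
    }

    static void siftDown(int[] a, int base, int i, int size) {
        while (true) {
            int child = 2 * i + 1;
            if (child >= size) return;
            if (child + 1 < size && a[base + child + 1] > a[base + child]) child++;
            if (a[base + i] >= a[base + child]) return;
            swap(a, base + i, base + child);
            i = child;
        }
    }

    // Moves the median of a[low], a[mid], a[high] into a[low].
    static void medianOfThreeToFront(int[] a, int low, int high) {
        int mid = low + (high - low) / 2;
        if (a[mid] > a[high]) swap(a, mid, high);
        if (a[low] > a[high]) swap(a, low, high);
        if (a[mid] > a[low]) swap(a, mid, low);
    }

    // Partitions around a[first]; returns the pivot's final index.
    static int partition(int[] a, int first, int last) {
        int pivot = a[first], low = first + 1, high = last;
        while (true) {
            while (low <= high && a[low] < pivot) low++;
            while (low <= high && a[high] > pivot) high--;
            if (low >= high) break;
            swap(a, low++, high--);
        }
        swap(a, first, high);
        return high;
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```

The depth counter is what bounds the worst case: once quicksort has recursed about 2 log2 n levels deep on some range, that range is finished by heapsort, so no input can force quadratic time.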

Dual-pivot quicksort. Instead of one pivot, pick two, x and y. Instead of partitioning the array into two halves, partition it into three parts: elements < x, elements ≥ x and ≤ y, and elements > y. Look up the Dutch national flag problem if you want to know how. Recursively sort the three partitions! If the two pivots are equal, don't sort the middle partition (this neatly handles the case where the array has a lot of duplicates). Used by Java for primitive types (int, ...).
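A three-way partition in the style of the Dutch national flag problem could look like this. It is a simplified sketch, assuming the first and last elements of the range are used as the two pivots; the class name is made up, and this is not the tuned implementation found in the Java library.

```java
// Hypothetical class; a simplified dual-pivot quicksort sketch.
class DualPivotSketch {
    // call as sort(a, 0, a.length - 1)
    static void sort(int[] a, int low, int high) {
        if (low >= high) return;
        // Use the end elements as pivots, ordered so that x <= y.
        if (a[low] > a[high]) swap(a, low, high);
        int x = a[low], y = a[high];
        int lt = low + 1;  // a[low+1..lt-1] holds elements < x
        int gt = high - 1; // a[gt+1..high-1] holds elements > y
        int i = low + 1;   // a[lt..i-1] holds elements >= x and <= y
        while (i <= gt) {
            if (a[i] < x) {
                swap(a, i, lt);
                lt++;
                i++;
            } else if (a[i] > y) {
                // The element swapped in from the right is still unexamined.
                swap(a, i, gt);
                gt--;
            } else {
                i++;
            }
        }
        // Move both pivots into their final positions.
        lt--;
        gt++;
        swap(a, low, lt);
        swap(a, high, gt);
        sort(a, low, lt - 1);
        // If x == y the middle part is all equal and needs no sorting.
        if (x < y) sort(a, lt + 1, gt - 1);
        sort(a, gt + 1, high);
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

On an array that repeats one value, the two pivots are equal, everything lands in the middle part, and the recursion ends after a single pass.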

Traditional merge sort (a recap)

Example on 1 3 6 8 4 2 1 3 7 5: keep splitting the list in half until only single elements are left.

Then merge back up: neighbouring pieces are merged pairwise until the two sorted halves 1 3 4 6 8 and 1 2 3 5 7 remain, and a final merge gives 1 1 2 3 3 4 5 6 7 8.

Natural merge sort. Traditional merge sort splits the input list into single elements before merging everything back together again. Better idea: split the input into runs. A run is a sequence of increasing elements... or a sequence of decreasing elements. First reverse all the decreasing runs, so they become increasing. Then merge all the runs together.

Example on 1 3 6 8 4 2 1 3 7 5: split into the runs 1 3 6 8, 4 2 1, 3 7 and 5. Reverse 4 2 1 into 1 2 4. Merge 1 3 6 8 with 1 2 4 to get 1 1 2 3 4 6 8, and merge 3 7 with 5 to get 3 5 7. A final merge gives 1 1 2 3 3 4 5 6 7 8.

Natural merge sort. Big advantage: O(n) time for sorted data... and reverse-sorted data... and almost-sorted data. Complexity: O(n log r), where r is the number of runs in the input. In the worst case, each run has two elements, so r = n/2, which gives O(n log n). Used by GHC.
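A run-based merge sort can be sketched in a few dozen lines of Java. This is an illustration of the idea only, not GHC's implementation: the class and method names are invented, decreasing runs are taken to be strictly decreasing, and neighbouring runs are merged pairwise in rounds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class; a sketch of natural merge sort on lists.
class NaturalMergeSortSketch {
    static List<Integer> sort(List<Integer> input) {
        List<List<Integer>> runs = splitIntoRuns(input);
        if (runs.isEmpty()) return new ArrayList<>();
        // Merge neighbouring runs pairwise: O(log r) rounds of O(n) work each.
        while (runs.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i + 1 < runs.size(); i += 2) {
                next.add(merge(runs.get(i), runs.get(i + 1)));
            }
            if (runs.size() % 2 == 1) next.add(runs.get(runs.size() - 1));
            runs = next;
        }
        return runs.get(0);
    }

    // Splits the input into increasing runs, reversing decreasing ones.
    static List<List<Integer>> splitIntoRuns(List<Integer> input) {
        List<List<Integer>> runs = new ArrayList<>();
        int i = 0;
        while (i < input.size()) {
            List<Integer> run = new ArrayList<>();
            run.add(input.get(i++));
            if (i < input.size() && input.get(i) < run.get(0)) {
                // Strictly decreasing run: collect it, then reverse it.
                while (i < input.size() && input.get(i) < run.get(run.size() - 1)) {
                    run.add(input.get(i++));
                }
                Collections.reverse(run);
            } else {
                // Increasing (non-decreasing) run.
                while (i < input.size() && input.get(i) >= run.get(run.size() - 1)) {
                    run.add(input.get(i++));
                }
            }
            runs.add(run);
        }
        return runs;
    }

    static List<Integer> merge(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            out.add(left.get(i) <= right.get(j) ? left.get(i++) : right.get(j++));
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }
}
```

For 1 3 6 8 4 2 1 3 7 5, `splitIntoRuns` produces the four runs 1 3 6 8, 1 2 4, 3 7 and 5. Already-sorted input is a single run, so no merging happens at all and the whole sort is one linear scan.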

The same example as a merge tree: O(log r) levels, O(n) time per level. Total time is O(n log r)!

Timsort. Natural mergesort is really good on sorted/nearly-sorted data: you get long runs, so not many merges to do. But it is not so good on highly unsorted data: short runs, so many merges. Idea of Timsort: on short, very unsorted parts of the list, switch to insertion sort. How to detect unsortedness? Several short runs next to one another.

Example on 1 3 6 8 4 2 1 3 7 5: split off the runs 1 3 6 8 and 4 2 1, and reverse 4 2 1 into 1 2 4. Don't split 3 7 5 into two runs, it's too short! Use insertion sort on it to get 3 5 7. Merging 1 3 6 8 with 1 2 4 gives 1 1 2 3 4 6 8, and merging that with 3 5 7 gives 1 1 2 3 3 4 5 6 7 8.

Timsort. Specifically: if we come across a short run, join it together with all following short runs until we reach a threshold, then use insertion sort on that part. Also some optimisations for merge: merge smaller runs together first; if the merge begins with several elements from the same array, use binary search to find out how many and then copy them all in one go. Used in Java for arrays of objects, and in Python. http://en.wikipedia.org/wiki/timsort

The best sorting algorithm? A good sorting algorithm should: have O(n log n) complexity; have O(n) complexity on nearly-sorted data; be simple; be in-place; be cache-friendly. No algorithm seems to have all of these!

Summary. No one-size-fits-all answer. Best overall complexity: natural mergesort. But quicksort has smaller constant factors. Timsort: a good compromise? Different algorithms are good in different situations... something you should find in the lab :) The best sorting functions combine ideas from several algorithms. Introsort: quicksort + heapsort + insertion sort. Timsort: natural mergesort + insertion sort.