Divide and Conquer Sorting Algorithms and Noncomparison-based Sorting Algorithms


Divide and Conquer Sorting Algorithms and Noncomparison-based Sorting Algorithms COMP1927 16x1 Sedgewick Chapters 7 and 8; Sedgewick Chapter 6.10, Chapter 10

DIVIDE AND CONQUER SORTING ALGORITHMS Step 1: If a collection has fewer than two elements, it's already sorted. Otherwise, split it into two parts. Step 2: Sort both parts separately. Step 3: Combine the sorted parts to return the final result.

MERGE SORT Basic idea: Divide and Conquer. Split the array into two equal-sized partitions, (recursively) sort each of the partitions, then merge the two sorted partitions together. Merging, basic idea: copy elements from the inputs one at a time, giving preference to the smaller of the two; when one input is exhausted, copy the rest of the other.

DIVIDE AND CONQUER SORTING: MERGESORT Split the sequence in halves. Sort both halves independently. What is the best way to combine them? Look at the first element in each sequence, pick the smaller of the two, insert it in the sorted collection, and continue until all elements are used up. Example: 5 6 1 7 3 2 8 4 (1) split: 5 6 1 7 | 3 2 8 4 (2) call sort rec.: 1 5 6 7 | 2 3 4 8 (3) merge: 1 2 3 4 5 6 7 8

MERGE SORT: ARRAY IMPLEMENTATION Assuming we have merge implemented, mergesort can be defined as:

void merge (int a[], int l, int m, int r);

void mergesort (int a[], int l, int r) {
    if (r <= l) { return; }
    int m = (l+r)/2;
    mergesort (a, l, m);
    mergesort (a, m+1, r);
    merge (a, l, m, r);
}

MERGE ARRAY IMPLEMENTATION

void merge (int a[], int l, int mid, int r) {
    int i, j, k, nitems = r-l+1;
    int *tmp = malloc(nitems*sizeof(int));
    i = l; j = mid+1; k = 0;
    while (i <= mid && j <= r) {
        if (a[i] < a[j]) {
            tmp[k++] = a[i++];
        } else {
            tmp[k++] = a[j++];
        }
    }
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= r)   tmp[k++] = a[j++];
    // copy back
    for (i = l, k = 0; i <= r; i++, k++) a[i] = tmp[k];
    free(tmp);
}

MERGESORT: WORK COMPLEXITY How many steps? Constant time (on arrays, for example) to split the array into two halves, and N steps to combine (merge):

T(N) = N + 2*T(N/2)

Substitute N = 2^k:

T(2^k) = 2^k + 2*T(2^k / 2) = 2^k + 2*T(2^(k-1))
T(2^k)/2^k = 1 + T(2^(k-1))/2^(k-1)
           = 1 + (1 + T(2^(k-2))/2^(k-2))
           = 1 + 1 + (1 + T(2^(k-3))/2^(k-3))
           = ... = k

So T(2^k) = k * 2^k, i.e. T(N) = N log2 N.

MERGE SORT WORK COMPLEXITY How many steps does it take to sort a collection of N elements? The split produces equal-sized partitions, and the same happens at every recursive level. Each level requires N comparisons in the worst case (inputs exactly interleaved). Since the size halves at each level, there are log2 N levels. Overall: merge sort is in O(n log n). Stable, as long as merge is implemented to be stable. Not in-place: uses O(n) memory for merging and O(log n) stack space. Non-adaptive: still n log n for already-ordered data.

BOTTOM UP MERGE SORT Basic idea: non-recursive. On each pass, the array contains sorted sections of length m. At the start, treat the array as n sorted sections of length 1. The 1st pass merges adjacent sections into sections of length 2, the 2nd pass merges adjacent sections into sections of length 4, and so on until there is a single sorted section of length n. This approach is used for sorting disk files.

BOTTOM-UP MERGE SORT ARRAY IMPLEMENTATION

#define min(a,b) ((a) < (b) ? (a) : (b))

void merge (int a[], int l, int m, int r);

void mergesortbu (int a[], int l, int r) {
    int i, m, end;
    for (m = 1; m <= r-l; m = 2*m) {
        for (i = l; i <= r-m; i += 2*m) {
            end = min(i + 2*m - 1, r);
            merge (a, i, i+m-1, end);
        }
    }
}

MERGE SORT: IMPLEMENTATION Straightforward to implement on linked lists. Traverses its input in sequential order. Does not need extra space for merging lists. Works for both top-down and bottom-up versions.
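A minimal sketch of the list version (the node type, helper names, and the slow/fast-pointer split are illustrative assumptions, not the lecture's code):

```c
#include <stddef.h>

typedef struct node { int val; struct node *next; } Node;

/* Merge two sorted lists by relinking nodes; no extra space needed. */
static Node *merge_lists(Node *a, Node *b) {
    Node dummy, *tail = &dummy;
    dummy.next = NULL;
    while (a != NULL && b != NULL) {
        if (a->val <= b->val) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return dummy.next;
}

/* Top-down mergesort: split with slow/fast pointers, sort halves, merge. */
Node *mergesort_list(Node *head) {
    if (head == NULL || head->next == NULL) return head;
    Node *slow = head, *fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node *second = slow->next;
    slow->next = NULL;   /* cut the list in two */
    return merge_lists(mergesort_list(head), mergesort_list(second));
}
```

Note that the merge relinks existing nodes rather than copying elements, which is why no O(n) temporary buffer is needed.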

DIVIDE AND CONQUER SORTING: QUICKSORT Mergesort uses a trivial split operation and puts all the work into combining the results. Can we split the collection in a more intelligent way, such that combining the results is trivial? Make sure all elements in one part are less than all the elements in the second part. Example, with pivot 5: 5 6 1 7 3 2 8 4 (1) split: 1 3 2 4 | 5 | 6 7 8 (2) call sort rec.: 1 2 3 4 | 5 | 6 7 8 (3) combine: 1 2 3 4 5 6 7 8

MORE ON QUICK SORT: IMPLEMENTATION On arrays, we need in-place partitioning: we need to swap elements in the array such that, for some pivot we choose and some index i, a[j] <= a[i] for all j < i, and a[k] >= a[i] for all k > i.

QUICK SORT Given such a partition function, the implementation of quick sort on arrays is easy. However, it's surprisingly tricky to get partition right for all cases.

int partition (int a[], int l, int r);

void quicksort (int a[], int l, int r) {
    if (r <= l) { return; }
    int i = partition (a, l, r);
    quicksort (a, l, i-1);
    quicksort (a, i+1, r);
}

QUICK SORT: PARTITIONING

int partition (int a[], int l, int r) {
    int i = l-1;
    int j = r;
    int pivot = a[r];   // rightmost is pivot
    for (;;) {
        while (a[++i] < pivot) ;
        while (pivot < a[--j] && j != l) ;
        if (i >= j) { break; }
        swap(i, j, a);
    }
    // put pivot into place
    swap(i, r, a);
    return i;   // index of the pivot
}

QUICKSORT: WORK COMPLEXITY How many steps? N steps to split (partition) the array in two; combining the sorted sub-results takes constant time. Best case (both parts have the same size): T(N) = N + 2*T(N/2), giving O(N log N). Worst case (one part contains all remaining elements): T(N) = N + T(N-1) = N + (N-1) + T(N-2) = N + (N-1) + (N-2) + ... + 1 = N(N+1)/2 = O(N^2)

QUICK-SORT PROPERTIES It is not adaptive: existing order in the sequence only makes it worse. It is not stable in our implementation (it can be made stable). In-place: partitioning is done in place. Recursive calls use stack space: O(N) in the worst case, O(log N) on average.

QUICK SORT - PERFORMANCE PROBLEMS Taking the first or last element as the pivot is often a bad choice: the sequence might be partially sorted already. Already-ordered data is a worst-case scenario; reverse-ordered data is a worst-case scenario: the array is split into parts of size N-1 and 0. Ideally our pivot would be the median value; in the worst case our pivot is the largest or smallest value.

QUICK SORT CHOOSING A BETTER PIVOT We can reduce the probability of picking a bad pivot by picking a random element as the pivot, or picking the best out of three (or more). Median-of-three partitioning: compare the left-most, middle and right-most elements and pick the median of these 3 values to be the pivot. This does not eliminate the worst case but makes it less likely, and ordered data is no longer a worst-case scenario.

QUICK SORT MEDIAN OF THREE PARTITIONING - CHOOSING A BETTER PIVOT (1) pick a[l], a[r] and a[(l+r)/2] (2) swap a[(l+r)/2] and a[r-1] (3) sort a[l], a[r-1] and a[r] such that a[l] <= a[r-1] <= a[r] (4) call partition on a[l+1] to a[r-1], with a[r-1] as the pivot
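The four steps above might look like this in C (the function name and swap helper are assumptions for illustration, not the lecture's code):

```c
/* Local swap helper (an assumption, not from the lecture code). */
static void swap3(int i, int j, int a[]) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
}

/* Sort a[l], a[mid], a[r], then stash the median at a[r-1];
   partition would then run on a[l+1..r-1] with a[r-1] as the pivot. */
int median_of_three(int a[], int l, int r) {
    int mid = (l + r) / 2;
    if (a[mid] < a[l])   swap3(mid, l, a);
    if (a[r]   < a[l])   swap3(r,   l, a);
    if (a[r]   < a[mid]) swap3(r,   mid, a);
    /* now a[l] <= a[mid] <= a[r]; move the median next to the pivot slot */
    swap3(mid, r - 1, a);
    return a[r - 1];   /* pivot value */
}
```

A side effect worth noting: after this setup, a[l] and a[r] already sit on the correct sides of the pivot, so partition can safely scan a[l+1..r-1] without extra bounds checks.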

QUICK SORT: PERFORMANCE AND OPTIMISATION Optimised versions of quick sort are frequently used. For small sequences, quick sort is relatively expensive because of the recursive calls. Quick sort with subfile cutoff: handle small partitions (less than a certain threshold length) differently, either switching to insertion sort for the small partitions, or not sorting them at all and running a single insertion sort pass at the end. Handle duplicates more efficiently by using three-way partitioning.
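A sketch of the subfile-cutoff idea. The threshold value, the function names, and the simple Lomuto-style partition used here are all illustrative assumptions; the lecture's own partition (above) would work equally well:

```c
#define CUTOFF 10   /* cutoff threshold: a tunable assumption */

static void insertion_sort(int a[], int l, int r) {
    for (int i = l + 1; i <= r; i++) {
        int item = a[i], j;
        for (j = i; j > l && item < a[j-1]; j--)
            a[j] = a[j-1];   /* shift larger elements right */
        a[j] = item;
    }
}

/* Simple Lomuto-style partition: rightmost element is the pivot. */
static int partition_lomuto(int a[], int l, int r) {
    int pivot = a[r], i = l;
    for (int j = l; j < r; j++)
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[r]; a[r] = t;
    return i;
}

void quicksort_cutoff(int a[], int l, int r) {
    if (r - l < CUTOFF) {            /* small partition: insertion sort */
        insertion_sort(a, l, r);
        return;
    }
    int i = partition_lomuto(a, l, r);
    quicksort_cutoff(a, l, i - 1);
    quicksort_cutoff(a, i + 1, r);
}
```

The variant that leaves small partitions unsorted would instead return immediately below the cutoff and run one insertion_sort(a, 0, n-1) pass at the very end, which is cheap because every element is then close to its final position.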

QUICKSORT ON LINKED LISTS Straightforward to do if we just use the first or last element as the pivot. Picking the pivot via randomisation or median-of-three is now O(n) instead of O(1).

QUICK SORT VS MERGE SORT On typical modern architectures, efficient quicksort implementations generally outperform mergesort for sorting RAM-based arrays. Quicksort is also cache-friendly, as it has good locality of reference when used on arrays. On the other hand, merge sort is a stable sort, parallelises better, and is more efficient at handling slow-to-access sequential media. Merge sort is often the best choice for sorting a linked list: the merging can be done without the extra space that is needed when merging arrays.

HOW FAST CAN A SORT BE? All the sorts we have seen so far are comparison-based sorts: they find the order by comparing elements of the sequence, and can sort any type of data as long as there is a way to compare two items. The theoretical lower bound on the worst-case running time of comparison-based sorts is Ω(n log n). Algorithms such as quicksort and mergesort are therefore about as fast as we can go for arbitrary types of data.

SORTING HAS A THEORETICAL NLOGN LOWER BOUND If there are 3 items, then there are 3! = 6 possible permutations, i.e. 6 possible different inputs. If there are n items, there are n! possible permutations/inputs. If we do 1 comparison we can divide the inputs into 2 different categories; if we do k comparisons we can divide them into 2^k different categories. We need to do enough comparisons that every input can be distinguished: n! <= 2^k log2(n!) <= k n log n <= k (approximately, using Stirling's approximation)
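The counting argument above can be written out as a short derivation (in the decision-tree model, a sort making at most k comparisons has at most 2^k distinguishable outcomes):

```latex
2^k \ge n!
\;\implies\;
k \ge \log_2(n!) = \sum_{i=1}^{n} \log_2 i
\ge \frac{n}{2}\log_2\frac{n}{2}
\in \Omega(n \log n)
```

The middle inequality avoids Stirling's formula entirely: the largest n/2 terms of the sum are each at least log2(n/2), which already gives the Ω(n log n) bound.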

NON-COMPARISON BASED SORTING We may not actually have to compare pairs of elements to sort the data. Specialised sorts can be implemented if additional information about the data to be sorted is known: take advantage of special properties of the keys. We can do some kinds of sorts in linear time!

KEY INDEXED COUNTING SORT Basic idea: using an array, count up the number of times each key appears. Use this information to compute the index of where each item belongs in the final sorted array, then place the items in the final sorted array based on their index. For example, sorting numbers from 0..10: if I knew there were three 0s and two 1s, then if I had a 2, it would go at index 5; if I got another 2, it would go at index 6.

KEY INDEXED COUNTING SORT Can run in O(n) time. How? Because it uses no comparisons! But we have to make assumptions about the size and nature of the data. Assumptions: a sequence of size N; each key is in the range 0..M-1. Time complexity: O(n + M), efficient if M is not too large compared to N. Not good in cases like: 1, 2, 999999. In-place? No: uses temporary arrays of size O(n + M). It is stable.
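Under those assumptions, the scheme above can be sketched in C as follows (the function name is an assumption; the counting/prefix-sum/placement structure follows the description above):

```c
#include <stdlib.h>

/* Key-indexed counting sort for integer keys in 0..M-1.
   Stable: equal keys keep their input order. Extra space: O(n + M). */
void counting_sort(int a[], int n, int M) {
    int *count = calloc(M + 1, sizeof(int));
    int *tmp   = malloc(n * sizeof(int));
    /* 1. count occurrences of each key (shifted by one for the prefix sum) */
    for (int i = 0; i < n; i++) count[a[i] + 1]++;
    /* 2. prefix sums: count[k] is now the first index for key k */
    for (int k = 0; k < M; k++) count[k + 1] += count[k];
    /* 3. place items left to right at their computed index (keeps stability) */
    for (int i = 0; i < n; i++) tmp[count[a[i]]++] = a[i];
    for (int i = 0; i < n; i++) a[i] = tmp[i];
    free(count); free(tmp);
}
```

Step 2 is exactly the "three 0s and two 1s, so a 2 goes at index 5" calculation from the previous slide, done for every key at once.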

RADIX SORTING Comparison-based sorting: sorting based on comparing two whole keys. Radix sorting: processing keys one piece at a time. Keys are treated as numbers represented in a base-R (radix) number system: binary numbers, R is 2; decimal numbers, R is 10; ASCII strings, R is 128 or 256; Unicode strings, R is 65,536. Sorting is done on each piece of the key individually, one at a time: digit by digit or character by character.

RADIX SORT LSD (LEAST SIGNIFICANT DIGIT FIRST) Consider characters, digits or bits from right to left (i.e. from least significant). Stably sort using the dth digit as the key; can use key-indexed counting sort. For example, sorting 1019, 2301, 3129, 2122: 1019, 2301, 3129, 2122 -> 2301, 2122, 1019, 3129 -> 2301, 1019, 2122, 3129 -> 1019, 2122, 3129, 2301 -> 1019, 2122, 2301, 3129

RADIX SORT LSD PROPERTIES O(w(n+R)), where w is the width of the data: e.g. 987 is 3 digits wide, aaa is 3 characters wide; integers (binary representation) could have w = 32 and R = 2. The algorithm makes w passes over all n keys. Not in-place: extra space O(n + R). Stable. Can be modified to work on variable-length data. Imagine sorting strings like zaaaaaaa and aaaaaaaa: LSD can spend lots of work comparing insignificant details.
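A sketch of LSD radix sort for fixed-width non-negative decimal integers, with each pass being a stable key-indexed counting sort on one digit (the function name and the R = 10 digit extraction are assumptions for illustration):

```c
#include <stdlib.h>

/* LSD radix sort for non-negative ints with up to w decimal digits.
   Each pass is a stable key-indexed counting sort on digit d. */
void radix_sort_lsd(int a[], int n, int w) {
    int *tmp = malloc(n * sizeof(int));
    int exp = 1;   /* 1, 10, 100, ... selects the current digit */
    for (int d = 0; d < w; d++, exp *= 10) {
        int count[11] = {0};
        /* count digit occurrences (shifted by one for the prefix sum) */
        for (int i = 0; i < n; i++) count[(a[i] / exp) % 10 + 1]++;
        /* prefix sums: count[k] is the first index for digit k */
        for (int k = 0; k < 10; k++) count[k + 1] += count[k];
        /* stable placement, then copy back for the next pass */
        for (int i = 0; i < n; i++) tmp[count[(a[i] / exp) % 10]++] = a[i];
        for (int i = 0; i < n; i++) a[i] = tmp[i];
    }
    free(tmp);
}
```

Running this on the slide's example 1019, 2301, 3129, 2122 with w = 4 reproduces the four passes shown above. Stability of each pass is what makes the whole thing work: ties on the current digit preserve the order established by all less significant digits.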

RADIX SORT MSD (MOST SIGNIFICANT DIGIT FIRST) Partition the file into R pieces according to the first character (can use key-indexed counting). Recursively sort all strings that start with each character; the key-indexed counts delineate the subfiles to sort. O(w(n+R)) in the worst case. Extra space: N + DR (D is the depth of recursion). We don't have to go through all of the digits to get a sorted array; this can make MSD radix sort considerably faster. Can use insertion sort for small subfiles.
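A sketch of MSD sort on C strings following the scheme above: bucket by the character at position d using key-indexed counting, then recurse within each bucket. The names, the R = 256 alphabet, and the sentinel value for strings that have ended are assumptions; the small-subfile insertion-sort cutoff mentioned above is omitted for brevity:

```c
#include <string.h>

#define R 256   /* alphabet size: an assumption (8-bit characters) */

/* Character at position d, or -1 once past the end of the string,
   so shorter strings sort before their extensions. */
static int char_at(const char *s, int d) {
    return d < (int)strlen(s) ? (unsigned char)s[d] : -1;
}

/* Sort a[lo..hi] on character position d; aux holds n string pointers. */
void msd_sort(const char *a[], const char *aux[], int lo, int hi, int d) {
    if (hi <= lo) return;
    int count[R + 2] = {0};
    /* key-indexed counting (shifted by 2 to make room for the -1 bucket) */
    for (int i = lo; i <= hi; i++) count[char_at(a[i], d) + 2]++;
    for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];
    for (int i = lo; i <= hi; i++) aux[count[char_at(a[i], d) + 1]++] = a[i];
    for (int i = lo; i <= hi; i++) a[i] = aux[i - lo];
    /* recurse on each character bucket; ended strings (bucket -1) stay put */
    for (int r = 0; r < R; r++)
        msd_sort(a, aux, lo + count[r], lo + count[r + 1] - 1, d + 1);
}
```

The recursion only descends into buckets that still contain more than one string, which is why MSD can finish without ever looking at the later characters of keys whose prefixes are already unique.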