CSE 373: Data Structures & Algorithms: More Sorting and Beyond Comparison Sorting

CSE 373: Data Structures & Algorithms
More Sorting and Beyond Comparison Sorting
Riley Porter, Winter 2017

Course Logistics
- HW5 due in a couple days → more graphs! Don't forget about the write-up!
- HW6 out later today → sorting (and, to a lesser degree, reading specs/other files/tests).
- Final exam in 2 weeks!

Review: Sorting: The Big Picture
- Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort
- Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort (avg)
- Comparison lower bound: Ω(n log n)
- Specialized algorithms, O(n): Bucket sort, Radix sort
- Handling huge data sets: External sorting

Quick Sort
Divide: Split array around a pivot
  Input: 5 2 8 4 7 3 1 6, pivot = 6
  Numbers <= pivot: 1, 2, 3, 4, 5
  Numbers > pivot: 7, 8

Quick Sort
- Divide: Pick a pivot, partition into groups: Unsorted → [<= P] [P] [> P]
- Conquer: Return array when length <= 1
- Combine: Combine sorted partitions and pivot: [<= P] [P] [> P] → Sorted

Quick Sort Pseudocode
Core idea: Pick some item from the array and call it the pivot. Put all items smaller than the pivot into one group and all items larger into the other, and recursively sort each group. If the array has size 0 or 1, just return it unchanged.

  quicksort(input) {
    if (input.length < 2) {
      return input;
    } else {
      pivot = getpivot(input);
      smallerhalf = quicksort(getsmaller(pivot, input));
      largerhalf = quicksort(getbigger(pivot, input));
      return smallerhalf + pivot + largerhalf;
    }
  }
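The pseudocode above can be sketched in runnable Java roughly as follows. This is a minimal, out-of-place illustration, not the course's reference code: the class name `SimpleQuickSort` is hypothetical, the pivot is assumed to be the first element, and items equal to the pivot are placed in the "smaller" group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SimpleQuickSort {
    // Returns a new sorted list; the input list is not modified.
    static List<Integer> quicksort(List<Integer> input) {
        if (input.size() < 2) {
            return new ArrayList<>(input);   // size 0 or 1: already sorted
        }
        int pivot = input.get(0);            // assumption: first element is the pivot
        List<Integer> smaller = new ArrayList<>();
        List<Integer> larger = new ArrayList<>();
        for (int i = 1; i < input.size(); i++) {
            int x = input.get(i);
            if (x <= pivot) {
                smaller.add(x);
            } else {
                larger.add(x);
            }
        }
        // Combine: sorted smaller group, then the pivot, then sorted larger group.
        List<Integer> result = quicksort(smaller);
        result.add(pivot);
        result.addAll(quicksort(larger));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quicksort(Arrays.asList(5, 2, 8, 4, 7, 3, 1, 6)));
    }
}
```

This version allocates new lists at every level, so it is easy to read but not in place.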

Think in Terms of Sets [Weiss]
  S = {13, 81, 92, 43, 65, 31, 57, 26, 75, 0}
  Select pivot value: 65
  Partition S: S1 = {13, 0, 43, 31, 26, 57}, pivot 65, S2 = {92, 75, 81}
  Quicksort(S1) and Quicksort(S2): S1 = 0 13 26 31 43 57, pivot 65, S2 = 75 81 92
  S = 0 13 26 31 43 57 65 75 81 92
  Presto! S is sorted

Quick Sort Example: Divide
Pivot rule: pick the element at index 0
  7 3 8 4 5 2 1 6
  [3 4 5 2 1 6] 7 [8]
  [2 1] 3 [4 5 6]
  [1] 2    4 [5 6]
           5 [6]

Quick Sort Example: Combine
Combine: this is the order of the elements we'll care about when combining
  7 3 8 4 5 2 1 6
  [3 4 5 2 1 6] 7 [8]
  [2 1] 3 [4 5 6]
  [1] 2    4 [5 6]
           5 [6]

Quick Sort Example: Combine
Combine: put left partition < pivot < right partition
  1 2 3 4 5 6 7 8
  [1 2 3 4 5 6] 7 [8]
  [1 2] 3 [4 5 6]
  [1] 2    4 [5 6]
           5 [6]

Details
Have not yet explained:
- How to pick the pivot element
  - Any choice is correct: data will end up sorted
  - But as analysis will show, we want the two partitions to be about equal in size
- How to implement partitioning
  - In linear time
  - In place

Pivots
- Worst pivot? Greatest/least element
  - Recurse on a problem of size n - 1
  - Example: 8 2 9 4 5 3 1 6 with pivot 1 → 1 [8 2 9 4 5 3 6]
- Best pivot? Median
  - Halve each time
  - Example: 8 2 9 4 5 3 1 6 with pivot 5 → [2 4 3 1] 5 [8 9 6]

Potential pivot rules
- Pick first or last element: fast, but worst case occurs with mostly sorted input (as we've seen)
- Try looping through the array: we'll get a good value, but that's slow and hard to implement
- Pick random element: cool, does as well as any technique, but (pseudo)random number generation can be slow
- Pick the median of first, middle, and last: easy to implement and a common heuristic that tends to work well
  - e.g., arr[lo], arr[hi-1], arr[(hi+lo)/2]
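The median-of-three rule might look like the sketch below. The class and method names are hypothetical, `hi` is assumed to be an exclusive upper bound (matching the use of arr[hi-1] above), and the midpoint is computed as `lo + (hi - lo) / 2`, which equals (hi+lo)/2 but avoids integer overflow.

```java
class MedianOfThree {
    // Returns whichever of the indices lo, mid, hi-1 holds the median of the three values.
    static int medianIndex(int[] arr, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        int last = hi - 1;
        int a = arr[lo], b = arr[mid], c = arr[last];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
        return last;
    }

    // Swaps the median-of-three into index lo so partitioning can start from there.
    static void movePivotToFront(int[] arr, int lo, int hi) {
        int m = medianIndex(arr, lo, hi);
        int tmp = arr[lo];
        arr[lo] = arr[m];
        arr[m] = tmp;
    }

    public static void main(String[] args) {
        int[] arr = {7, 2, 8, 4, 5, 3, 1, 6};
        movePivotToFront(arr, 0, arr.length);
        System.out.println(java.util.Arrays.toString(arr));
    }
}
```

For the array 7 2 8 4 5 3 1 6 the three candidates are 7, 5, and 6, so the value 6 is swapped to the front.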

Median Pivot Example
Pick the median of first, middle, and last:
  7 2 8 4 5 3 1 6    (first = 7, middle = 5, last = 6; median = 6)
Swap the median with the first value:
  6 2 8 4 5 3 1 7
Pivot is now at index 0, and we're ready to go.

Partitioning
- Conceptually simple, but hardest part to code up correctly
- After picking the pivot, need to partition in linear time, in place
- One approach (there are slightly fancier ones):
  1. Put pivot in index lo
  2. Use two pointers i and j, starting at lo+1 and hi-1
  3. while (i < j)
       if (arr[j] > pivot) j--
       else if (arr[i] < pivot) i++
       else swap arr[i] with arr[j]
  4. Swap pivot with arr[i] *
  * Skip step 4 if the pivot ends up being the least element
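A runnable sketch of in-place, linear-time partitioning in the spirit of the steps above. It is a variant, not a literal transcription: it loops while `i <= j`, treats values equal to the pivot as belonging to the left side, and advances both pointers after a swap, so duplicate keys cannot stall the loop and the final swap needs no special case. The class name `PartitionSketch` is hypothetical and `hi` is assumed exclusive.

```java
import java.util.Arrays;

class PartitionSketch {
    // Partitions arr[lo..hi) around the pivot stored at arr[lo].
    // Returns the pivot's final index.
    static int partition(int[] arr, int lo, int hi) {
        int pivot = arr[lo];
        int i = lo + 1;   // everything in (lo, i) is <= pivot
        int j = hi - 1;   // everything in (j, hi) is > pivot
        while (i <= j) {
            if (arr[j] > pivot) {
                j--;
            } else if (arr[i] <= pivot) {
                i++;
            } else {
                swap(arr, i, j);  // arr[i] > pivot and arr[j] <= pivot: exchange them
                i++;
                j--;
            }
        }
        swap(arr, lo, j);  // j is the last index holding a value <= pivot
        return j;
    }

    static void swap(int[] arr, int a, int b) {
        int tmp = arr[a];
        arr[a] = arr[b];
        arr[b] = tmp;
    }

    public static void main(String[] args) {
        int[] arr = {6, 1, 4, 9, 0, 3, 5, 7, 2, 8};
        int p = partition(arr, 0, arr.length);
        System.out.println(p + " " + Arrays.toString(arr));
    }
}
```

Each element is examined a constant number of times, so the pass is O(n), and only a few index variables are used beyond the array itself.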

Example
Step one: pick pivot as median of 3 (lo = 0, hi = 10)
  index: 0 1 2 3 4 5 6 7 8 9
  value: 8 1 4 9 0 3 5 2 7 6
Step two: move pivot to the lo position
  index: 0 1 2 3 4 5 6 7 8 9
  value: 6 1 4 9 0 3 5 2 7 8

Quick Sort Partition Example
  6 1 4 9 0 3 5 7 2 8    (pivot 6 at index lo; i and j move inward)
  6 1 4 2 0 3 5 7 9 8    (after swapping 9 and 2)
  5 1 4 2 0 3 6 7 9 8    (after swapping the pivot into place)

Quick Sort Analysis
- Best case: Pivot is always the median, splitting the data in half
  - Same as mergesort: O(n log n), with O(n) partition work for O(log n) levels
- Worst case: Pivot is always the smallest or largest element
  - Basically the same as selection sort: O(n^2)
- Average case (e.g., with random pivot)
  - O(n log n); you're not responsible for the proof (in text)
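The best and worst cases can be written as recurrences. This is a standard sketch, where the constant c (an assumption introduced here) stands for the per-element cost of partitioning:

```latex
% Best case: the pivot halves the input at every level.
T(n) = 2\,T(n/2) + c\,n
\;\Longrightarrow\;
T(n) = c\,n \log_2 n + O(n) = O(n \log n)

% Worst case: the pivot is the smallest or largest element.
T(n) = T(n-1) + c\,n
\;\Longrightarrow\;
T(n) = c\,\bigl(n + (n-1) + \cdots + 1\bigr) + O(1) = c\,\frac{n(n+1)}{2} + O(1) = O(n^2)
```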

Quick Sort Analysis
- In place: Yep! We can use a couple of pointers and partition the array in place, recursing on different lo and hi indices
- Stable: Not necessarily. It depends on how you handle equal values when partitioning. A stable version of quick sort uses some extra storage for partitioning.

Divide and Conquer: Cutoffs
- For small n, all that recursion tends to cost more than doing a simple, quadratic sort
  - Remember asymptotic complexity is for large n
- Common engineering technique: switch algorithm below a cutoff
  - Reasonable rule of thumb: use insertion sort for n < 10
- Notes:
  - Cutoffs are also the norm with parallel algorithms (switch to a sequential algorithm)
  - None of this affects asymptotic complexity

Cutoff Pseudocode

  void quicksort(int[] arr, int lo, int hi) {
    if (hi - lo < CUTOFF)
      insertionsort(arr, lo, hi);
    else
      ...
  }

Notice how this cuts out the vast majority of the recursive calls
- Think of the recursive calls to quicksort as a tree
- Trims out the bottom layers of the tree
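One way the cutoff idea could be filled in is sketched below. The slide leaves the else branch elided, so everything in it is an assumption: the class name, the value `CUTOFF = 10` (taken from the rule of thumb above), the middle-element pivot, and the partition helper. `hi` is assumed exclusive.

```java
class CutoffQuickSort {
    static final int CUTOFF = 10;  // assumption: rule-of-thumb value

    // Sorts arr[lo..hi) in place.
    static void quicksort(int[] arr, int lo, int hi) {
        if (hi - lo < CUTOFF) {
            insertionSort(arr, lo, hi);   // small range: simple quadratic sort
        } else {
            int p = partition(arr, lo, hi);
            quicksort(arr, lo, p);        // left of the pivot
            quicksort(arr, p + 1, hi);    // right of the pivot
        }
    }

    static void insertionSort(int[] arr, int lo, int hi) {
        for (int k = lo + 1; k < hi; k++) {
            int x = arr[k];
            int m = k - 1;
            while (m >= lo && arr[m] > x) {
                arr[m + 1] = arr[m];      // shift larger values right
                m--;
            }
            arr[m + 1] = x;
        }
    }

    // Hypothetical helper: uses the middle element as pivot, returns its final index.
    static int partition(int[] arr, int lo, int hi) {
        swap(arr, lo, lo + (hi - lo) / 2);
        int pivot = arr[lo];
        int i = lo + 1, j = hi - 1;
        while (i <= j) {
            if (arr[j] > pivot) j--;
            else if (arr[i] <= pivot) i++;
            else { swap(arr, i, j); i++; j--; }
        }
        swap(arr, lo, j);
        return j;
    }

    static void swap(int[] arr, int a, int b) {
        int tmp = arr[a]; arr[a] = arr[b]; arr[b] = tmp;
    }

    public static void main(String[] args) {
        int[] arr = {8, 1, 4, 9, 0, 3, 5, 2, 7, 6, 12, 11, 10};
        quicksort(arr, 0, arr.length);
        System.out.println(java.util.Arrays.toString(arr));
    }
}
```

Every range shorter than CUTOFF is handled by a single insertion-sort call instead of a subtree of recursive calls.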

Cool Comparison Sorting Links
- Visualization of sorts on different inputs: http://www.sorting-algorithms.com/
- Visualization of sorting with sound: https://www.youtube.com/watch?v=t8g-iyghpea
- Sorting via dance: https://www.youtube.com/watch?v=xaqr3g_nvoo
- XKCD Ineffective sorts: https://xkcd.com/1185/

Sorting: The Big Picture
- Simple algorithms, O(n^2): Insertion sort, Selection sort, Shell sort
- Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort (avg)
- Comparison lower bound: Ω(n log n)
- Specialized algorithms, O(n): Bucket sort, Radix sort
- Handling huge data sets: External sorting

How Fast Can We Sort?
- Heapsort and mergesort have O(n log n) worst-case running time
- Quicksort has O(n log n) average-case running time
- These bounds are all tight, actually Θ(n log n)
- Assuming our comparison model: the only operation an algorithm can perform on data items is a 2-element comparison. There is no lower asymptotic complexity, such as O(n) or O(n log log n)

Counting Comparisons
- No matter what the algorithm is, it cannot make progress without doing comparisons
- Intuition: Each comparison can at best eliminate half the remaining possible orderings
- Can represent this process as a decision tree
  - Nodes contain the set of remaining possibilities
  - Edges are answers from a comparison
  - The algorithm does not actually build the tree; it's what our proof uses to represent the most the algorithm could know so far as the algorithm progresses

Decision Tree for n = 3
  Root: {a<b<c, a<c<b, c<a<b, b<a<c, b<c<a, c<b<a}; compare a and b
    a < b: {a<b<c, a<c<b, c<a<b}; compare a and c
      a < c: {a<b<c, a<c<b}; compare b and c
        b < c: a<b<c
        b > c: a<c<b
      a > c: c<a<b
    a > b: {b<a<c, b<c<a, c<b<a}; compare b and c
      b < c: {b<a<c, b<c<a}; compare c and a
        c < a: b<c<a
        c > a: b<a<c
      b > c: c<b<a
The leaves contain all the possible orderings of a, b, c

Example if a < c < b
  Possible orders at the root: a<b<c, a<c<b, c<a<b, b<a<c, b<c<a, c<b<a
  Compare a and b: a < b, leaving {a<b<c, a<c<b, c<a<b}
  Compare a and c: a < c, leaving {a<b<c, a<c<b}
  Compare b and c: b > c, leaving the actual order a<c<b

What the Decision Tree Tells Us
- A binary tree because each comparison has 2 outcomes (we're comparing 2 elements at a time)
- Because any data is possible, any algorithm needs to ask enough questions to produce all orderings

The facts we can get from that:
  1. Each ordering is a different leaf (only one is correct)
  2. Running any algorithm on any input will at best correspond to a root-to-leaf path in some decision tree. The worst number of comparisons is the longest path from root to leaf in the decision tree for input size n
  3. There is no worst-case running time better than the height of a tree with <num possible orderings> leaves

How many possible orderings?
- Assume we have n elements to sort. How many permutations of the elements (possible orderings)?
  - For simplicity, assume none are equal (no duplicates)
- Example, n = 3:
    a[0]<a[1]<a[2]    a[0]<a[2]<a[1]    a[1]<a[0]<a[2]
    a[1]<a[2]<a[0]    a[2]<a[0]<a[1]    a[2]<a[1]<a[0]
- In general, n choices for the least element, n-1 for the next, n-2 for the next, ...
  - n(n-1)(n-2)...(2)(1) = n! possible orderings
- That means with n! possible leaves, the best height for the tree is log(n!), given that the best-case tree splits the leaves in half at each branch
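To make the log(n!) bound concrete, the sketch below computes the smallest possible height of a binary tree with n! leaves, that is, the ceiling of log2(n!). The class and method names are hypothetical. It relies on the fact that for an integer x >= 1, the ceiling of log2(x) equals the bit length of x - 1.

```java
import java.math.BigInteger;

class DecisionTreeHeight {
    // Minimum worst-case number of comparisons needed to sort n distinct items:
    // the smallest h such that 2^h >= n!.
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int k = 2; k <= n; k++) {
            factorial = factorial.multiply(BigInteger.valueOf(k));
        }
        // ceil(log2(x)) == bitLength(x - 1) for any integer x >= 1
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 10; n++) {
            System.out.println("n = " + n + ": at least " + minComparisons(n) + " comparisons");
        }
    }
}
```

For n = 3 this gives 3, matching the height of the decision tree for a, b, c.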

What does that mean for runtime?
- That proves runtime is at least Ω(log(n!)). Can we write that more clearly?
- Nice! Any sorting algorithm must do at least (1/2)(n log n - n) comparisons in the worst case: Ω(n log n)
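One standard way to get from log(n!) to the (1/2)(n log n - n) bound is to keep only the largest half of the terms in the sum. The derivation is sketched here with base-2 logarithms:

```latex
\log_2(n!) = \sum_{i=1}^{n} \log_2 i
\;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
\;\ge\; \frac{n}{2}\,\log_2\frac{n}{2}
= \frac{n}{2}\,(\log_2 n - 1)
= \frac{1}{2}\,(n \log_2 n - n)
```

The second inequality holds because the remaining sum has at least n/2 terms, each at least log2(n/2).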

Sorting: The Big Picture
- Simple algorithms, O(n^2): Insertion sort, Selection sort, Shell sort
- Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort (avg)
- Comparison lower bound: Ω(n log n)
- Specialized algorithms, O(n): Bucket sort, Radix sort
- Handling huge data sets: External sorting

BucketSort (a.k.a. BinSort)
- If all values to be sorted are known to be integers between 1 and K (or any small range):
  - Create an array of size K
  - Put each element in its proper bucket (a.k.a. bin)
  - If data is only integers, no need to store more than a count of how many times that bucket has been used
- Output result via a linear pass through the array of buckets
- Example: K = 5
    input: (5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5)
    count array:
      bucket: 1 2 3 4 5
      count:  3 1 2 2 3
    output: 1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5
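The counting form of bucket sort described above can be sketched as follows. The class and method names are hypothetical, and the input is assumed to contain only integers in the range 1 to K.

```java
import java.util.Arrays;

class CountingBucketSort {
    // Sorts integers known to lie in 1..k; returns a new sorted array.
    static int[] bucketSort(int[] input, int k) {
        int[] count = new int[k + 1];       // index 0 is unused so bucket v maps to count[v]
        for (int x : input) {
            count[x]++;                     // O(n): tally each value
        }
        int[] out = new int[input.length];
        int pos = 0;
        for (int v = 1; v <= k; v++) {      // O(K) pass over the buckets
            for (int c = 0; c < count[v]; c++) {
                out[pos++] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] input = {5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5};
        System.out.println(Arrays.toString(bucketSort(input, 5)));
    }
}
```

No two elements are ever compared with each other, which is why the Ω(n log n) comparison bound does not apply.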

Analyzing Bucket Sort
- Overall: O(n+K)
  - Linear in n, but also linear in K
- Good when K is smaller (or not much larger) than n
  - We don't spend time doing comparisons of duplicates
- Bad when K is much larger than n
  - Wasted space; wasted time during the linear O(K) pass
- For data in addition to integer keys, use a list at each bucket

Bucket Sort with non-integers
- Most real lists aren't just keys; we have data
- Each bucket is a list (say, a linked list)
- To add to a bucket, insert in O(1) (at the beginning, or keep a pointer to the last element)
- Example: Movie ratings; scale 1-5
    Input:
      5: Casablanca
      3: Harry Potter movies
      5: Star Wars Original Trilogy
      1: Rocky V
    Buckets:
      1: Rocky V
      2:
      3: Harry Potter
      4:
      5: Casablanca, Star Wars
    Result: 1: Rocky V, 3: Harry Potter, 5: Casablanca, 5: Star Wars
- Easy to keep "stable"; Casablanca is still before Star Wars
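A sketch of bucket sort with one list per bucket, using the movie-ratings example. The `Movie` class and all other names are hypothetical. An `ArrayList` is used for each bucket instead of a linked list; appending to its end is amortized O(1) and preserves arrival order, which is what keeps the sort stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RatedBucketSort {
    static class Movie {
        final int rating;
        final String title;
        Movie(int rating, String title) {
            this.rating = rating;
            this.title = title;
        }
    }

    // Sorts movies by rating (assumed to be in 1..k), keeping equal ratings in input order.
    static List<Movie> sortByRating(List<Movie> movies, int k) {
        List<List<Movie>> buckets = new ArrayList<>();
        for (int r = 0; r <= k; r++) {
            buckets.add(new ArrayList<>());          // bucket 0 is unused
        }
        for (Movie m : movies) {
            buckets.get(m.rating).add(m);            // append to the end of the bucket
        }
        List<Movie> out = new ArrayList<>();
        for (List<Movie> bucket : buckets) {
            out.addAll(bucket);                      // linear pass through the buckets
        }
        return out;
    }

    public static void main(String[] args) {
        List<Movie> movies = Arrays.asList(
            new Movie(5, "Casablanca"), new Movie(3, "Harry Potter"),
            new Movie(5, "Star Wars"), new Movie(1, "Rocky V"));
        for (Movie m : sortByRating(movies, 5)) {
            System.out.println(m.rating + ": " + m.title);
        }
    }
}
```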

Radix sort
- Radix = "the base of a number system"
  - Examples will use base 10 because we are used to that
  - In implementations, use larger numbers
    - For example, for ASCII strings, might use 128
- Idea:
  - Bucket sort on one digit at a time
    - Number of buckets = radix
    - Starting with the least significant digit
    - Keeping the sort stable
  - Do one pass per digit
  - Invariant: After k passes (digits), the last k digits are sorted

Radix Sort Example
Radix = 10
Input: 478, 537, 9, 721, 3, 38, 143, 67
3 passes (input is 3 digits at max); on each pass, stable sort the input on one digit

  Input   After pass 1 (ones)   After pass 2 (tens)   After pass 3 (hundreds)
  478     721                   003                   003
  537     003                   009                   009
  009     143                   721                   038
  721     537                   537                   067
  003     067                   038                   143
  038     478                   143                   478
  143     038                   067                   537
  067     009                   478                   721

Example
Radix = 10
Input: 478 537 9 721 3 38 143 67
First pass: bucket sort by ones digit
  0:
  1: 721
  2:
  3: 3, 143
  4:
  5:
  6:
  7: 537, 67
  8: 478, 38
  9: 9
Order now: 721 003 143 537 067 478 038 009

Example
Radix = 10
Order was: 721 003 143 537 067 478 038 009
Second pass: stable bucket sort by tens digit
  0: 3, 9
  1:
  2: 721
  3: 537, 38
  4: 143
  5:
  6: 67
  7: 478
  8:
  9:
Order now: 003 009 721 537 038 143 067 478

Example
Radix = 10
Order was: 003 009 721 537 038 143 067 478
Third pass: stable bucket sort by 100s digit
  0: 3, 9, 38, 67
  1: 143
  2:
  3:
  4: 478
  5: 537
  6:
  7: 721
  8:
  9:
Order now: 003 009 038 067 143 478 537 721
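The three passes above can be sketched as a least-significant-digit radix sort. The names are hypothetical and the sketch assumes non-negative integers. Each pass is a stable bucket sort on one digit, so the ordering established by earlier passes survives within each bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RadixSortSketch {
    // Sorts non-negative integers in place, one digit (in the given radix) per pass.
    static void radixSort(int[] arr, int radix) {
        int max = 0;
        for (int x : arr) {
            max = Math.max(max, x);
        }
        // One pass per digit of the largest value; place = 1, radix, radix^2, ...
        for (long place = 1; max / place > 0; place *= radix) {
            List<List<Integer>> buckets = new ArrayList<>();
            for (int b = 0; b < radix; b++) {
                buckets.add(new ArrayList<>());
            }
            for (int x : arr) {
                int digit = (int) ((x / place) % radix);
                buckets.get(digit).add(x);           // appending keeps the pass stable
            }
            int pos = 0;
            for (List<Integer> bucket : buckets) {
                for (int x : bucket) {
                    arr[pos++] = x;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] arr = {478, 537, 9, 721, 3, 38, 143, 67};
        radixSort(arr, 10);
        System.out.println(Arrays.toString(arr));
    }
}
```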

Analysis
- Input size: n
- Number of buckets = Radix: B
- Number of passes = "Digits": P
- Work per pass is 1 bucket sort: O(B+n)
- Total work is O(P(B+n))
- Compared to comparison sorts, sometimes a win, but often not
  - Example: Strings of English letters up to length 15
    - Run time proportional to: 15*(52 + n)
    - This is less than n log n only if n > 33,000
    - Of course, the cross-over point depends on constant factors of the implementations
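The cross-over claim can be checked numerically with a small sketch. It takes both cost formulas at face value (all constant factors equal to 1, log base 2), which is an assumption made only for illustration; the class and method names are hypothetical.

```java
class RadixCrossover {
    // Cost model from the example: 15 passes over 52 buckets plus n items.
    static double radixCost(long n) {
        return 15.0 * (52 + n);
    }

    // Cost model for a comparison sort: n * log2(n).
    static double comparisonCost(long n) {
        return n * (Math.log(n) / Math.log(2));
    }

    // Smallest n at which the radix-sort model becomes cheaper than n log n.
    static long crossover() {
        long n = 2;
        while (radixCost(n) >= comparisonCost(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Radix sort model wins from n = " + crossover());
    }
}
```

Under this model the cross-over lands a little above 33,000, consistent with the figure quoted above.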

Sorting Takeaways
- Simple O(n^2) sorts can be fastest for small n
  - Selection sort, Insertion sort (latter linear for mostly-sorted input)
  - Good for below a cutoff to help divide-and-conquer sorts
- O(n log n) sorts
  - Heap sort: in place, but not stable nor parallelizable
  - Merge sort: not in place, but stable and works as an external sort
  - Quick sort: in place, but not stable and O(n^2) in the worst case
    - Often fastest, but depends on costs of comparisons/copies
- Ω(n log n) is the worst-case and average lower bound for sorting by comparisons
- Non-comparison sorts
  - Bucket sort: good for a small number of possible key values
  - Radix sort: uses fewer buckets and more phases
- Best way to sort? It depends!