CSE 373: Data Structures and Algorithms


CSE 373: Data Structures and Algorithms Lecture 19: Comparison Sorting Algorithms Instructor: Lilian de Greef Quarter: Summer 2017

Today Intro to sorting Comparison sorting Insertion Sort Selection Sort Heap Sort Merge Sort

Sorting Now looking at algorithms instead of data structures!

Introduction to Sorting Stacks, queues, priority queues, and dictionaries all focused on providing one element at a time, but often we want all the elements in some order. Humans can sort, but computers can sort fast. It is very common to need data sorted somehow: an alphabetical list of people, a list of countries ordered by population, search engine results by relevance, a store catalogue by price. Algorithms have different asymptotic and constant-factor trade-offs: there is no single best sort for all scenarios, so knowing one way to sort just isn't enough.

More Reasons to Sort General technique in computing: Preprocess data to make subsequent operations faster Example: Sort the data so that you can Find the k th largest in constant time for any k Perform binary search to find elements in logarithmic time Whether the performance of the preprocessing matters depends on How often the data will change (and how much it will change) How much data there is
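The two preprocessing payoffs above can be sketched in a few lines of Python (the values are illustrative; bisect is Python's standard binary-search module):

```python
import bisect

data = [42, 7, 19, 3, 25, 11]
data.sort()                        # one-time O(n log n) preprocessing

# kth largest in constant time: index from the end of the sorted array
k = 2
kth_largest = data[-k]             # the 2nd largest element

# membership test in O(log n) time via binary search
i = bisect.bisect_left(data, 19)
found = (i < len(data) and data[i] == 19)
```

After the single sort, every later query is cheap; whether that one-time cost pays off depends, as the slide says, on how often the data changes.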

The main problem, stated carefully For now, assume we have n comparable elements in an array and we want to rearrange them to be in increasing order. Input: an array A of data records, a key value in each data record, and a comparison function. Effect: reorganize the elements of A such that for any i and j, if i < j then A[i] <= A[j]. (Also, A must have exactly the same data it started with.) Could also sort in reverse order, of course. An algorithm doing this is a comparison sort.

Variations on the Basic Problem 1. Maybe elements are in a linked list (could convert to array and back in linear time, but some algorithms needn't do so) 2. Maybe ties need to be resolved by original array position; sorts that do this naturally are called stable 3. Maybe we must not use more than O(1) auxiliary space; sorts meeting this requirement are called in-place 4. Maybe we can do more with elements than just compare; this sometimes leads to faster algorithms 5. Maybe we have too much data to fit in memory: use an external sorting algorithm

Sorting: The Big Picture Surprising amount of neat stuff to say about sorting: Simple algorithms: O(n^2) Fancier algorithms: O(n log n) Comparison lower bound: Ω(n log n) Specialized algorithms: O(n) Handling huge data sets Insertion sort Selection sort Shell sort Heap sort Merge sort Quick sort Bucket sort Radix sort External sorting


Insertion Sort Idea: at step k, put the kth element in the correct position among the first k elements. Alternate way of saying this: sort the first two elements, then insert the 3rd element in order, then insert the 4th element in order, ... Loop invariant: when the loop index is i, the first i elements are sorted. Time? Best-case O(n) (already-sorted input), worst-case O(n^2), average case O(n^2).
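The insertion sort idea can be sketched in a few lines of Python (a minimal in-place version, not from the slides):

```python
def insertion_sort(a):
    """Sort list a in place. Invariant: before iteration i, a[:i] is sorted."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # shift larger elements one slot right to make room for key
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```

On an already-sorted array the inner while loop never runs, which is exactly the O(n) best case.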

Selection sort Idea: at step k, find the smallest element among the not-yet-sorted elements and put it at position k. Alternate way of saying this: find the smallest element, put it 1st; find the next smallest element, put it 2nd; find the next smallest element, put it 3rd; ... Loop invariant: when the loop index is i, the first i elements are the i smallest elements in sorted order. Time? Best-case, worst-case, and average case are all O(n^2).
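A matching sketch of selection sort in Python (again a minimal illustrative version):

```python
def selection_sort(a):
    """Sort list a in place by repeatedly selecting the minimum."""
    n = len(a)
    for k in range(n):
        # find the index of the smallest not-yet-sorted element
        m = min(range(k, n), key=lambda i: a[i])
        a[k], a[m] = a[m], a[k]   # put it at position k
    return a
```

Note the scan for the minimum always examines all remaining elements, which is why even the best case stays O(n^2).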

Insertion Sort vs. Selection Sort Different algorithms Solve the same problem Have the same worst-case and average-case asymptotic complexity Insertion-sort has better best-case complexity; preferable when input is mostly sorted Other algorithms are more efficient for large arrays that are not already almost sorted Insertion sort may do well on small arrays

The Big Picture Surprising amount of juicy computer science: 2-3 lectures Simple algorithms: O(n^2) Fancier algorithms: O(n log n) Comparison lower bound: Ω(n log n) Specialized algorithms: O(n) Handling huge data sets Insertion sort Selection sort Shell sort Heap sort Merge sort Quick sort (avg) Bucket sort Radix sort External sorting

Heap sort Sorting with a heap: insert each arr[i] (or better yet, use buildheap), then for(i=0; i < arr.length; i++) arr[i] = deleteMin(); Worst-case running time: O(n log n). We have the array-to-sort and the heap, so this is not an in-place sort. There's a trick to make it in-place...

In-place heap sort Treat the initial array as a heap (via buildheap). When you delete the ith element, put it at arr[n-i]: that array location isn't needed for the heap anymore! 4 7 5 9 8 6 10 3 2 1 (heap part | sorted part) arr[n-i] = deleteMin() 5 7 6 9 8 10 4 3 2 1 (heap part | sorted part) But this reverse sorts; how would you fix that?
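One answer to the slide's closing question: use a max-heap instead of a min-heap, so each deleteMax lands at the end of the array and the result comes out in increasing order. A sketch in Python (illustrative, not the course's implementation):

```python
def heap_sort(a):
    """In-place heap sort using a max-heap, so the array ends up increasing."""
    n = len(a)

    def sift_down(root, end):
        # restore the max-heap property for the subtree rooted at root,
        # considering only a[:end]
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                # pick the larger child
            if a[root] >= a[child]:
                break
            a[root], a[child] = a[child], a[root]
            root = child

    # buildheap: sift down every internal node (O(n) total)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # repeatedly move the max into the sorted part at the end
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]       # deleteMax goes to a[end]
        sift_down(0, end)
    return a
```

The heap part shrinks from the front while the sorted part grows from the back, exactly as in the slide's picture, just with max instead of min.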

AVL sort We can also use a balanced tree to sort: insert each element (total time O(n log n)), then repeatedly deleteMin (total time O(n log n)). Better: an in-order traversal is O(n), but the overall sort is still O(n log n). Compared to heap sort: both are O(n log n) in the worst, best, and average case; neither parallelizes well; heap sort can be done in-place and has better constant factors. Design decision: which would you choose between heap sort and AVL sort? Why?

Hash sort??? Finding min item in a hashtable is O(n), so this would be a slower, more complicated selection sort

Divide and conquer Very important technique in algorithm design 1. Divide problem into smaller parts 2. Independently solve the simpler parts Think recursion Or parallelism 3. Combine solution of parts to produce overall solution Two great sorting methods are fundamentally divide-and-conquer (Merge Sort & Quicksort)

Merge Sort Merge Sort: recursively Sort the left half of the elements Sort the right half of the elements Merge the two sorted halves into a sorted whole


Merge sort 8 2 9 4 5 3 1 6 To sort the array from position lo to position hi: if the range is 1 element long, it is already sorted! Else: sort from lo to (hi+lo)/2, sort from (hi+lo)/2 to hi, then merge the two halves together. Merging takes two sorted parts and produces one sorted whole; it is O(n) but requires auxiliary space.
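The recursive outline above can be sketched directly in Python; this is the simple version that allocates a fresh auxiliary array for every merge (later slides discuss reusing it):

```python
def merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place, recursively, with an auxiliary array per merge."""
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return a                      # one element: already sorted
    mid = (lo + hi) // 2
    merge_sort(a, lo, mid)            # sort the left half
    merge_sort(a, mid, hi)            # sort the right half
    # merge with "three fingers": one in each half, one in the auxiliary array
    aux = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:              # <= keeps the sort stable
            aux.append(a[i]); i += 1
        else:
            aux.append(a[j]); j += 1
    aux.extend(a[i:mid])              # copy any leftover from either half
    aux.extend(a[j:hi])
    a[lo:hi] = aux                    # copy back to the original array
    return a
```

Running it on the slide's example [8, 2, 9, 4, 5, 3, 1, 6] reproduces the merge steps shown on the next slides.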

Merge Sort: Example focused on merging Start with: 8 2 9 4 5 3 1 6 (main array) After recursion (not magic!): 2 4 8 9 1 3 5 6 (main array) Merge: use 3 fingers and 1 more (auxiliary) array. (After the merge, copy back to the original array.)

Merge Sort: Example showing recursion Divide: 8 2 9 4 5 3 1 6 into 8 2 9 4 | 5 3 1 6 Divide: into 8 2 | 9 4 | 5 3 | 1 6 Divide to 1 element: 8 | 2 | 9 | 4 | 5 | 3 | 1 | 6 Merge: 2 8 | 4 9 | 3 5 | 1 6 Merge: 2 4 8 9 | 1 3 5 6 Merge: 1 2 3 4 5 6 8 9

One way to practice on your own time: Make yourself an unsorted array Try using one of the sorting algorithms on it You know you got the right end result if it comes out sorted Can use the same example for merge sort as the previous slide to double check in-between steps

Some details: saving a little time What if the final steps of our merge looked like this: 2 4 5 6 1 3 8 9 Main array 1 2 3 4 5 6 Auxiliary array Wasteful to copy to the auxiliary array just to copy back

Some details: saving a little time If left-side finishes first, just stop the merge and copy back: Main array Auxiliary array If right-side finishes first, copy dregs into right then copy back Main array Auxiliary array

Some details: saving space and copying Simplest / worst: use a new auxiliary array of size (hi-lo) for every merge. Better: use a new auxiliary array of size n for every merging stage. Better still: reuse the same auxiliary array of size n for every merging stage. Best (but a little tricky): don't copy back; at the 2nd, 4th, 6th, ... merging stages, use the original array as the auxiliary array and vice-versa. Need one final copy if the number of stages is odd.

Swapping Original / Auxiliary Array ( best ) First recurse down to lists of size 1 As we return from the recursion, swap between arrays Main array Auxiliary array Main array Auxiliary array Main array Merge by 1 Merge by 2 Merge by 4 Merge by 8 Merge by 16 Auxiliary array Main array Copy if Needed (Arguably easier to code up without recursion at all)

Linked lists and big data We defined sorting over an array, but sometimes you want to sort linked lists. One approach: convert to an array (O(n)), sort (O(n log n)), convert back to a list (O(n)). Merge sort works very nicely on linked lists directly; heapsort and quicksort do not; insertion sort and selection sort do, but they're slower. Merge sort is also the sort of choice for external sorting: linear merges minimize disk accesses and can leverage multiple disks to get streaming accesses.
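Why merge sort suits linked lists: the merge step needs only sequential access and can relink nodes rather than copy them. A small sketch (the Node class and helper functions are illustrative, not from the slides):

```python
class Node:
    """A minimal singly linked list node."""
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def merge_sorted_lists(a, b):
    """Merge two sorted linked lists by relinking nodes: no copying,
    no random access, just one linear pass over both lists."""
    dummy = tail = Node(None)
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b               # append whatever remains
    return dummy.next

def from_list(xs):
    """Build a linked list from a Python list."""
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_list(head):
    """Flatten a linked list back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out
```

The same sequential-merge structure is what makes merge sort friendly to disks and tapes in external sorting.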

Analysis Having defined an algorithm and argued it is correct, we should analyze its running time and space. To sort n elements, we: return immediately if n = 1; else do 2 subproblems of size n/2 and then an O(n) merge. Recurrence relation: T(1) = c, T(n) = 2T(n/2) + cn.

Analysis intuitively This recurrence is so common, you just know it's O(n log n). Merge sort is relatively easy to intuit (best, worst, and average): the recursion tree will have height O(log n), and at each level we do a total amount of merging equal to n.

Analysis more formally (one of the recurrence classics) For simplicity, ignore constants (let constants be 1): T(1) = 1 T(n) = 2T(n/2) + n = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n = 4(2T(n/8) + n/4) + 2n = 8T(n/8) + 3n ... = 2^k T(n/2^k) + kn We continue to recurse until we reach the base case T(1): n/2^k = 1, i.e., k = log n. So the total amount of work is 2^k T(n/2^k) + kn = 2^(log n) T(1) + n log n = n + n log n = O(n log n)
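The derivation above can be sanity-checked numerically: for n a power of two, the recurrence (with constants taken to be 1) matches the closed form n + n log2(n) exactly.

```python
def T(n):
    """The merge sort recurrence, with all constants taken to be 1."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# For n = 2**k, the closed form derived above is n + n*log2(n) = n + n*k.
for k in range(11):
    n = 2 ** k
    assert T(n) == n + n * k
```

This is only a check for powers of two, which is exactly the case the derivation handles; other n need floors and ceilings but give the same O(n log n) bound.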

Divide-and-Conquer Sorting Two great sorting methods are fundamentally divide-and-conquer 1. Merge Sort: Sort the left half of the elements (recursively) Sort the right half of the elements (recursively) Merge the two sorted halves into a sorted whole 2. Quicksort: Pick a pivot element Divide elements into less-than pivot and greater-than pivot Sort the two divisions (recursively on each) Answer is sorted-less-than, followed by pivot, followed by sorted-greater-than

Quicksort Overview (sneak preview) 1. Pick a pivot element 2. Partition all the data into: A. the elements less than the pivot, B. the pivot, C. the elements greater than the pivot 3. Recursively sort A and C 4. The final answer is as simple as A, B, C (also an American saying)
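The A-B-C outline can be sketched in a few lines of Python. This version is illustrative only: it takes the first element as the pivot and builds new lists, whereas real implementations choose pivots more carefully and partition in place (covered in the next lecture).

```python
def quicksort(a):
    """Quicksort sketch following the A-B-C outline; not in-place."""
    if len(a) <= 1:
        return a
    pivot = a[0]                                  # B: the pivot
    less = [x for x in a[1:] if x < pivot]        # A: elements less than pivot
    greater = [x for x in a[1:] if x >= pivot]    # C: the rest (ties go here)
    return quicksort(less) + [pivot] + quicksort(greater)
```

With the first element as pivot, an already-sorted input makes one side empty every time, which is the well-known O(n^2) worst case.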

Cool Resources http://www.cs.usfca.edu/~galles/visualization/comparisonsort.html http://www.sorting-algorithms.com/ https://www.youtube.com/watch?v=t8g-iyghpea Seriously, check them out!