CS 5321: Advanced Algorithms, Sorting. Ali Ebnenasir, Department of Computer Science, Michigan Technological University


Acknowledgement: Eric Torng, Moon Jung Chung, Charles Ofria

Example data to sort (name, key): Nishit 22, Bill 23, Martin 17, Ryan 34, Prashanti 25, Bryan 19, Hao 26, Jun 18, Ben 16, Nick 15, Bin 14, Jun Ma 32

Outline
- Heapsort
- Quicksort
- Quick review of basic sorting methods
- Lower bounds for comparison-based methods
- Non-comparison-based sorting

Why don't CS profs ever stop talking about sorting?
- Historically, computers spent more time sorting than on anything else, about 25% of all cycles on mainframes.
- Sorting is the best-studied problem in computer science, with a wide variety of algorithms known.
- Most of the interesting ideas we encounter in this course are taught in the context of sorting, such as divide-and-conquer, randomized algorithms, and lower bounds.
You should have seen most of the algorithms before; we will concentrate on the analysis.

Example Problems
a. You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
b. You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

Heaps and Heapsort
Definition; operations and their uses in heap construction: Insertion, Heapify, DeleteMax; Heapsort.

Definition
A binary heap is defined to be a binary tree with a key in each node such that:
1. All leaves are on at most two adjacent levels.
2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.
3. The key in the root is greater than or equal to the keys of all its children, and the left and right subtrees are again binary heaps.
Conditions 1 and 2 specify the shape of the tree; condition 3 specifies the labeling of the tree.

Heap Property
Max-heap property: A[parent(i)] >= A[i]
Min-heap property: A[parent(i)] <= A[i]
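The max-heap property can be checked directly on the array representation used later in these notes. A minimal Python sketch (the function name is ours; it assumes the 1-indexed layout with a[0] unused):

```python
def is_max_heap(a):
    """Check the max-heap property A[parent(i)] >= A[i] on a
    1-indexed array (a[0] is an unused placeholder)."""
    n = len(a) - 1
    for i in range(2, n + 1):      # every node except the root
        if a[i // 2] < a[i]:       # parent(i) = floor(i/2)
            return False
    return True
```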

Which of these are heaps?

Partial Order Property
The ancestor relation in a heap defines a partial order on its elements: it is reflexive, anti-symmetric, and transitive.
- Reflexive: x is an ancestor of itself.
- Anti-symmetric: if x is an ancestor of y and y is an ancestor of x, then x = y.
- Transitive: if x is an ancestor of y and y is an ancestor of z, then x is an ancestor of z.
Partial orders can be used to model hierarchies with incomplete information or equal-valued elements.

Questions
- What are the minimum and maximum numbers of elements in a heap of height h? (A one-node heap has height 0.)
- What is the height of a heap with n elements?
- Where in a heap might the smallest element reside?

Array Implementation
- The root is stored at index 1.
- Left(x) = 2x and Right(x) = 2x + 1
- Parent(x) = floor(x/2)

Index: 1  2  3  4  5  6  7  8  9 10 11 12
Value: 43 41 29 23 37 17 19 3  7 31  1  2

Insertion Operation
- Place the item to be inserted into the rightmost open array slot.
- While the item is greater than its parent, swap it upward and repeat.
- How many comparisons are needed in the worst case?

Index: 1  2  3  4  5  6  7  8  9 10 11 12 13
Value: 43 41 29 23 37 17 19 3  7 31  1  2 33
The root holds the maximum.

Insert Algorithm
    Insert(X) {
        N = N + 1;
        i = N;
        while ((i > 1) && (A[Parent(i)] < X)) {
            A[i] = A[Parent(i)];    // shift the smaller parent down
            i = Parent(i);
        }
        A[i] = X;
    }
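The Insert algorithm above can be sketched in Python. This shifting version avoids repeated swaps by moving smaller parents down and writing X once (the function name is illustrative; 1-indexed list with a[0] unused):

```python
def heap_insert(a, x):
    """Insert x into a 1-indexed max-heap stored in list a (a[0] unused)."""
    a.append(x)            # place x in the rightmost open slot
    i = len(a) - 1
    while i > 1 and a[i // 2] < x:
        a[i] = a[i // 2]   # shift the smaller parent down
        i //= 2            # move up to the parent
    a[i] = x
```

Inserting 33 into the 12-element heap shown above ends with 33 at index 3, after 17 and 29 have shifted down one level each.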

Heap Construction by Insertion
Suppose we built an n-element heap by sequentially inserting n items.
Let T(n) denote the number of comparisons needed in the worst case to build a heap of n items.
Define a recurrence relation for T(n):
    T(n) =
    T(1) =
Solve your recurrence relation to derive the worst-case time to build a heap in this manner.

Heapify Operation
Suppose you have a heap, EXCEPT that one specific value may violate the heap condition.
Fix it by a 3-way comparison (node versus its two children), working DOWN the heap.
Worst-case number of comparisons?
Example (indices 1..7): 24 41 29 23 37 17 19; the root (24) violates the max-heap property, and heapify repairs it.

Heapify
A: an array; i: an index of an element that may violate the max-heap property; N: heap size.
    MaxHeapify(A, i) {
        if (2i > N) return;                       // node i is a leaf
        if ((2i + 1 <= N) && (A[2i+1] > A[2i]))
            j = 2i + 1;
        else
            j = 2i;
        if (A[j] > A[i]) {
            swap(A[i], A[j]);
            MaxHeapify(A, j);
        }
    }
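MaxHeapify translates almost line for line into Python. A sketch under the same 1-indexed layout, with the heap size n passed explicitly instead of the global N:

```python
def max_heapify(a, i, n):
    """Sift a[i] down in a 1-indexed max-heap of size n (a[0] unused)."""
    left, right = 2 * i, 2 * i + 1
    if left > n:                       # node i is a leaf
        return
    # pick the larger child (the 3-way comparison works with this child)
    j = right if right <= n and a[right] > a[left] else left
    if a[j] > a[i]:
        a[i], a[j] = a[j], a[i]
        max_heapify(a, j, n)           # continue DOWN the heap
```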

DeleteMax Operation
- Copy the root value to be returned.
- Move the rightmost entry to the root.
- Perform heapify to fix up the heap.
- Worst-case running time?

Index: 1  2  3  4  5  6  7  8  9 10 11 12
Value: 2 41 29 23 37 17 19  3  7 31  1  -

    deletemax(A) {
        ans = A[1];
        A[1] = A[N];
        N = N - 1;
        MaxHeapify(A, 1);
        return ans;
    }
A recursive deletemax is much easier to write than a recursive insert.

Heap Construction by Heapify
How can we construct a heap from n numbers by using the heapify operation?
Example: 5, 3, 17, 10, 84, 19, 6, 22, 9
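A self-contained Python sketch of deletemax, with the sift-down written inline so the function stands alone (illustrative code, not from the slides):

```python
def delete_max(a):
    """Remove and return the maximum of a 1-indexed max-heap (a[0] unused)."""
    ans = a[1]
    a[1] = a[-1]                        # move the rightmost entry to the root
    a.pop()
    n, i = len(a) - 1, 1
    while 2 * i <= n:                   # sift the new root down (heapify)
        j = 2 * i
        if j + 1 <= n and a[j + 1] > a[j]:
            j += 1                      # pick the larger child
        if a[j] <= a[i]:
            break
        a[i], a[j] = a[j], a[i]
        i = j
    return ans
```

Repeated calls return the elements in decreasing order, which is the heart of heapsort.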

Build Heap
Approach 1: insert(A[i]) for i = 1, ..., n; complexity O(n log n).
Approach 2: bottom-up construction.
Key idea: suppose a tree with root T, left subtree T1, and right subtree T2 is such that T1 and T2 are already heaps. How do we adjust them to make the whole tree a heap? Answer: percolate the root down (heapify) to create a heap-ordered tree.

Code
    Buildheap() {
        for i = N/2 downto 1 {
            MaxHeapify(A, i);
        }
    }

Analysis: Heap Construction by Heapify
There is a direct analysis in the textbook; here we discuss a recurrence-relation analysis.
Let T(n) denote the number of comparisons needed in the worst case to build a heap of n items.
Define a recurrence relation for T(n):
    T(n) =
    T(1) =
Solve your recurrence relation to derive the worst-case time to build a heap in this manner.

Heap Sort
How can we use a heap and heap operations to solve the sorting problem?
Do we need all three operations studied (Insertion, Heapify, DeleteMax)?
What is the running time?
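Putting the pieces together answers the Heap Sort questions above: bottom-up construction plus repeated delete-max steps, with each deleted maximum parked at the end of the array. A sketch (the helper name sift_down is ours):

```python
def heapsort(items):
    """Heapsort: build a max-heap bottom-up (O(n)), then repeatedly swap
    the root (maximum) with the last heap element and shrink the heap."""
    a = [None] + items               # 1-indexed working copy; a[0] unused
    n = len(items)

    def sift_down(i, size):
        while 2 * i <= size:
            j = 2 * i
            if j + 1 <= size and a[j + 1] > a[j]:
                j += 1               # pick the larger child
            if a[j] <= a[i]:
                break
            a[i], a[j] = a[j], a[i]
            i = j

    for i in range(n // 2, 0, -1):   # bottom-up construction
        sift_down(i, n)
    for k in range(n, 1, -1):        # n - 1 delete-max steps: O(n log n)
        a[1], a[k] = a[k], a[1]
        sift_down(1, k - 1)
    return a[1:]
```

Only Heapify and DeleteMax are needed; Insertion is not, since build-heap replaces sequential insertion.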

Quicksort
Algorithm overview:
1. Choose a pivot element.
2. Partition the elements to be sorted based on the pivot element.
3. Recursively sort the smaller and larger elements.

Quicksort Walkthrough (pivot = last element of each subarray)
    17 12  6 23 19  8  5 10
     6  8  5 10 17 12 23 19
     5  6  8    17 12 19 23
     6  8       12 17    23
     6              17
     5  6  8 10 12 17 19 23

Pseudocode
    Quicksort(A, low, high) {
        if (low < high) {
            pivotlocation = Partition(A, low, high);
            Quicksort(A, low, pivotlocation - 1);
            Quicksort(A, pivotlocation + 1, high);
        }
    }

Pseudocode
    int Partition(A, low, high) {
        pivot = A[high];
        leftwall = low - 1;
        for i = low to high - 1 {
            if (A[i] < pivot) {
                leftwall = leftwall + 1;
                swap(A[i], A[leftwall]);
            }
        }
        swap(A[high], A[leftwall + 1]);
        return leftwall + 1;
    }

Worst Case for Quicksort?
Average Case for Quicksort?
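The Quicksort and Partition pseudocode above translate directly into Python (0-indexed here; an illustrative sketch, not the slides' code):

```python
def partition(a, low, high):
    """Lomuto partition: pivot = a[high]; smaller elements are swapped
    to the left of the wall; returns the pivot's final index."""
    pivot = a[high]
    wall = low - 1
    for i in range(low, high):
        if a[i] < pivot:
            wall += 1
            a[i], a[wall] = a[wall], a[i]
    a[high], a[wall + 1] = a[wall + 1], a[high]   # place the pivot
    return wall + 1

def quicksort(a, low=0, high=None):
    """Sort list a in place between indices low and high."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)
        quicksort(a, p + 1, high)
```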

Intuitive Average-Case Analysis
    0 ... n/4 ... n/2 ... 3n/4 ... n
A pivot ranked anywhere in the middle half (between n/4 and 3n/4) gives a decent partition: each side keeps at most 3n/4 of the elements.
After h decent partitions, the largest subproblem has size at most (3/4)^h * n. Setting (3/4)^h * n = 1 gives n = (4/3)^h, so taking logs:
    log(n) = h log(4/3), hence h = log(n) / log(4/3) < 2 log(n).
How many steps? At most 2 log(n) decent partitions suffice to sort an array of n elements. But if we just take arbitrary pivot points, how often will they in fact be decent? Since any element ranked between n/4 and 3n/4 makes a decent pivot, half the pivots are decent on average. Therefore, on average, about 2 * 2 log(n) = 4 log(n) rounds of partitioning sort the array.

Formal Average-Case Analysis
Let X denote the random variable for the total number of comparisons performed.
Let the indicator variable X_ij be 1 if the i-th smallest and j-th smallest elements are compared and 0 otherwise.
    X = Σ_{i=1..n-1} Σ_{j=i+1..n} X_ij
    E[X] = Σ_{i=1..n-1} Σ_{j=i+1..n} E[X_ij]

Computing E[X_ij]
E[X_ij] = probability that the i-th and j-th smallest elements are compared.
Observation: all comparisons are between a pivot element and another element, so if some element k with i < k < j is chosen as a pivot before i or j, then i and j are never compared. Thus i and j are compared exactly when one of them is chosen as a pivot before any other element in the interval i..j, giving
    E[X_ij] = 2 / (j - i + 1)

Computing E[X]
    E[X] = Σ_{i=1..n-1} Σ_{j=i+1..n} E[X_ij]
         = Σ_{i=1..n-1} Σ_{j=i+1..n} 2/(j-i+1)
         = Σ_{i=1..n-1} Σ_{k=1..n-i} 2/(k+1)
        <= Σ_{i=1..n-1} 2 H_{n-i+1}
        <= Σ_{i=1..n-1} 2 H_n = 2(n-1) H_n
where H_n = 1 + 1/2 + 1/3 + ... + 1/n = ln n + O(1).

Alternative Average-Case Analysis
Let T(n) denote the average time required to sort an n-element array, where the average assumes each permutation is equally likely.
Write a recurrence relation for T(n):
    T(n) =
Using induction, we can then prove that T(n) = O(n log n). This requires some simplification and summation manipulation.
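The identity E[X_ij] = 2/(j - i + 1) can be checked by brute force: averaging quicksort's comparison count over every permutation of a small input reproduces the double sum exactly. A sketch (pivot = last element, as in the pseudocode above; exact arithmetic via Fraction; names are ours):

```python
from itertools import permutations
from fractions import Fraction

def count_comparisons(a):
    """Quicksort (Lomuto partition, pivot = last), counting comparisons."""
    count = 0
    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot, wall = a[hi], lo - 1
        for i in range(lo, hi):
            count += 1                  # one comparison against the pivot
            if a[i] < pivot:
                wall += 1
                a[i], a[wall] = a[wall], a[i]
        a[hi], a[wall + 1] = a[wall + 1], a[hi]
        sort(lo, wall)
        sort(wall + 2, hi)
    sort(0, len(a) - 1)
    return count

n = 6
perms = list(permutations(range(n)))
average = Fraction(sum(count_comparisons(list(p)) for p in perms), len(perms))
# the double sum over ranks 1 <= i < j <= n of 2/(j-i+1)
expected = sum(Fraction(2, j - i + 1)
               for i in range(1, n) for j in range(i + 1, n + 1))
```

With exact rationals, `average` and `expected` agree term for term, confirming that each E[X_ij] equals 2/(j - i + 1).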

Avoiding the Worst Case
- Understanding quicksort's worst case
- Methods for avoiding it: pivot strategies, randomization

Understanding the Worst Case
    A B D F H J K
    A B D F H J
    A B D F H
    A B D F
    A B D
    A B
    A
On already-sorted input with the last element as pivot, each partition strips off only one element. This worst-case occurrence is a likely case for many applications.

Pivot Strategies
- Use the middle element of the subarray as the pivot.
- Use the median of the first, middle, and last elements, to make sure to avoid any kind of presorting.
What is the worst-case performance for these pivot-selection mechanisms?
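The median-of-(first, middle, last) strategy can be sketched as follows (the helper name is hypothetical):

```python
def median_of_three(a, low, high):
    """Return the index (low, mid, or high) whose value is the median
    of those three entries; use it as the pivot index."""
    mid = (low + high) // 2
    trio = sorted([(a[low], low), (a[mid], mid), (a[high], high)])
    return trio[1][1]      # index of the median value
```

On already-sorted input this picks the true median of the subarray, so the one-element-per-partition staircase above disappears; adversarial inputs can still defeat any fixed deterministic rule.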

Randomized Quicksort
Make the chance of worst-case running time equally small for all inputs.
Methods:
- Choose the pivot element randomly from the range [low..high].
- Initially permute the array at random.

Sorting Algorithm Review I
Θ(n^2) worst-case methods: Insertion Sort, Selection Sort, Bubble Sort.
What is the idea behind each method? What are the advantages/disadvantages of each method?

Sorting Algorithm Review II
Faster methods: Merge Sort, Quicksort, Heapsort.
What is the idea behind merge sort? What are the advantages/disadvantages of each method?
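The effect of random pivots shows up clearly when counting comparisons on the worst-case (already sorted) input under both pivot rules. An illustrative sketch (function names and the pluggable pivot rule are ours):

```python
import random
import sys

def comparisons(a, pick_pivot):
    """Quicksort with a pluggable pivot rule; returns the comparison count."""
    count = 0
    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        p = pick_pivot(lo, hi)
        a[p], a[hi] = a[hi], a[p]       # move chosen pivot to the end
        pivot, wall = a[hi], lo - 1
        for i in range(lo, hi):
            count += 1
            if a[i] < pivot:
                wall += 1
                a[i], a[wall] = a[wall], a[i]
        a[hi], a[wall + 1] = a[wall + 1], a[hi]
        sort(lo, wall)
        sort(wall + 2, hi)
    sort(0, len(a) - 1)
    return count

sys.setrecursionlimit(10000)            # deterministic run recurses n deep
n = 500
last = comparisons(list(range(n)), lambda lo, hi: hi)   # fixed last-element pivot
rng = random.Random(42)                                 # seeded for reproducibility
rand = comparisons(list(range(n)), lambda lo, hi: rng.randint(lo, hi))
```

On sorted input the fixed rule performs exactly n(n-1)/2 comparisons, while the randomized rule lands near 2n ln n, an order of magnitude fewer at this size.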

Quicksort Optimizations
Quicksort is regarded as the fastest sorting algorithm in most cases. Some optimization possibilities:
- Randomized pivot selection: makes worst-case running time due to bad data extremely unlikely for any input.
- Median-of-three pivot selection: can be slightly faster than randomization for somewhat sorted data.
- Leave small subarrays for insertion sort: insertion sort can be faster, in practice, for small values of n.
- Recurse on the smaller partition first: minimizes runtime stack memory.

Possible reasons for not choosing quicksort:
- Is the data already partially sorted?
- Do we know the distribution of the keys?
- Is the range of possible keys very small?

Lower Bounds
Any comparison-based sorting program can be thought of as defining a decision tree of possible executions.

Example Decision Tree

Analysis of the Decision Tree
Consider the decision tree T for any comparison-based sorting algorithm.
- T must have at least n! leaves. Why?
- Given that there are n! leaves, what must the height of the decision tree be?
- What does this imply about the running time of any comparison-based algorithm?

Linear-Time Sorting
Algorithms exist for sorting n items in Θ(n) time IF we can make some assumptions about the input data. These algorithms do not sort solely by comparisons, thus avoiding the Ω(n log n) lower bound on comparison-based sorting algorithms:
- Counting sort
- Radix sort
- Bucket sort
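Since a binary tree of height h has at most 2^h leaves, a tree with n! leaves must have height at least log2(n!), which is Ω(n log n) by Stirling's approximation. The resulting lower bound on worst-case comparisons can be computed directly (function name is ours):

```python
import math

def min_comparisons(n):
    """Decision-tree height bound: any comparison sort needs at least
    ceil(log2(n!)) comparisons in the worst case on n elements."""
    return math.ceil(math.log2(math.factorial(n)))
```

For example, sorting 5 elements needs at least ceil(log2(120)) = 7 comparisons in the worst case.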

Counting Sort
Assumption: keys to be sorted come from a limited finite range, say [1..k].
Method:
- Count the number of items with value exactly i.
- Compute the number of items with value at most i.
- Use the counts to place each item in the final array.
Full details in the book. Running time: Θ(n + k).
Key observation: counting sort is stable.

Counting Sort Algorithm
A: input array of length n; B: output array of length n; C: counting array of length k, where C[i] ends up holding the number of elements less than or equal to i; 1..k: range of input values.
    Counting-Sort(A, B, k) {
        1) for i = 1 to k: C[i] = 0;
        2) for j = 1 to n: C[A[j]] = C[A[j]] + 1;
        3) for i = 2 to k: C[i] = C[i] + C[i-1];
        4) for j = n downto 1 {
               B[C[A[j]]] = A[j];
               C[A[j]] = C[A[j]] - 1;
           }
    }

Radix Sort
Assumption: keys to be sorted have d digits.
Method: use counting sort (or any stable sort) to sort the numbers digit by digit, starting with the least significant digit and ending with the most significant digit.
Running time: Θ(d(n + k)), where k is the number of possible values for each digit, i.e., the radix.
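A Python sketch of the Counting-Sort pseudocode above, 0-indexed, with a key function added so the stability is visible on records rather than bare numbers (the key parameter is our addition; the slides sort plain integers):

```python
def counting_sort(a, k, key=lambda x: x):
    """Stable counting sort of items whose key(x) lies in 1..k."""
    n = len(a)
    c = [0] * (k + 1)                 # c[i]: count of items with key i
    for x in a:
        c[key(x)] += 1
    for i in range(2, k + 1):         # prefix sums: items with key <= i
        c[i] += c[i - 1]
    b = [None] * n                    # output array
    for x in reversed(a):             # reverse scan keeps equal keys in order
        c[key(x)] -= 1
        b[c[key(x)]] = x
    return b
```

Stability, the property radix sort depends on, shows in the record test below: the two records with key 2 keep their original relative order.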

Radix Sort: Example
Sorting 6-bit binary keys, one bit per pass, least significant bit first:
    Input:       110011 51, 101001 41, 010011 19, 000110 6, 110000 48, 011001 25, 010111 23
    After bit 0: 000110 6, 110000 48, 110011 51, 101001 41, 010011 19, 011001 25, 010111 23
    After bit 1: 110000 48, 101001 41, 011001 25, 000110 6, 110011 51, 010011 19, 010111 23
    ...
How does the stability property show up in this example?

Bucket Sort
Assumption: keys to be sorted are uniformly distributed over a known range (say 1 to m).
Method:
- Set up n buckets, where each bucket is responsible for an equal portion of the range.
- Sort the items in each bucket using insertion sort.
- Concatenate the sorted lists of items from the buckets to get the final sorted order.

Bucket Sort Analysis
Key analysis: let X be a random variable for the number of comparisons required by insertion sort over the buckets, and let n_i be the number of items in bucket i.
    E[X] = Σ_{i=1..n} O(E[n_i^2])
    E[n_i^2] = 2 - 1/n    (derivation in the book)
Intuition: what is E[n_i]?
Question: why use insertion sort rather than quicksort?
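A Python sketch of bucket sort under the stated assumption of keys distributed over [1, m] (function names and the bucket-index formula are illustrative):

```python
def insertion_sort(a):
    """In-place insertion sort, used within each bucket."""
    for i in range(1, len(a)):
        x, j = a[i], i
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a

def bucket_sort(items, m):
    """Bucket sort for keys in [1, m]: n equal-width buckets,
    insertion sort per bucket, then concatenation."""
    n = len(items)
    buckets = [[] for _ in range(n)]
    for x in items:
        idx = min((x - 1) * n // m, n - 1)   # map [1, m] onto buckets 0..n-1
        buckets[idx].append(x)
    out = []
    for b in buckets:
        out.extend(insertion_sort(b))
    return out
```

When the keys really are uniform, each bucket holds O(1) items in expectation, so the insertion sorts cost O(n) total; insertion sort beats quicksort here because its overhead on tiny, nearly-empty buckets is minimal.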

Bucket Sort Gone Wrong
We can use bucket sort effectively whenever we understand the distribution of the data. However, bad things happen when we assume the wrong distribution: if most keys land in a few buckets, the per-bucket insertion sorts degrade toward quadratic time. Make sure you understand your data, or use a good worst-case or randomized algorithm!
The buckets cover the ranges [1 .. m/n], [m/n + 1 .. 2m/n], [2m/n + 1 .. 3m/n], and so on.