Algorithms and Data Structures: Lower Bounds for Sorting
ADS: lect 7 slide 1

Comparison Based Sorting Algorithms

Definition 1
A sorting algorithm is comparison based if comparisons

    A[i] < A[j],  A[i] ≤ A[j],  A[i] = A[j],  A[i] ≥ A[j],  A[i] > A[j]

are the only ways in which it accesses the input elements.

Comparison based sorting algorithms are generic in the sense that they can be used for all types of elements that are comparable (such as objects of type Comparable in Java).

Example 2
Insertion-Sort, Quicksort, Merge-Sort and Heapsort are all comparison based.

ADS: lect 7 slide 2
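
To make Definition 1 concrete, here is a minimal Java sketch (an illustration, not part of the slides) of a generic Insertion-Sort: the only way it ever inspects two input elements is through compareTo, so it is comparison based and works for any Comparable type.

public class InsertionSortDemo {
    // Sorts A in place; elements are only ever inspected via compareTo.
    static <T extends Comparable<T>> void insertionSort(T[] A) {
        for (int j = 1; j < A.length; j++) {
            T key = A[j];
            int i = j - 1;
            while (i >= 0 && A[i].compareTo(key) > 0) {   // comparison "A[i] > key"
                A[i + 1] = A[i];                          // shift larger element right
                i--;
            }
            A[i + 1] = key;
        }
    }

    public static void main(String[] args) {
        Integer[] a = {5, 2, 4, 6, 1, 3};
        insertionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // prints [1, 2, 3, 4, 5, 6]
    }
}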

The Decision Tree Model

Abstractly, we may describe the behaviour of a comparison-based sorting algorithm S on an input array A = A[1], ..., A[n] by a decision tree:

[Figure: a tree whose internal nodes are comparisons i : j. The root i : j has branches for the outcomes A[i] < A[j], A[i] = A[j] and A[i] > A[j], each leading to further comparison nodes such as k : l, p : q, r : s, t : u.]

At each leaf of the tree, the output of the algorithm on the corresponding execution branch is displayed. Outputs of sorting algorithms correspond to permutations of the input array.

ADS: lect 7 slide 3

A Simplifying Assumption

In the following, we assume that all keys of elements of the input array of a sorting algorithm are distinct. (It is OK to restrict to a special case, because we want to prove a lower bound.)

Thus the outcome A[i] = A[j] of a comparison will never occur, and the decision tree is in fact a binary tree:

[Figure: the same tree with the A[i] = A[j] branches removed. The root i : j now has only the two branches A[i] < A[j] and A[i] > A[j], leading to further comparison nodes such as k : l, r : s, t : u.]

ADS: lect 7 slide 4

Example: Insertion Sort for n = 3

[Figure: the decision tree of Insertion Sort on three elements. The root compares 1 : 2. On the outcome A[1] < A[2] the next comparison is 2 : 3, leading either to the leaf A[1] A[2] A[3] or to the comparison 1 : 3 with leaves A[1] A[3] A[2] and A[3] A[1] A[2]. On the outcome A[1] > A[2] the next comparison is 1 : 3, leading either to the leaf A[2] A[1] A[3] or to the comparison 2 : 3 with leaves A[2] A[3] A[1] and A[3] A[2] A[1].]

In Insertion Sort, when we get the result of a comparison, we often swap some elements of the array. In drawing decision trees, we do not carry out such swaps: our indices always refer to the elements originally at those positions in the array. To see what this means, draw the evolving array of Insertion Sort beside this decision tree.

ADS: lect 7 slide 5
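
The same tree can be written out as straight-line code: each internal node becomes one comparison, and each leaf prints a permutation of the original positions. Below is a minimal Java sketch (an illustration, not part of the slides) for three distinct keys; every execution path performs at most three comparisons, which is the height of the tree above.

public class DecisionTree3 {
    // Assumption (as on slide 4): the three keys are pairwise distinct.
    // Each branch corresponds to one root-to-leaf path of the decision tree.
    static void sort3(int a1, int a2, int a3) {
        if (a1 < a2) {                                // node 1 : 2
            if (a2 < a3) {                            // node 2 : 3
                System.out.println("A[1] A[2] A[3]");
            } else if (a1 < a3) {                     // node 1 : 3
                System.out.println("A[1] A[3] A[2]");
            } else {
                System.out.println("A[3] A[1] A[2]");
            }
        } else {                                      // a1 > a2
            if (a1 < a3) {                            // node 1 : 3
                System.out.println("A[2] A[1] A[3]");
            } else if (a2 < a3) {                     // node 2 : 3
                System.out.println("A[2] A[3] A[1]");
            } else {
                System.out.println("A[3] A[2] A[1]");
            }
        }
    }

    public static void main(String[] args) {
        sort3(9, 4, 7);  // prints "A[2] A[3] A[1]", i.e. the sorted order 4, 7, 9
    }
}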

A Lower Bound for Comparison Based Sorting

For a comparison based sorting algorithm S, let

    C_S(n) = worst-case number of comparisons performed by S on an input array of size n.

Theorem 3
For all comparison based sorting algorithms S we have C_S(n) = Ω(n lg n).

Corollary 4
The worst-case running time of any comparison based sorting algorithm is Ω(n lg n).

ADS: lect 7 slide 6

A Lower Bound for Comparison Based Sorting

The proof of Theorem 3 uses the Decision-Tree Model of sorting. It is an information-theoretic lower bound:

Information-theoretic means that it is based on the amount of information that an instance of the problem can encode. For sorting, the input can encode n! different outputs.

The proof does not make any assumption about how the sorting might be done (except that it is comparison based).

ADS: lect 7 slide 7

Proof of Theorem 3

Observation 5
For every n, C_S(n) is the height of the decision tree of S on inputs of size n: the longest path from the root to a leaf is the maximum number of comparisons that algorithm S performs on an input of length n.

We shall prove a lower bound on the height of the decision tree for any algorithm S.

Remark
Maybe you are wondering... was it really OK to assume that all keys are distinct? It is OK, because the problem of sorting n arbitrary keys (with no distinctness assumption) is more general than the problem of sorting n distinct keys: the worst case for general sorting is certainly at least as bad as the worst case for sorting all-distinct keys.

ADS: lect 7 slide 8
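
Observation 5 can be checked experimentally for small n. The sketch below (an illustration only, not part of the lecture; the class name and parameters are mine) runs Insertion-Sort on every permutation of {1, ..., n}, counts key comparisons, and reports the maximum, i.e. the height of Insertion-Sort's decision tree; it also prints ceil(lg n!) so the lower bound proved on the next slide can be compared against it.

public class WorstCaseDemo {
    // Counts the key comparisons Insertion-Sort makes on a copy of a.
    static int countInsertionSort(int[] a) {
        int[] b = a.clone();
        int comparisons = 0;
        for (int j = 1; j < b.length; j++) {
            int key = b[j], i = j - 1;
            while (i >= 0) {
                comparisons++;                        // comparison "b[i] > key"
                if (b[i] <= key) break;
                b[i + 1] = b[i];
                i--;
            }
            b[i + 1] = key;
        }
        return comparisons;
    }

    // Maximum comparison count over all permutations of a[k..n-1] (backtracking).
    static int maxOverPermutations(int[] a, int k) {
        if (k == a.length) return countInsertionSort(a);
        int max = 0;
        for (int i = k; i < a.length; i++) {
            int t = a[k]; a[k] = a[i]; a[i] = t;      // choose element for position k
            max = Math.max(max, maxOverPermutations(a, k + 1));
            t = a[k]; a[k] = a[i]; a[i] = t;          // undo the swap
        }
        return max;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = i + 1;
            double lgFact = 0.0;
            for (int i = 2; i <= n; i++) lgFact += Math.log(i) / Math.log(2);
            System.out.println("n=" + n + "  worst-case comparisons = " + maxOverPermutations(a, 0)
                + "  ceil(lg n!) = " + (int) Math.ceil(lgFact));
        }
    }
}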

Observation 6
Each permutation of the inputs must occur at at least one leaf of the decision tree. (Observation 6 must hold if our algorithm is to sort correctly on all inputs.)

By Observation 6, the decision tree on inputs of size n has at least n! leaves (for any algorithm S).

The (simplified) decision tree is a binary tree, and a binary tree of height h has at most 2^h leaves.

Putting everything together, we get

    n!  ≤  number of leaves of the decision tree  ≤  2^(height of the decision tree)  ≤  2^(C_S(n)).

Thus C_S(n) ≥ lg(n!) = Ω(n lg n). To obtain the last equality, we can use the inequality

    n^(n/2)  ≤  n!  ≤  n^n,

which tells us that lg(n!) ≥ lg(n^(n/2)) = (n/2) lg n = Ω(n lg n). This completes the proof of Theorem 3. QED

ADS: lect 7 slide 9
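
As a quick numerical sanity check on the last step (an illustration, not part of the proof), the following snippet computes lg(n!) exactly by summing lg i and compares it with the bounds (n/2) lg n and n lg n.

public class StirlingCheck {
    public static void main(String[] args) {
        for (int n : new int[]{4, 16, 64, 256, 1024}) {
            double lgFact = 0.0;
            for (int i = 2; i <= n; i++) lgFact += Math.log(i) / Math.log(2);  // lg(n!)
            double lower = (n / 2.0) * (Math.log(n) / Math.log(2));            // lg(n^(n/2))
            double upper = n * (Math.log(n) / Math.log(2));                    // lg(n^n)
            System.out.printf("n=%4d  (n/2) lg n=%10.1f  lg(n!)=%10.1f  n lg n=%10.1f%n",
                              n, lower, lgFact, upper);
        }
    }
}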

An Average Case Lower Bound

For any comparison based sorting algorithm S, let

    A_S(n) = average number of comparisons performed by S on an input array of size n.

Theorem 7
For all comparison based sorting algorithms S we have A_S(n) = Ω(n lg n).

The proof uses the fact that the average length of a path from the root to a leaf in a binary tree with ℓ leaves is Ω(lg ℓ) (Theorems 11 and 12).

Corollary 8
The average-case running time of any comparison based sorting algorithm is Ω(n lg n).

ADS: lect 7 slide 10
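
To see Theorem 7 at work, the sketch below (an illustration only; the choice of Merge-Sort and of the parameters n and trials is mine, not the lecture's) counts the comparisons Merge-Sort makes on random inputs and compares the average with lg(n!); by Theorems 11 and 12, no comparison based algorithm can average fewer than lg(n!)/2 comparisons.

import java.util.Random;

public class AvgCaseDemo {
    static long comparisons = 0;

    // Sorts a[lo..hi) and counts every key comparison made while merging.
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);
        mergeSort(a, mid, hi);
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) {
            comparisons++;                                   // one key comparison
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int n = 1000, trials = 100;
        Random rnd = new Random(42);
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rnd.nextInt();
            mergeSort(a, 0, n);
        }
        double lgFact = 0.0;
        for (int i = 2; i <= n; i++) lgFact += Math.log(i) / Math.log(2);
        System.out.printf("average comparisons = %.1f,  lg(n!) = %.1f%n",
                          (double) comparisons / trials, lgFact);
    }
}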

Average Root-to-Leaf Length in a Binary Tree

Definition 9
For any binary tree T, let AvgRL(T) denote the average root-to-leaf length of T.

Definition 10
A near-complete binary tree T is a binary tree in which every internal node has exactly two child nodes, and all leaves are at depth h or depth h - 1, where h is the height of T.

Theorem 11
Any near-complete binary tree T with leaf set L(T), |L(T)| ≥ 4, has average root-to-leaf length AvgRL(T) at least lg(|L(T)|)/2.

Theorem 12
For any binary tree T, there is a near-complete binary tree T′ with L(T′) = L(T) (the same leaf set) and AvgRL(T′) ≤ AvgRL(T). Hence AvgRL(T) ≥ lg(|L(T)|)/2 holds for all binary trees.

Proof of Theorems 11 and 12 via BOARD notes.

ADS: lect 7 slide 11

Implications of These Lower Bounds

Theorem 3 and Theorem 7 are significant because they hold for all comparison-based algorithms S. They imply the following:

1. By Theorem 3, any comparison-based sorting algorithm with a worst-case running time of O(n lg n) is asymptotically optimal (i.e., apart from the constant factor inside the O-term, it is as good as possible in terms of worst-case analysis). This includes algorithms like MergeSort and HeapSort.

2. By Theorem 7, any comparison-based sorting algorithm with an average-case running time of O(n lg n) is the best you can hope for in terms of average-case analysis (again apart from the constant factor inside the O-term). This is achieved by MergeSort and HeapSort. In Lecture 8, we will see that it is also true for QuickSort (for the average case, not the worst case).

ADS: lect 7 slide 12

Lecture 9 (after the average-case analysis of QuickSort)

We will show that in a special case of sorting, namely when the inputs are numbers coming from the range {1, 2, ..., n^k} for some constant k, we can sort in linear time (with an algorithm that is not comparison based).

Reading Assignment
[CLRS] Section 8.1 (2nd and 3rd edition) or [CLR] Section 9.1. Well worth reading: this is a nice chapter of CLRS (not too long).

ADS: lect 7 slide 13
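
As a preview of that lecture, here is a minimal Java sketch of counting sort (an illustration only, and a simplification: it assumes the keys lie in a small range 0 .. m-1 rather than the full {1, ..., n^k} case, which CLRS handles by sorting digit by digit). It runs in Θ(n + m) time and never compares two keys with each other, so the Ω(n lg n) lower bound does not apply to it.

public class CountingSortDemo {
    // Stable counting sort; assumes every key of a lies in 0 .. m-1.
    static int[] countingSort(int[] a, int m) {
        int[] count = new int[m + 1];
        for (int x : a) count[x + 1]++;                        // count[v+1] = occurrences of value v
        for (int v = 0; v < m; v++) count[v + 1] += count[v];  // prefix sums: count[v] = first output slot for v
        int[] out = new int[a.length];
        for (int x : a) out[count[x]++] = x;                   // stable placement by key value
        return out;
    }

    public static void main(String[] args) {
        int[] a = {3, 0, 2, 3, 1, 0, 2};
        System.out.println(java.util.Arrays.toString(countingSort(a, 4))); // [0, 0, 1, 2, 2, 3, 3]
    }
}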

Problems

1. Draw (simplified) decision trees for Insertion Sort and Quicksort for n = 4.

2. Exercise 8.1-1 of [CLRS] (both 2nd and 3rd editions).

3. Resolve the complexity (in terms of the number of comparisons) of sorting 4 numbers.
   3.1 Give an algorithm that sorts any 4 numbers using at most 5 comparisons in the worst case.
   3.2 Prove (using the decision-tree model) that there is no algorithm that sorts 4 numbers using fewer than 5 comparisons in the worst case.

ADS: lect 7 slide 14