Algorithms and Data Structures for Mathematicians


Algorithms and Data Structures for Mathematicians
Lecture 5: Sorting
Peter Kostolányi (kostolanyi at fmph and so on), Room M-258
26 October 2017

Sorting Algorithms Covered So Far

Worst-case time complexity Θ(n²):
- Insertion Sort
- Tree Sort (basic binary search trees)

Worst-case time complexity Θ(n log n):
- Merge Sort
- Heap Sort
- Tree Sort (balanced binary search trees)

Quick Sort

Uses a divide-and-conquer strategy in a different way than merge sort:

Divide the array a[f], ..., a[l] as follows:
1. Choose a pivot, e.g., piv ← a[l]
2. Rearrange the array so that for some f ≤ m ≤ l:
   - a[i] ≤ piv for i = f, ..., m − 1
   - a[m] = piv
   - a[i] ≥ piv for i = m + 1, ..., l

Conquer the subproblems by recursively applying the procedure to a[f], ..., a[m − 1] and a[m + 1], ..., a[l] (simply return for arrays with 1 or 0 elements).

A combine phase is not needed here.

Quick Sort

QUICKSORT(a, f, l):
    if f ≥ l then return
    else
        m ← DIVIDE(a, f, l)
        QUICKSORT(a, f, m − 1)
        QUICKSORT(a, m + 1, l)
    end

DIVIDE(a, f, l):
    piv ← a[l]
    j ← f
    for i ← f to l − 1 do
        if a[i] ≤ piv then
            exchange a[j] ↔ a[i]
            j ← j + 1
        end
    end
    exchange a[j] ↔ a[l]
    return j
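As a concrete illustration, here is a minimal Python sketch of the same procedure (the function names `quicksort` and `divide` are my choices; the pseudocode's inclusive 1-based bounds become inclusive 0-based bounds):

```python
def divide(a, f, l):
    """Lomuto partition of a[f..l] with pivot a[l]; returns the pivot's final index."""
    piv = a[l]
    j = f
    for i in range(f, l):          # i runs over f, ..., l-1
        if a[i] <= piv:
            a[j], a[i] = a[i], a[j]
            j += 1
    a[j], a[l] = a[l], a[j]        # put the pivot into its final position
    return j

def quicksort(a, f=0, l=None):
    """Sorts a[f..l] in place (inclusive bounds, mirroring QUICKSORT)."""
    if l is None:
        l = len(a) - 1
    if f >= l:                     # arrays with 0 or 1 elements: nothing to do
        return
    m = divide(a, f, l)
    quicksort(a, f, m - 1)
    quicksort(a, m + 1, l)

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```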

Worst-Case Time Complexity of Quick Sort

Suppose that a[1], ..., a[n] is already sorted. Then for each f < l, the array a[f], ..., a[l] is divided as follows:
- m = l
- No elements of the array are exchanged
- One recursive call is made for a[f], ..., a[l − 1] and one for an empty array

Time complexity on such input: T_s(n) = T_s(n − 1) + T_s(0) + Θ(n)

Hence, T_s(n) = Θ(n²). For worst-case complexity, this implies T(n) = Ω(n²).

Worst-Case Time Complexity of Quick Sort

We shall prove that T(n) = O(n²) as well (hence, T(n) = Θ(n²)).

By definition of worst-case complexity:

    T(n) ≤ max_{0 ≤ k ≤ n−1} (T(k) + T(n − k − 1)) + c₁n

for some c₁ > 0. Let c₂ > 0 be such that T(0) ≤ c₂, T(1) ≤ c₂, and 2c₂n ≥ 2c₂ + c₁n for n ≥ 2.

We shall prove that T(n) ≤ c₂(n² + 1) for all n ≥ 0, by induction on n:
- T(0) ≤ c₂ ≤ c₂(0² + 1) and T(1) ≤ c₂ ≤ c₂(1² + 1)
- Now let n ≥ 2. By the induction hypothesis, T(k) ≤ c₂(k² + 1) for k = 0, ..., n − 1:

    T(n) ≤ max_{0 ≤ k ≤ n−1} (c₂(k² + 1) + c₂((n − k − 1)² + 1)) + c₁n
    T(n) ≤ max_{0 ≤ k ≤ n−1} (c₂k² + c₂(n − k − 1)²) + 2c₂ + c₁n

- c₂x² + c₂(n − x − 1)² has positive second derivative on (0, n − 1), so the maximum is attained at x = 0 and x = n − 1:

    T(n) ≤ c₂(n − 1)² + 2c₂ + c₁n = c₂n² − 2c₂n + c₂ + 2c₂ + c₁n ≤ c₂(n² + 1)

Why Quick Sort?

The worst-case time complexity of quick sort is Θ(n²). Quick sort is much worse than insertion sort on already sorted inputs! What is so quick about quick sort?
- Good average-case performance
- If the pivot is not chosen as a[l], but randomly, no adversary can choose an input with bad running time (randomised quick sort)
- Quick sort is particularly well suited for practical implementation (cache efficiency, etc.)

Randomised Quick Sort

Classical quick sort: the pivot is set to a[l].
Randomised quick sort: the pivot is set to a randomly chosen element.

Randomised DIVIDE: randomly choose an element, exchange it with a[l], and call the ordinary DIVIDE procedure.

RQUICKSORT(a, f, l):
    if f ≥ l then return
    else
        m ← RDIVIDE(a, f, l)
        RQUICKSORT(a, f, m − 1)
        RQUICKSORT(a, m + 1, l)
    end

RDIVIDE(a, f, l):
    index ← RANDOM(f, l)
    exchange a[index] ↔ a[l]
    return DIVIDE(a, f, l)
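A minimal Python sketch of the randomised variant (function names are my choices; note that `random.randint` is inclusive on both ends, matching RANDOM(f, l)):

```python
import random

def rdivide(a, f, l):
    """Exchange a random element with a[l], then run the ordinary Lomuto partition."""
    index = random.randint(f, l)       # uniform over f, ..., l (inclusive)
    a[index], a[l] = a[l], a[index]
    piv = a[l]
    j = f
    for i in range(f, l):
        if a[i] <= piv:
            a[j], a[i] = a[i], a[j]
            j += 1
    a[j], a[l] = a[l], a[j]
    return j

def rquicksort(a, f=0, l=None):
    """Randomised quick sort of a[f..l] in place (inclusive bounds)."""
    if l is None:
        l = len(a) - 1
    if f >= l:
        return
    m = rdivide(a, f, l)
    rquicksort(a, f, m - 1)
    rquicksort(a, m + 1, l)
```

On an already-sorted input, the deterministic variant degrades to Θ(n²), while here no single input forces quadratic behaviour, since the adversary cannot predict the pivot choices.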

Analysis of Randomised Quick Sort

Let X be a random variable counting the comparisons a[i] ≤ piv made in DIVIDE during the execution of RQUICKSORT. The total running time is O(n + X) (there are at most n calls of DIVIDE). We shall compute the expected value of X.

Let a = a[1], ..., a[n] contain the elements s_1 ≤ ... ≤ s_n. For each i < j: s_i is compared with s_j at most once (each element can be a pivot at most once). Let X_{i,j} = 1 if s_i is compared with s_j and X_{i,j} = 0 otherwise. We obtain:

    E[X] = E[Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_{i,j}] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[X_{i,j}] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Pr[s_i is compared with s_j]

Analysis of Randomised Quick Sort

It remains to compute the probability that s_i is compared with s_j. Let i < j and let us denote S[i, j] := {s_i, ..., s_j}.
- If s_i is the first element of S[i, j] chosen as a pivot, then s_i is compared with s_{i+1}, ..., s_j
- If s_j is the first element of S[i, j] chosen as a pivot, then s_j is compared with s_i, ..., s_{j−1}
- If some other element of S[i, j] is the first chosen pivot, then s_i and s_j are not compared at all

    Pr[s_i is compared with s_j] = Pr[s_i is the first pivot chosen from S[i, j]] + Pr[s_j is the first pivot chosen from S[i, j]] = 1/(j − i + 1) + 1/(j − i + 1) = 2/(j − i + 1)

Analysis of Randomised Quick Sort

As a result:

    E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1) = Σ_{i=1}^{n−1} Σ_{t=1}^{n−i} 2/(t + 1) < Σ_{i=1}^{n−1} Σ_{t=1}^{n} 2/t = Σ_{i=1}^{n−1} O(log n) = O(n log n)

The expected time complexity thus is O(n log n). It is not hard to prove that the best-case time complexity is Θ(n log n). Hence, the expected time complexity is Θ(n log n).
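The expectation can also be checked empirically. The sketch below is an illustrative experiment (not part of the lecture; the helper `count_comparisons` is mine): it counts the element-pivot comparisons made by randomised quick sort on an already-sorted input, which is the worst case for the deterministic variant.

```python
import math
import random

def count_comparisons(a):
    """Counts the a[i] <= piv comparisons made by randomised quicksort on a copy of a."""
    a = list(a)
    count = 0

    def sort(f, l):
        nonlocal count
        if f >= l:
            return
        index = random.randint(f, l)     # random pivot, as in RDIVIDE
        a[index], a[l] = a[l], a[index]
        piv, j = a[l], f
        for i in range(f, l):            # Lomuto partition, counting each comparison
            count += 1
            if a[i] <= piv:
                a[j], a[i] = a[i], a[j]
                j += 1
        a[j], a[l] = a[l], a[j]
        sort(f, j - 1)
        sort(j + 1, l)

    sort(0, len(a) - 1)
    return count

n = 500
avg = sum(count_comparisons(range(n)) for _ in range(20)) / 20
print(avg, 2 * n * math.log(n))   # the average stays below roughly 2n ln n
```

Typical runs give an average on the order of n log n rather than the n(n − 1)/2 comparisons the deterministic variant makes on this input.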

About Randomised Algorithms

Randomised quick sort is a Las Vegas randomised algorithm:
- It always produces a correct output
- The running time on a given input is a random variable

This is in contrast to Monte Carlo randomised algorithms:
- These may produce incorrect output
- The probability of error is typically small; usually it can be made so small that it is irrelevant
- They are often more efficient than their best known deterministic counterparts

Sorting by Comparison

The sorting algorithms described so far are comparison sorts:
- The produced output depends solely on a sequence of comparisons (of type a[i] ≤ a[j]) between the array elements
- There might be actions other than comparisons and exchanges, but these depend only on the comparisons made up to that point

We shall now prove the following lower bound: the worst-case time complexity of any comparison sort is Ω(n log n). For instance, merge sort and heap sort are thus optimal (among comparison sorts).

Lower Bound for Sorting by Comparison

Let a[1], ..., a[n] be an array. A comparison sort makes a sequence of comparisons a[i] ≤ a[j]. Comparisons of the forms a[·] < a[·], a[·] = a[·], a[·] ≥ a[·], and a[·] > a[·] can be expressed in terms of a constant number of comparisons a[·] ≤ a[·].

Decision trees:
- Interior nodes are pairs of indices of elements compared
- Leaves are sorted arrays produced on output

Lower Bound for Sorting by Comparison

Example of a decision tree for n = 3: the root compares the pair (1, 2); its children compare (2, 3) and (1, 3), with further comparisons of (1, 3) and (2, 3) one level deeper. The six leaves are the output permutations 1,2,3; 2,1,3; 1,3,2; 3,1,2; 2,3,1; 3,2,1.

Lower Bound for Sorting by Comparison

If all input elements are distinct:
- A comparison sort has to be able to produce n! different outputs
- Decision trees for inputs of length n thus need to have at least n! reachable leaves
- The maximum depth of a reachable leaf therefore has to be at least

    log(n!) = log n + log(n − 1) + ... + log 1 ≥ (n/2) log(n/2) = Ω(n log n)

The worst-case time complexity is Ω(n log n) as well.
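The inequality log(n!) ≥ (n/2) log(n/2) used above is easy to check numerically; this short sketch (an illustration of mine, using `math.lgamma` to evaluate log₂(n!) without computing the huge factorial) confirms it for a few sizes:

```python
import math

# Check the bound log2(n!) >= (n/2) * log2(n/2) from the decision-tree argument
for n in [8, 64, 1024]:
    log2_factorial = math.lgamma(n + 1) / math.log(2)   # log2(n!) via the gamma function
    lower_bound = (n / 2) * math.log2(n / 2)
    assert log2_factorial >= lower_bound
    print(n, round(log2_factorial), round(lower_bound))
```

For n = 1024, for example, any comparison sort needs a decision tree of depth at least log₂(1024!) ≈ 8769, well above the 4608 given by the simpler bound, and both are Θ(n log n).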

Sorting in Linear Time

We shall describe a sorting algorithm that:
- Is not a comparison sort
- Makes relatively strong assumptions about the universe of possible array elements
- Runs in time that can be considered linear worst-case under some circumstances

This is just an example; there are other similar algorithms as well.

Counting Sort

Assumes that the array elements are integers between 0 and some k. For each i from 0 to k, the algorithm:
- Counts the number of array elements equal to i
- Then counts the number of elements less than or equal to i
- Uses these values to create the sorted array

Counting Sort

COUNTINGSORT(a):
    create arrays b = b[1], ..., b[n] and c = c[0], ..., c[k]
    for i ← 0 to k do c[i] ← 0
    for j ← 1 to n do c[a[j]] ← c[a[j]] + 1
    for i ← 1 to k do c[i] ← c[i] + c[i − 1]
    for j ← n downto 1 do
        b[c[a[j]]] ← a[j]
        c[a[j]] ← c[a[j]] − 1
    end
    return b

If the elements have no satellite data, the final for cycle can be executed in increasing order as well. The pseudocode shown above results in a stable sorting algorithm. Time complexity: Θ(n + k).
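A minimal Python version of COUNTINGSORT (the function name is my choice; with 0-indexed output the decrement happens before the write, which is equivalent to the 1-indexed write-then-decrement above):

```python
def counting_sort(a, k):
    """Stable counting sort for integers in 0..k; returns a new sorted list."""
    n = len(a)
    c = [0] * (k + 1)
    for x in a:                   # c[i] = number of elements equal to i
        c[x] += 1
    for i in range(1, k + 1):     # c[i] = number of elements <= i
        c[i] += c[i - 1]
    b = [0] * n
    for x in reversed(a):         # backwards pass keeps equal elements in order (stability)
        c[x] -= 1
        b[c[x]] = x
    return b

print(counting_sort([3, 0, 2, 3, 1], 3))  # [0, 1, 2, 3, 3]
```

Both the counting passes and the output pass are single loops over n or k elements, which is where the Θ(n + k) running time comes from.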