COT 6405 Introduction to Theory of Algorithms


COT 6405 Introduction to Theory of Algorithms. Topic 10: Linear Time Sorting

How fast can we sort? The sorting algorithms we have learned so far are Insertion Sort, Merge Sort, Heap Sort, and Quicksort. How fast are they? Insertion Sort is O(n^2); Merge Sort and Heap Sort are O(n lg n); Quicksort is O(n lg n) on average.

Common property: these algorithms use only comparisons between elements to gain order information about an input sequence; such algorithms are called comparison sorts. Given two elements a_i and a_j, we perform one of the following tests to determine their relative order: a_i < a_j, a_i ≤ a_j, a_i = a_j, a_i ≥ a_j, a_i > a_j.

Decision trees: we can view comparison sorts abstractly in terms of decision trees. A decision tree is a binary tree that represents the comparisons between elements. Each internal node is labeled i:j, representing a comparison of a_i vs. a_j.

Constructing the decision tree: given an input sequence {a_1, a_2, a_3}, the root compares 1:2, its children compare 2:3 and 1:3, and the six leaves are the orderings <1,2,3>, <1,3,2>, <2,1,3>, <2,3,1>, <3,1,2>, <3,2,1>. [Figure: decision tree for sorting three elements.]

Decision tree for an input set of four elements: given an input sequence {a_1, a_2, a_3, a_4}. [Figure: decision tree for sorting four elements.]

Decision trees (cont'd): What do the leaves represent? Each leaf of the tree indicates one sorted ordering of the input. How many leaves must there be for an input of size n? Each of the n! permutations of n elements must appear as at least one of the leaves of the decision tree.

Lemma: Any binary tree of height h has at most 2^h leaves. In other words, if l is the number of leaves and h is the height, then l ≤ 2^h. How to prove this?
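
A hedged proof sketch, not spelled out on the slide: writing L(h) for the maximum number of leaves of a binary tree of height h (a symbol introduced here), induction on h gives

\[
  L(0) = 1, \qquad L(h) \;\le\; 2\,L(h-1) \;\le\; 2 \cdot 2^{h-1} \;=\; 2^{h} \quad (h \ge 1),
\]

since a tree of height h consists of a root with at most two subtrees, each of height at most h-1.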

Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case. How to prove it? By proving that the height of the decision tree is Ω(n lg n). What is the number of leaves of a decision tree? l = ? What is the maximum number of leaves of a binary tree of height h? l_max = ?

Proof: l = n! and l_max = 2^h. Clearly, the number of leaves of the decision tree is at most the maximum number of leaves of a binary tree of the same height, so n! ≤ 2^h. Taking logarithms: lg(n!) ≤ h.

Proof (cont'd): Stirling's approximation tells us that n! > (n/e)^n. Thus h ≥ lg(n!) > lg((n/e)^n) = n lg(n/e) = n lg n - n lg e = Ω(n lg n).

Sorting in linear time: counting sort. No direct comparisons between elements! It depends on an assumption about the numbers being sorted: we assume the numbers are integers in the range [0..k]. The algorithm is NOT in place. Input: A[1..n], where A[j] ∈ {0, 1, 2, ..., k}. Output: B[1..n], sorted. Auxiliary counter storage: array C[0..k]. Notice: because of B[] and C[], this is not sorting in place.

Counting sort

CountingSort(A, B, k)
    for i = 0 to k                 // counter initialization
        C[i] = 0
    for j = 1 to A.length          // count each number
        C[A[j]] += 1
    for i = 1 to k                 // aggregate counters: C[i] = # of elements <= i
        C[i] = C[i] + C[i-1]
    for j = A.length downto 1      // move results into B, back to front
        B[C[A[j]]] = A[j]
        C[A[j]] -= 1
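
A minimal runnable version of the same procedure in Python (my own sketch; counting_sort is not a name from the slides), using the example input from the next slides:

def counting_sort(A, k):
    """Return a sorted copy of A, where every A[j] is an integer in [0..k]."""
    n = len(A)
    C = [0] * (k + 1)                # counters, one per possible value
    B = [0] * n                      # output array
    for x in A:                      # count each value
        C[x] += 1
    for i in range(1, k + 1):        # running sums: C[i] = # of elements <= i
        C[i] += C[i - 1]
    for j in range(n - 1, -1, -1):   # walk A backwards to keep the sort stable
        C[A[j]] -= 1                 # 0-based output, so decrement before placing
        B[C[A[j]]] = A[j]
    return B

if __name__ == "__main__":
    A = [2, 5, 3, 0, 2, 3, 0, 3]
    print(counting_sort(A, 5))       # [0, 0, 2, 2, 3, 3, 3, 5]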

A counting sort example. The numbers are in the range [0..5], so k = 5. A = [2, 5, 3, 0, 2, 3, 0, 3]. C is indexed 0..5 and initialized to all zeros: C = [0, 0, 0, 0, 0, 0].

Filling the C array: scanning A = [2, 5, 3, 0, 2, 3, 0, 3] and incrementing C[A[j]] for each element gives the counts C = [2, 0, 2, 3, 0, 1] (indices 0..5).

Filling the C array (cont'd): taking running sums, C[i] = C[i] + C[i-1], turns the counts into C = [2, 2, 4, 7, 7, 8], so C[i] is now the number of elements less than or equal to i.

Sorting the numbers. A = [2, 5, 3, 0, 2, 3, 0, 3], B is indexed 1..8 and initially empty, C = [2, 2, 4, 7, 7, 8].

Sorting the numbers: walking A from right to left, A[8] = 3 goes to B[C[3]] = B[7] and C[3] is decremented to 6; then A[7] = 0 goes to B[C[0]] = B[2] and C[0] is decremented to 1. Now B = [_, 0, _, _, _, _, 3, _] and C = [1, 2, 4, 6, 7, 8].

Sorting the numbers (cont'd): next, A[6] = 3 goes to B[C[3]] = B[6] and C[3] is decremented to 5. Now B = [_, 0, _, _, _, 3, 3, _] and C = [1, 2, 4, 5, 7, 8]. Continuing in the same way fills in the rest of B, giving B = [0, 0, 2, 2, 3, 3, 3, 5].

Counting sort: total time O(n + k). Usually k = O(n), i.e., k < c*n for some constant c, so counting sort runs in O(n) time. But sorting is Ω(n lg n)! Contradiction? No contradiction: this is not a comparison sort (in fact, there are no comparisons at all!). Notice that this algorithm is stable: elements with the same value appear in the output in the same order as in the input; if i < j and a_i = a_j, then a_i still precedes a_j in the output.

Stable sorting: counting sort is a stable sort; it preserves the input order among equal elements. For example, A = [4, 1, 3, 4, 3] sorts to B = [1, 3, 3, 4, 4], and each pair of equal keys (the two 3s and the two 4s) keeps its original relative order.

Counting sort: why don't we always use it? Because it depends on the range k of the elements. Could we use counting sort to sort arbitrary 32-bit integers? Why or why not? Answer: no, k is too large (2^32 = 4,294,967,296); we would need a huge counter array C[0..4,294,967,295]. When k >> n, O(n + k) = O(k).

Radix sort: intuitively, we might sort on the most significant digit (MSD) first, then the second most significant digit, and so on. Recursive MSD radix sort: take the current most significant digit; sort on that digit, grouping elements with the same digit into one bucket; within each bucket, recurse on the next digit; finally, concatenate the buckets in order.

An example of recursive MSD radix sort. Original sequence: 170, 045, 075, 090, 002, 024, 802, 066. 1st pass, sorting by the most significant digit (hundreds): zero bucket: 045, 075, 090, 002, 024, 066; one bucket: 170; eight bucket: 802.

An example (cont'd). 2nd pass, sorting by the next most significant digit (tens), needed only for the numbers in the zero bucket: 045, 075, 090, 002, 024, 066. Zero bucket: 002; twenties bucket: 024; forties bucket: 045; sixties bucket: 066; seventies bucket: 075; nineties bucket: 090.

An example (cont'd). 3rd pass, sorting by the least significant digit (ones): not needed, because no tens bucket contains more than one number. 4th pass: the sorted buckets are concatenated in order to give 002, 024, 045, 066, 075, 090, 170, 802.
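
A recursive MSD radix sort along these lines might look like the following Python sketch (my own illustration; msd_radix_sort and its decimal-digit handling are assumptions, not slide code):

def msd_radix_sort(A, digit=None):
    """Recursive MSD radix sort of non-negative integers, by decimal digits."""
    if len(A) <= 1:
        return list(A)
    if digit is None:                     # start at the most significant digit
        digit = len(str(max(A))) - 1
    if digit < 0:
        return list(A)
    buckets = [[] for _ in range(10)]     # one bucket per digit value 0..9
    for x in A:
        buckets[(x // 10**digit) % 10].append(x)
    result = []
    for b in buckets:                     # recurse on the next digit, then concatenate
        result.extend(msd_radix_sort(b, digit - 1))
    return result

print(msd_radix_sort([170, 45, 75, 90, 2, 24, 802, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]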

Most significant digit (MSD) radix sort: problems. There are lots of intermediate piles of cards to keep track of (10 buckets each round), and MSD sort does not necessarily preserve the original order of duplicate keys, depending on how each bucket is sorted (e.g., two copies of the key 457 in the input 829, 457, 457, 901 may come out in either relative order).

Least significant digit (LSD) radix sort. Key idea: sort on the least significant digit first. Assume we have d-digit numbers in A.

RadixSort(A, d)
    for i = 1 to d
        use a stable sort to sort A on digit i

Example: LSD radix sorting. [Figure: LSD radix sort example, one pass per digit.]
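
In place of the figure, here is a small runnable LSD radix sort in Python (my own sketch; lsd_radix_sort and the decimal-digit bucketing are illustrative choices, not code from the slides). Distributing into ten buckets per digit and concatenating keeps each pass stable:

def lsd_radix_sort(A, d):
    """Sort non-negative integers with at most d decimal digits."""
    A = list(A)
    for i in range(d):                      # digit 0 = least significant
        buckets = [[] for _ in range(10)]   # stable bucketing per digit value
        for x in A:
            buckets[(x // 10**i) % 10].append(x)
        A = [x for b in buckets for x in b] # concatenation: a stable pass
    return A

print(lsd_radix_sort([170, 45, 75, 90, 2, 24, 802, 66], 3))
# [2, 24, 45, 66, 75, 90, 170, 802]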

Radix sort: can we prove it works? Sketch of an inductive argument (induction on the number of passes): assume the numbers are sorted by their low-order digits 1..i-1; show that sorting on digit i leaves the array correctly sorted on digits 1..i. If two numbers differ in digit i, ordering them by that digit is correct (the lower-order digits are irrelevant). If they have the same digit i, they are already in the right order by the lower-order digits, and since we use a stable sort, they stay in that order.

Questions: can we use an arbitrary (non-stable) sorting algorithm instead of a stable sort inside LSD radix sort?

Why stable sorting matters. Input: 657, 658, 469, 595. If the per-digit sorting algorithm is not stable, we might get: first pass (ones digit): 595, 657, 658, 469; second pass (tens digit): 658, 657, 469, 595 (the tie between 657 and 658 was broken the wrong way); third pass (hundreds digit): 469, 595, 658, 657, which is not sorted.

Radix sort: what sort should we use on each digit? Counting sort is the obvious choice: sorting n numbers whose digits range over 0..k takes O(n + k) time. Each pass over n numbers with d digits takes O(n + k), so the total time is O(dn + dk). When d is constant and k = O(n), radix sort takes O(n) time.

How do we break words into digits? We have n words, each of b bits. We break each word into r-bit digits, so d = ceil(b/r). Using counting sort, k = 2^r - 1. E.g., for 32-bit words broken into 8-bit digits, d = 32/8 = 4 and k = 2^8 - 1 = 255. T(n) = Θ(d(n + k)) = Θ((b/r)(n + 2^r)).
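
As an illustration of breaking b-bit words into r-bit digits, a Python sketch (my own; radix_sort_bits is a hypothetical name) that extracts each digit with shifts and masks:

def radix_sort_bits(A, b, r):
    """Sort non-negative b-bit integers using ceil(b/r) passes of r-bit digits."""
    A = list(A)
    mask = (1 << r) - 1                     # k = 2^r - 1
    d = -(-b // r)                          # ceil(b/r) passes
    for i in range(d):
        buckets = [[] for _ in range(1 << r)]
        for x in A:
            buckets[(x >> (r * i)) & mask].append(x)   # stable distribution
        A = [x for bucket in buckets for x in bucket]
    return A

print(radix_sort_bits([0xCAFE, 0xBEEF, 0x0042, 0xF00D], b=16, r=8))
# [66, 48879, 51966, 61453]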

How to choose r? Pick r to balance the two factors in Θ((b/r)(n + 2^r)): taking r ≈ lg n makes 2^r ≈ n, giving Θ(bn/lg n). When b = O(lg n), i.e., the keys are at most polynomial in n, this is still O(n).

Radix sort example. Problem: sort 1 million 80-bit numbers. Treat them as four-digit, radix-2^20 numbers: r = 20 and d = 4, so we can sort in just four passes! Θ((b/r)(n + 2^r)) = Θ(bn/lg n) ≈ 4,000,000 operations. This compares well with a typical O(n lg n) comparison sort, which requires roughly n lg n ≈ 20,000,000 operations. So why would we ever use anything but radix sort? It doesn't sort in place (why?), and performance depends on the implementation; e.g., quicksort often uses the cache better.

Summary: radix sort. Assumption: the input has d digits, each ranging from 0 to k. Basic idea: sort the elements digit by digit, starting with the least significant, using a stable sort (like counting sort) for each pass. Each pass over n numbers with d digits takes O(n + k) time, so the total time is O(dn + dk); when d is constant and k = O(n), this is O(n). Radix sort is fast, stable, and simple to code, but it doesn't sort in place, its performance depends on the implementation (e.g., quicksort often uses the cache better), and it cannot easily sort floating-point numbers.

Bucket sort. Assumes the input is generated by a random process that distributes elements uniformly over [0, 1). Idea: divide [0, 1) into n equal-sized buckets, distribute the n input values into the buckets, sort each bucket, then go through the buckets in order, listing the elements in each one.

Bucket sort (cont'd). Input: A[1..n], where 0 ≤ A[i] < 1 for all i. Auxiliary array: B[0..n-1] of linked lists, each list initially empty.

Bucket sort implementation: the bucket index for A[i] is easy to compute: it is ⌊n * A[i]⌋.
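
A minimal runnable sketch of bucket sort in Python (my own code, not the slides'; bucket_sort and insertion_sort are illustrative names), sorting each bucket with insertion sort as described:

def insertion_sort(bucket):
    """Sort a small list in place (the per-bucket sort)."""
    for i in range(1, len(bucket)):
        key, j = bucket[i], i - 1
        while j >= 0 and bucket[j] > key:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

def bucket_sort(A):
    """Sort values in [0, 1), assumed roughly uniformly distributed."""
    n = len(A)
    B = [[] for _ in range(n)]          # B[0..n-1], one list per bucket
    for x in A:
        B[int(n * x)].append(x)         # bucket index = floor(n * x)
    for bucket in B:
        insertion_sort(bucket)          # expected O(1) elements per bucket
    return [x for bucket in B for x in bucket]

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]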

Bucket sort with 10 buckets. [Figure: ten buckets, each holding the input values that fall in its subinterval of width 1/10.]

Correctness: consider A[i] and A[j], and assume without loss of generality that A[i] ≤ A[j]. Then ⌊n * A[i]⌋ ≤ ⌊n * A[j]⌋, so A[i] is placed into the same bucket as A[j] or into a bucket with a lower index. If it is the same bucket, insertion sort puts them in order; if it is an earlier bucket, the concatenation of the lists puts them in order.

Informal analysis: all lines of the algorithm except the insertion sorting take Θ(n) altogether. Since the inputs are uniformly and independently distributed over [0, 1), we do not expect many numbers to fall into each bucket. Intuitively, if each bucket gets a constant number of elements, it takes O(1) time to sort each bucket, giving O(n) sort time over all buckets.

Formal analysis: define a random variable n_i = the number of elements placed in bucket B[i]. Because insertion sort runs in quadratic time, the bucket sort time is T(n) = Θ(n) + Σ_{i=0}^{n-1} O(n_i^2).

Formal analysis (cont'd): with n_i = the number of elements placed in bucket B[i], taking expectations and using linearity of expectation gives E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} E[O(n_i^2)] = Θ(n) + Σ_{i=0}^{n-1} O(E[n_i^2]).

To evaluate E[n_i^2], write n_i = Σ_{j=1}^{n} X_{ij}, where X_{ij} is the indicator variable for the event that A[j] falls into bucket i; by uniformity, Pr{X_{ij} = 1} = 1/n.

The claim: (x_1 + x_2 + x_3)(x_1 + x_2 + x_3) = x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3; squaring a sum yields the square terms plus the cross terms.

Analysis: applying the claim, E[n_i^2] = E[(Σ_j X_{ij})^2] = Σ_j E[X_{ij}^2] + Σ_j Σ_{l≠j} E[X_{ij} X_{il}] = n * (1/n) + n(n-1) * (1/n^2) = 1 + (n-1)/n = 2 - 1/n.

Analysis (cont'd): therefore E[T(n)] = Θ(n) + Σ_{i=0}^{n-1} O(2 - 1/n) = Θ(n) + n * O(1) = Θ(n), so bucket sort runs in expected linear time.

Analysis conclusion: this is a probabilistic analysis; we used probability to analyze an algorithm whose running time depends on the distribution of the inputs. With bucket sort, if the input is not drawn from a uniform distribution on [0, 1), all bets are off performance-wise, but the algorithm is still correct.

Bucket sort summary. Assumption: the input is n real numbers from [0, 1); other numeric ranges can be mapped into [0, 1) first. Basic idea: create n linked lists (buckets) dividing [0, 1) into subintervals of size 1/n, add each input element to the appropriate bucket, and sort the buckets with insertion sort. A uniform input distribution gives O(1) expected bucket size, so the expected total time is O(n).
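
As one hypothetical way to realize the remark about mapping other ranges into [0, 1): scale the bucket index rather than the values themselves (bucket_sort_range is my own name; Python's built-in list sort stands in here for the per-bucket insertion sort):

def bucket_sort_range(A):
    """Bucket-sort arbitrary reals by scaling the bucket index, not the values."""
    n = len(A)
    lo, hi = min(A), max(A)
    if lo == hi:
        return list(A)                    # all elements equal: nothing to do
    B = [[] for _ in range(n)]
    for x in A:
        i = min(int(n * (x - lo) / (hi - lo)), n - 1)   # hi lands in the last bucket
        B[i].append(x)
    for bucket in B:
        bucket.sort()                     # stands in for the per-bucket insertion sort
    return [x for bucket in B for x in bucket]

print(bucket_sort_range([42.0, -7.5, 3.14, 99.9, 0.0, 57.2]))
# [-7.5, 0.0, 3.14, 42.0, 57.2, 99.9]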

Linear sorting: common mistakes. Using counting sort when memory is limited: the value of k determines the size of the counter array C[0..k]. Using bucket sort when the input is not uniformly distributed.

Linear-time sorting summary. We have learned three linear-time sorting algorithms and their assumptions on the input: counting sort assumes integers in [0..k]; radix sort assumes d-digit keys; bucket sort assumes a uniform distribution over [0, 1).