Sorting: The Big Picture
Given n comparable elements in an array, sort them in increasing (or decreasing) order.
Simple algorithms: O(n²): Insertion sort, Selection sort, Bubble sort, Shell sort
Fancier algorithms: O(n log n): Heap sort, Merge sort, Quick sort
Comparison lower bound: Ω(n log n)
Specialized algorithms: O(n): Radix sort
Handling huge data sets: External sorting

The steps of QuickSort [Weiss]
Select a pivot value from S.
Partition S into S1 (elements less than the pivot) and S2 (elements greater than the pivot).
QuickSort(S1) and QuickSort(S2).
Presto! S is sorted.
[Figure: an example set S, the selected pivot, and the partitions S1 and S2]

QuickSort Example
[Figure: median-of-three partitioning of an example array, step by step]
Choose the pivot as the median of three. Place the pivot and the largest at the right and the smallest at the left.
Move i to the right until it reaches an element larger than the pivot. Move j to the left until it reaches an element smaller than the pivot. Swap A[i] and A[j], and repeat until i and j cross.

Recursive Quicksort
Quicksort(A[]: integer array, left, right: integer): {
  pivotindex: integer;
  if left + CUTOFF ≤ right then
    pivot := median(A, left, right);
    pivotindex := Partition(A, left, right-1, pivot);
    Quicksort(A, left, pivotindex - 1);
    Quicksort(A, pivotindex + 1, right);
  else
    Insertionsort(A, left, right);
}
After partitioning: S1 < pivot | pivot | S2 > pivot
Don't use quicksort for small arrays; insertion sort is faster on them. A small CUTOFF (around 10) is reasonable.
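The recursive pseudocode above can be sketched in Python as follows. This is a minimal sketch, not Weiss's code: the helper names `median3`, `partition`, and `insertion_sort`, the use of the left, center, and right elements for the median of three, and the default `cutoff` of 10 are assumptions made for illustration. As in the pseudocode, `median3` leaves the smallest of the three at the left, the largest at the right, and parks the pivot at `right - 1`, so the partition scan runs over `left..right-1`. The scans stop on elements equal to the pivot, a common choice that keeps splits balanced when keys repeat.

```python
CUTOFF = 10  # assumed default; subarrays smaller than this go to insertion sort


def insertion_sort(a, left, right):
    """Sort a[left..right] (inclusive) in place."""
    for i in range(left + 1, right + 1):
        x = a[i]
        j = i - 1
        while j >= left and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median3(a, left, right):
    """Hypothetical helper: median of a[left], a[center], a[right].

    Leaves the smallest at left, the largest at right, and the pivot at right - 1.
    """
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def partition(a, left, right, pivot):
    """Partition a[left..right] around pivot, which sits at a[right]; return its final index."""
    i, j = left, right - 1
    while True:
        while a[i] < pivot:  # move i right to an element >= pivot
            i += 1
        while a[j] > pivot:  # move j left to an element <= pivot (a[left] acts as a sentinel)
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[i], a[right] = a[right], a[i]  # pivot lands between S1 and S2
    return i


def quicksort(a, left=0, right=None, cutoff=CUTOFF):
    """Sort a[left..right] in place: median-of-three quicksort with an insertion-sort cutoff."""
    if cutoff < 2:
        raise ValueError("cutoff must be at least 2 so median3 sees three positions")
    if right is None:
        right = len(a) - 1
    if left + cutoff <= right:
        pivot = median3(a, left, right)
        pivot_index = partition(a, left, right - 1, pivot)
        quicksort(a, left, pivot_index - 1, cutoff)
        quicksort(a, pivot_index + 1, right, cutoff)
    else:
        insertion_sort(a, left, right)
```

Calling `quicksort(data)` sorts the list `data` in place; passing a small `cutoff` such as 2 forces the partitioning path even on short inputs, which is handy for testing it.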
QuickSort: Best case complexity
If every pivot splits the array into two halves, $T(n) = 2T(n/2) + cn$, which solves to O(n log n).

QuickSort: Worst case complexity
If every pivot is the smallest or largest element, one side is empty and $T(n) = T(n-1) + cn$, which solves to O(n²).

QuickSort: Average case complexity
Turns out to be O(n log n). See Section 7.7 of the text for an idea of the proof. You don't need to know the proof details for this course.

Features of Sorting Algorithms
In-place: Sorted items occupy the same space as the original items. (No copying required, only O(1) extra space if any.)
Stable: Items in the input with the same value end up in the same order as when they began.

Sort Properties
Are the following stable? In-place?
                  Stable?              In-place?
Insertion Sort    No / Yes / Can Be    No / Yes
Selection Sort    No / Yes / Can Be    No / Yes
MergeSort         No / Yes / Can Be    No / Yes
QuickSort         No / Yes / Can Be    No / Yes

How fast can we sort?
Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time. Can we do any better? No, if the basic action is a comparison.
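To make the stability definition concrete, here is a small Python sketch; the `(key, tag)` record format and the function names are illustrative assumptions. Insertion sort only shifts elements whose key is strictly larger, so equal keys keep their input order. A common swap-based selection sort can carry an element past an equal one with its long-distance swap.

```python
def insertion_sort_by_key(records):
    """Stable: only strictly larger keys are shifted, so equal keys keep input order."""
    a = list(records)  # copy only so the demo can reuse the input; the sort itself uses O(1) extra space
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j][0] > x[0]:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a


def selection_sort_by_key(records):
    """Swap-based selection sort: the long-distance swap can reorder equal keys."""
    a = list(records)
    for i in range(len(a)):
        m = i
        for k in range(i + 1, len(a)):
            if a[k][0] < a[m][0]:
                m = k
        a[i], a[m] = a[m], a[i]
    return a


records = [(2, "a"), (2, "b"), (1, "c")]
print(insertion_sort_by_key(records))  # [(1, 'c'), (2, 'a'), (2, 'b')]  stable
print(selection_sort_by_key(records))  # [(1, 'c'), (2, 'b'), (2, 'a')]  not stable
```

Swapping (1, "c") to the front moves (2, "a") behind (2, "b"), which is exactly the reordering that stability forbids.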
Sorting Model
Recall our basic assumption: we can only compare two elements at a time, so each comparison can at best reduce the space of possible solutions by half.
Suppose you are given N elements. Assume no duplicates.
How many possible orderings can you get? This is the number of potential inputs the algorithm must distinguish.

Permutations
How many possible orderings can you get?
Example: a, b, c (N = 3): (a b c), (a c b), (b a c), (b c a), (c a b), (c b a): 6 orderings = 3·2·1 = 3!
These are all the possible permutations of a set of elements.
For N elements: N choices for the first position, (N-1) choices for the second position, …, 2 choices for the second-to-last position, 1 choice for the last: N(N-1)(N-2)⋯(2)(1) = N! possible orderings.

Decision Tree
[Figure: decision tree for sorting a, b, c; each internal node compares two elements (a < b, b < c, a < c) and each leaf is one ordering, from a < b < c to c < b < a]
The leaves contain all the possible orderings of a, b, c.

Lower bound on Height
A binary tree of height h has at most how many leaves? $L \le 2^h$
A binary tree with L leaves has height at least: $h \ge \log_2 L$
The decision tree has how many leaves? $L = N!$
So the decision tree has height: $h \ge \log_2(N!)$

log(N!) is Ω(N log N)
$$
\begin{aligned}
\log(N!) &= \log\big(N \cdot (N-1) \cdot (N-2) \cdots (2) \cdot (1)\big) \\
&= \log N + \log(N-1) + \log(N-2) + \cdots + \log 2 + \log 1 \\
&\ge \log N + \log(N-1) + \log(N-2) + \cdots + \log\tfrac{N}{2} && \text{select just the first } N/2 \text{ terms} \\
&\ge \tfrac{N}{2}\log\tfrac{N}{2} && \text{each selected term is } \ge \log\tfrac{N}{2} \\
&= \tfrac{N}{2}(\log N - \log 2) = \tfrac{N}{2}\log N - \tfrac{N}{2} \\
&= \Omega(N \log N)
\end{aligned}
$$

Ω(N log N)
The run time of any comparison-based sorting algorithm is Ω(N log N).
Can we do better if we don't use comparisons?
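The counting argument can be checked numerically. This Python sketch (the function names are illustrative) enumerates the 3! orderings of a, b, c and computes ⌈log₂(N!)⌉, the minimum worst-case number of comparisons implied by the decision-tree height bound, along with the (N/2) log₂(N/2) bound used in the derivation.

```python
import math
from itertools import permutations

# Each distinct ordering of the input is a leaf the decision tree must reach.
orderings = list(permutations(["a", "b", "c"]))
print(len(orderings))  # 6, i.e. 3!


def min_comparisons(n):
    """Height bound h >= log2(n!), rounded up: worst-case comparisons for any comparison sort."""
    return math.ceil(math.log2(math.factorial(n)))


def half_terms_bound(n):
    """The (n/2) * log2(n/2) lower bound on log2(n!) from the derivation."""
    return (n / 2) * math.log2(n / 2)


print([min_comparisons(n) for n in range(1, 6)])  # [0, 1, 3, 5, 7]
```

For three elements the bound is 3 comparisons, matching the height of the a, b, c decision tree.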
BucketSort (aka BinSort)
If all values to be sorted are known to be between 1 and K, create an array count of size K, increment counts while traversing the input, and finally output the result.
[Figure: example input with a small K and the resulting count array]
Running time to sort n items?

BucketSort Complexity: O(n+K)
Case 1: K is a constant: BinSort is linear time.
Case 2: K is variable: not simply linear time.
Case 3: K is constant but large: linear in theory, but the size-K count array makes it impractical.

Fixing impracticality: RadixSort
Radix = the base of a number system. We'll use 10 for convenience, but it could be anything.
Idea: BucketSort on each digit, from least significant to most significant (lsd to msd).

Radix Sort Example (1st, 2nd, and 3rd passes)
[Figure: input data bucketed by the 1s digit, then the 10s digit, then the 100s digit, with the list shown after each pass]
This example uses B = 10 and base-10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.
Invariant: after k passes, the low-order k digits are sorted.
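A minimal Python sketch of both ideas follows, assuming non-negative integer keys (values in 1..K for `bucket_sort`, as on the slide). `radix_sort` is the LSD variant: each pass distributes the items into `base` buckets by one digit and collects them in bucket order. Appending to buckets keeps each pass stable, which is what makes the invariant hold. The function names and the `base` parameter are hypothetical.

```python
def bucket_sort(values, k):
    """Counting-based BucketSort for integers known to lie in 1..k: O(n + k)."""
    count = [0] * (k + 1)  # index 0 unused so each value indexes its own counter
    for v in values:
        count[v] += 1
    result = []
    for v in range(1, k + 1):
        result.extend([v] * count[v])
    return result


def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers: one stable bucket pass per digit."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1
    while place <= largest:  # one pass per base-`base` digit of the largest key
        buckets = [[] for _ in range(base)]
        for v in result:  # appending preserves the order from the previous pass
            buckets[(v // place) % base].append(v)
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result
```

For example, `radix_sort([170, 45, 75, 90, 802, 24, 2, 66])` makes three passes (1s, 10s, and 100s digits) and returns `[2, 24, 45, 66, 75, 90, 170, 802]`.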
Radixsort: Complexity
How many passes? One per digit: P passes, where P is the number of base-B digits in the largest key.
How much work per pass? O(N + B): place N items into B buckets, then collect them.
Total time? O(P(N + B)).
Conclusion? Linear in N when P and B are constants.
In practice: RadixSort is only good for a large number of elements with relatively small values. It is hard on the cache compared to MergeSort/QuickSort.

Internal versus External Sorting
Need sorting algorithms that minimize disk/tape access time.
External sorting, basic idea:
Load a chunk of data into RAM, sort it, and store this "run" on disk/tape.
Use the Merge routine from Mergesort to merge runs.
Repeat until you have only one run (one sorted chunk).
The text gives some examples.
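The two phases of external sorting can be simulated in memory. In this Python sketch, which is an illustration rather than a real disk-based implementation, each run is a plain list standing in for a file, `memory_size` is a hypothetical stand-in for how many items fit in RAM, and runs are merged two at a time with Mergesort's merge, pass after pass, until one run remains.

```python
def merge(left, right):
    """The two-way Merge routine from Mergesort."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def external_sort(data, memory_size):
    """Simulated external sort; runs are lists standing in for files on disk/tape."""
    # Phase 1: load memory_size items at a time into "RAM", sort, store as a run.
    runs = [sorted(data[i:i + memory_size]) for i in range(0, len(data), memory_size)]
    # Phase 2: merge runs pairwise, one pass at a time, until a single run remains.
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            merged.append(runs[-1])  # the odd run out carries over to the next pass
        runs = merged
    return runs[0] if runs else []
```

A real external sort would stream runs from files rather than hold them in lists, and often merges many runs at once (a k-way merge) to cut the number of passes over the disk.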