Sorting
Chapter 7 in Weiss
CSE 326 Data Structures
Ruth Anderson
2/24/2010

Today's Outline
- Announcements
  - Written Homework #6 due Friday 2/26 at the beginning of lecture
  - Project code due Mon March 1 by 11pm
- Today's Topics:
  - Sorting

Why Sort?

Sorting: The Big Picture
Given n comparable elements in an array, sort them in an increasing (or decreasing) order.
- Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort
- Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort
- Comparison lower bound: Ω(n log n)
- Specialized algorithms, O(n): Bucket sort, Radix sort
- Handling huge data sets: External sorting

Insertion Sort: Idea
- At the kth step, put the kth input element in the correct place among the first k elements
- Result: After the kth step, the first k elements are sorted.
Runtime:
  worst case :
  best case :
  average case :

Selection Sort: Idea
- Find the smallest element, put it 1st
- Find the next smallest element, put it 2nd
- Find the next smallest, put it 3rd
- And so on
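The selection sort idea above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's reference code; the function name `selection_sort` and the choice to return a sorted copy are assumptions made for the example.

```python
def selection_sort(items):
    """Return a sorted copy of items using selection sort."""
    a = list(items)  # work on a copy so the caller's list is untouched
    n = len(a)
    for i in range(n):
        # Find the index of the smallest entry in a[i..n-1].
        smallest = i
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # Put that entry into position i.
        a[i], a[smallest] = a[smallest], a[i]
    return a


print(selection_sort([5, 2, 9, 1, 5, 6]))
```

Note that the inner scan over a[i..n-1] runs in full no matter how the input is ordered, which is worth keeping in mind when filling in the runtime cases.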
Student Activity
Mystery(int array a[]) {
  int j;
  for (int p = 1; p < length; p++) {
    int tmp = a[p];
    for (j = p; j > 0 && tmp < a[j-1]; j--)
      a[j] = a[j-1];
    a[j] = tmp;
  }
}
What sort is this?
What is its running time? Best? Avg? Worst?

Selection Sort: Code
void SelectionSort (Array a[0..n-1]) {
  for (i = 0; i < n; ++i) {
    j = Find index of smallest entry in a[i..n-1]
    Swap(a[i], a[j])
  }
}
Runtime:
  worst case :
  best case :
  average case :

Divide and conquer
- A common and important technique in algorithms
  - Divide problem into parts
  - Solve parts
  - Merge solutions

Divide and Conquer Sorting
- MergeSort:
  - Divide array into two halves
  - Recursively sort left and right halves
  - Merge halves
- QuickSort:
  - Partition array into small items and large items
  - Recursively sort the two smaller portions

Merge Sort
MergeSort (Array [1..n])
  1. Split Array in half
  2. Recursively sort each half
  3. Merge two halves together

Merge Sort: Complexity

Merge (a1[1..n], a2[1..n])
  i1 = 1, i2 = 1
  While (i1 <= n && i2 <= n) {
    if (a1[i1] < a2[i2]) {
      Next is a1[i1]
      i1++
    } else {
      Next is a2[i2]
      i2++
    }
  }
  Now throw in the dregs
The 2-pointer method
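The 2-pointer merge and the three MergeSort steps can be sketched as follows. This is one possible rendering, assuming 0-indexed Python lists and halves of possibly different lengths; the names `merge` and `merge_sort` are chosen for the example. Unlike the pseudocode, this version takes from the left half on ties, which is an assumption made here so that equal items keep their input order.

```python
def merge(left, right):
    """Merge two sorted lists into a new sorted list (2-pointer method)."""
    out = []  # the auxiliary array
    i1, i2 = 0, 0
    while i1 < len(left) and i2 < len(right):
        if right[i2] < left[i1]:
            out.append(right[i2])
            i2 += 1
        else:
            # On ties take from the left half first.
            out.append(left[i1])
            i1 += 1
    # Throw in the dregs: at most one of these slices is non-empty.
    out.extend(left[i1:])
    out.extend(right[i2:])
    return out


def merge_sort(a):
    """Return a sorted copy of a."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2                  # 1. split the array in half
    left = merge_sort(a[:mid])         # 2. recursively sort each half
    right = merge_sort(a[mid:])
    return merge(left, right)          # 3. merge the two halves


print(merge([2, 4, 8], [1, 5, 6]))
print(merge_sort([8, 1, 4, 9, 0, 3, 5, 2, 7, 6]))
```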
Auxiliary array
- The merging requires an auxiliary array
[Figure: merging example]

Quicksort
- Uses divide and conquer
- Doesn't require O(N) extra space like MergeSort
- Partition into left and right
  - Left less than pivot
  - Right greater than pivot
- Recursively sort left and right
- Concatenate left and right

Quick Sort
  1. Pick a pivot
  2. Divide into less-than & greater-than pivot
  3. Sort each side recursively

The steps of QuickSort
[Figure from Weiss: select pivot value from S; partition S into S1 and S2; QuickSort(S1) and QuickSort(S2); Presto! S is sorted]

Selecting the pivot
Ideas?

QuickSort Example
  index: 0 1 2 3 4 5 6 7 8 9
  input: 8 1 4 9 0 3 5 2 7 6
  after: 0 1 4 9 7 3 5 2 6 8
- Choose the pivot as the median of three.
- Place the pivot and the largest at the right and the smallest at the left
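The median-of-three step in the example above can be sketched as below. The input is taken to be the ten-element array 8 1 4 9 0 3 5 2 7 6, and the helper name `median3` follows the style of Weiss's code. Both are assumptions for this illustration, not something the slide spells out.

```python
def median3(a, left, right):
    """Order a[left], a[center], a[right], then park the median (the pivot)
    at a[right - 1]. Returns the pivot value."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    # Now a[left] <= a[center] <= a[right]: smallest at the left,
    # largest at the right. Move the pivot next to the right end.
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


data = [8, 1, 4, 9, 0, 3, 5, 2, 7, 6]
pivot = median3(data, 0, len(data) - 1)
print(pivot, data)
```

Parking the pivot at `right - 1` means the partition step only has to scan the elements strictly between the two ends, and the end elements act as sentinels for the two scanning indices.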
QuickSort Example (continued)
  0 1 4 9 7 3 5 2 6 8   after median of three; pivot 6 at index 8
  0 1 4 2 7 3 5 9 6 8   swap 9 and 2
  0 1 4 2 5 3 7 9 6 8   swap 7 and 5
  0 1 4 2 5 3 6 9 7 8   i and j have crossed; swap the pivot into place
- Move i to the right until it reaches an element larger than the pivot.
- Move j to the left until it reaches an element smaller than the pivot.
- Swap
Result: S1 < pivot | pivot | S2 > pivot

Student Activity
Recursive Quicksort
Quicksort(A[]: integer array, left, right : integer): {
  pivotindex : integer;
  if left + CUTOFF <= right then
    pivot := median(A, left, right);
    pivotindex := Partition(A, left, right-1, pivot);
    Quicksort(A, left, pivotindex - 1);
    Quicksort(A, pivotindex + 1, right);
  else
    Insertionsort(A, left, right);
}
Don't use quicksort for small arrays. CUTOFF = 10 is reasonable.

Recurrence Relations
Write the recurrence relation for QuickSort:
- Best Case:
- Worst Case:

QuickSort: Best case complexity

QuickSort: Worst case complexity
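The recursive Quicksort pseudocode can be turned into a runnable sketch along the following lines. It assumes median-of-three pivot selection and a partition loop in the style of Weiss's textbook code. The helper names (`median3`, `insertion_sort`) and the decision to inline the partition step are choices made for this example, not necessarily how the lecture's `Partition` routine was written.

```python
CUTOFF = 10  # below this size, fall back to insertion sort


def insertion_sort(a, left, right):
    """Sort a[left..right] in place."""
    for p in range(left + 1, right + 1):
        tmp = a[p]
        j = p
        while j > left and tmp < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
        a[j] = tmp


def median3(a, left, right):
    """Order the left, center and right entries; park the pivot at right-1."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def quicksort(a, left=0, right=None):
    """Sort a[left..right] in place."""
    if right is None:
        right = len(a) - 1
    if left + CUTOFF <= right:
        pivot = median3(a, left, right)
        i, j = left, right - 1
        while True:
            i += 1
            while a[i] < pivot:   # move i right past smaller items
                i += 1
            j -= 1
            while a[j] > pivot:   # move j left past larger items
                j -= 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                break
        a[i], a[right - 1] = a[right - 1], a[i]  # restore the pivot
        quicksort(a, left, i - 1)
        quicksort(a, i + 1, right)
    else:
        insertion_sort(a, left, right)


data = [(k * 37) % 101 for k in range(40)]
quicksort(data)
print(data)
```

Both scans stop on items equal to the pivot, so `a[left]` and the parked pivot at `a[right - 1]` are enough to keep `i` and `j` inside the subarray without extra bounds checks.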
QuickSort: Average case complexity
- Turns out to be O(n log n)
- See Section 7.7.5 for an idea of the proof.
- Don't need to know proof details for this course.

Quicksort Complexity
- Worst case: O(n^2)
- Best case: O(n log n)
- Average case: O(n log n)

Mergesort and massive data
MergeSort is the basis of massive sorting
- Quicksort and Heapsort both jump all over the array, leading to expensive random disk accesses
- Mergesort scans linearly through arrays, leading to (relatively) efficient sequential disk access
- In-memory sorting of reasonable blocks can be combined with larger mergesorts
- Mergesort can leverage multiple disks

Features of Sorting Algorithms
- In-place
  - Sorted items occupy the same space as the original items. (No copying required, only O(1) extra space if any.)
- Stable
  - Items in input with the same value end up in the same order as when they began.

How fast can we sort?
- Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time
- Can we do any better?
- No, if the basic action is a comparison.

Sorting Model
- Recall our basic assumption: we can only compare two elements at a time
  - we can only reduce the possible solution space by half each time we make a comparison
- Suppose you are given N elements
  - Assume no duplicates
- How many possible orderings can you get?
  - Example: a, b, c (N = 3)
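The question of how many orderings N distinct elements have can be checked by brute force for small N. The sketch below uses Python's standard `itertools.permutations` and `math.factorial`; the helper name `count_orderings` is hypothetical.

```python
from itertools import permutations
from math import factorial


def count_orderings(elements):
    """Count the distinct orderings of a list of distinct elements."""
    return sum(1 for _ in permutations(elements))


elements = ["a", "b", "c"]
for ordering in permutations(elements):
    print(" ".join(ordering))

# The brute-force count should agree with N!.
print(count_orderings(elements), factorial(len(elements)))
```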
Permutations
- How many possible orderings can you get?
  - Example: a, b, c (N = 3)
  - (a b c), (a c b), (b a c), (b c a), (c a b), (c b a)
  - 6 orderings = 3 · 2 · 1 = 3! (i.e., "3 factorial")
  - All the possible permutations of a set of 3 elements
- For N elements
  - N choices for the first position, (N-1) choices for the second position, ..., (2) choices, 1 choice
  - N(N-1)(N-2)...(2)(1) = N! possible orderings

Decision Tree
  a < b < c, b < c < a, c < a < b, a < c < b, b < a < c, c < b < a
    a < b: a < b < c, c < a < b, a < c < b
      a < c: a < b < c, a < c < b
        b < c: a < b < c
        b > c: a < c < b
      a > c: c < a < b
    a > b: b < c < a, b < a < c, c < b < a
      b < c: b < c < a, b < a < c
        c < a: b < c < a
        c > a: b < a < c
      b > c: c < b < a
The leaves contain all the possible orderings of a, b, c

Student Activity
Lower bound on Height
- A binary tree of height h has at most how many leaves?   L ≤
- A binary tree with L leaves has height at least:   h ≥
- The decision tree has how many leaves:
- So the decision tree has height:   h ≥

log(N!) is Ω(N log N)
\begin{align*}
\log(N!) &= \log\bigl(N (N-1)(N-2)\cdots(2)(1)\bigr) \\
&= \log N + \log(N-1) + \log(N-2) + \cdots + \log 2 + \log 1 \\
&\ge \log N + \log(N-1) + \log(N-2) + \cdots + \log\frac{N}{2} && \text{(select just the first } N/2 \text{ terms)} \\
&\ge \frac{N}{2}\log\frac{N}{2} && \text{(each of the selected terms is} \ge \log(N/2)\text{)} \\
&= \frac{N}{2}(\log N - \log 2) = \frac{N}{2}\log N - \frac{N}{2} \\
&= \Omega(N \log N)
\end{align*}

Ω(N log N)
- Run time of any comparison-based sorting algorithm is Ω(N log N)
- Can we do better if we don't use comparisons?

BucketSort (aka BinSort)
If all values to be sorted are known to be between 1 and K, create an array count of size K, increment counts while traversing the input, and finally output the result.
- Example K=5. Input = (5,1,3,4,3,2,1,1,5,4,5)
- count array: one entry for each of the values 1 2 3 4 5
Running time to sort n items?
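The BucketSort description above can be sketched as follows, with the example input taken to be (5,1,3,4,3,2,1,1,5,4,5) and K=5. The function name `bucket_sort` is chosen for the example, and the count array is given K+1 slots with index 0 unused so that a value v maps directly to `count[v]`. The slide itself describes an array of size K.

```python
def bucket_sort(values, k):
    """Sort integers known to lie between 1 and k (inclusive)."""
    count = [0] * (k + 1)          # count[0] is unused
    for v in values:               # one pass over the n inputs
        count[v] += 1
    out = []
    for v in range(1, k + 1):      # one pass over the K buckets
        out.extend([v] * count[v])
    return out


data = [5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5]
print(bucket_sort(data, 5))
```

The two loops make the cost visible: one pass over the input and one pass over the count array.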
BucketSort Complexity: O(n+K)
- Case 1: K is a constant
  - BinSort is linear time
- Case 2: K is variable
  - Not simply linear time
- Case 3: K is constant but large (e.g. 2^32)
  - ???

Fixing impracticality: RadixSort
- Radix = "The base of a number system"
  - We'll use 10 for convenience, but could be anything
- Idea: BucketSort on each digit, least significant to most significant (lsd to msd)

Radix Sort Example (1st pass)
[Figure: input data; bucket sort by 1's digit into buckets 0-9; after 1st pass]

Radix Sort Example (2nd pass)
[Figure: after 1st pass; bucket sort by 10's digit into buckets 0-9; after 2nd pass]
This example uses B=10 and base 10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.

Radix Sort Example (3rd pass)
[Figure: after 2nd pass; bucket sort by 100's digit into buckets 0-9; after 3rd pass]
Invariant: after k passes the low order k digits are sorted.

Student Activity
RadixSort
- Input: 126, 28, 66, 41, 416, 11, 28
- BucketSort on lsd: buckets 0 1 2 3 4 5 6 7 8 9
- BucketSort on next-higher digit: buckets 0 1 2 3 4 5 6 7 8 9
- BucketSort on msd: buckets 0 1 2 3 4 5 6 7 8 9
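A least-significant-digit RadixSort can be sketched as repeated stable bucket passes. The sketch assumes non-negative integers and base 10. The function name `radix_sort` and the sample input are hypothetical and deliberately differ from the activity's input.

```python
def radix_sort(values, base=10):
    """Sort non-negative integers with one bucket pass per digit, lsd to msd."""
    out = list(values)
    if not out:
        return out
    largest = max(out)
    place = 1                       # 1's digit, then 10's, then 100's, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in out:
            # Appending preserves the order from earlier passes (stable).
            buckets[(v // place) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out


sample = [170, 45, 75, 90, 802, 24, 2, 66]  # hypothetical input
print(radix_sort(sample))
```

Stability of each pass is what maintains the invariant that after k passes the low order k digits are sorted. A pass that reordered ties would undo the work of the earlier passes.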
Radixsort: Complexity
- How many passes?
- How much work per pass?
- Total time?
- Conclusion?
- In practice
  - RadixSort only good for large number of elements with relatively small values
  - Hard on the cache compared to MergeSort/QuickSort

Internal versus External Sorting
- Need sorting algorithms that minimize disk/tape access time
- External sorting, Basic Idea:
  - Load chunk of data into RAM, sort, store this run on disk/tape
  - Use the Merge routine from Mergesort to merge runs
  - Repeat until you have only one run (one sorted chunk)
  - Text gives some examples
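The external sorting idea can be simulated in memory to show its shape. In this sketch plain Python lists stand in for runs stored on disk or tape, `chunk_size` (assumed to be at least 1) stands in for how much data fits in RAM, and the two-way merge is done with the standard library's `heapq.merge`. All of these are assumptions for illustration; a real external sort would stream runs from files.

```python
from heapq import merge


def external_sort(data, chunk_size):
    """Simulate external sorting: build sorted runs, then merge runs pairwise."""
    # Phase 1: load one chunk at a time, sort it, and store it as a run.
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    # Phase 2: merge pairs of runs until only one run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(list(merge(runs[i], runs[i + 1])))
            else:
                merged.append(runs[i])   # odd run out, carried forward
        runs = merged
    return runs[0] if runs else []


print(external_sort([9, 4, 7, 1, 8, 2, 6, 3, 5], chunk_size=2))
```

Each pass of the `while` loop reads every run sequentially and roughly halves the number of runs, which is the access pattern that makes a merge-based approach a good fit for disk or tape.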