Sorting
CSE 6 Data Structures
Unit 15

Reading: Sections .1-. Bubble and Insert sort, .5 Heap sort, Section ..6 Radix sort, Section .6 Mergesort, Section . Quicksort, Section .8 Lower bound

Sorting
  Input
    - an array A of data records
    - a key value in each data record
    - a comparison function which imposes a consistent ordering on the keys (e.g., integers)
  Output
    - reorganize the elements of A such that for any i and j, if i < j then A[i] ≤ A[j]

Consistent Ordering
  - The comparison function must provide a consistent ordering on the set of possible keys
    - You can compare any two keys and get back an indication of a < b, a > b, or a = b
  - The comparison functions must be consistent
    - If compare(a,b) says a < b, then compare(b,a) must say b > a
    - If compare(a,b) says a = b, then compare(b,a) must say b = a

Why Sort?
  - Sorting algorithms are among the most frequently used algorithms in computer science
  - Allows binary search of an N-element array in O(log N) time
  - Allows O(1) time access to the kth largest element in the array for any k
  - Allows easy detection of any duplicates
Evaluating a Sort Algorithm: Time
  - How fast is the algorithm?
    - The definition of a sorted array A says that for any i < j, A[i] ≤ A[j]
    - This means that you need to check each element at least once, i.e., at least N steps, which is Ω(N)
    - And you could end up checking each element against every other element, which is O(N^2)
    - The big question is: how close to O(N) can you get?

Space
  - How much space does the sorting algorithm require in order to sort the collection of items?
    - Is copying needed? O(n) additional space
    - In-place sorting: no copying, O(1) additional space
    - Somewhere in between for temporary storage, e.g. O(log n) space
    - External memory sorting: data so large that it does not fit in memory

Stability
  - Stability: does the algorithm rearrange the order of input data records which have the same key value (duplicates)?
    - E.g. a phone book sorted by name. Now sort by county: is the list still sorted by name within each county?
  - Extremely important property for databases
  - A stable sorting algorithm is one which does not rearrange the order of duplicate keys

Stability Example
  [Figure: Stable Sort vs. Unstable Sort on an input with duplicate keys labeled a, b, c. The stable sort keeps equal keys in their input order (a, b, c; 5a before 5b); the unstable sort shown emits one group of equal keys as c, b, a.]
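The phone book question from the Stability slide can be tried out directly. The sketch below is illustrative only: the names and counties are made-up sample data, and it relies on Python's built-in sorted(), which is documented to be stable.

```python
# Made-up records (name, county), already sorted by name.
phone_book = [
    ("Adams", "King"),
    ("Baker", "Pierce"),
    ("Chen", "King"),
    ("Diaz", "Pierce"),
    ("Evans", "King"),
]

# Re-sort by county only. sorted() is stable, so ties keep their input order.
by_county = sorted(phone_book, key=lambda record: record[1])


def names_sorted_within_county(records):
    """Return True if names are in ascending order inside every county."""
    last_name_seen = {}
    for name, county in records:
        if county in last_name_seen and name < last_name_seen[county]:
            return False
        last_name_seen[county] = name
    return True
```

With a stable sort, names_sorted_within_county(by_county) holds; an unstable sort gives no such guarantee.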
Bubble Sort
  - Bubble elements to their proper place in the array by comparing elements i and i+1, and swapping if A[i] > A[i+1]
    - Bubble every element towards its correct position
      - last position has the largest element
    - then bubble every element except the last one towards its correct position
    - then repeat until done or until the end of the quarter, whichever comes first...

Bubblesort
    bubble(A[1..n]: integer array, n : integer): {
      i, j : integer;
      for i = 1 to n-1 do
        for j = 2 to n-i+1 do
          if A[j-1] > A[j] then SWAP(A[j-1],A[j]);
    }

    SWAP(a,b) : {
      t : integer;
      t := a; a := b; b := t;
    }

  i = 1: largest element is placed at the last position
  i = k: kth largest element is placed at the kth to last position

Bubblesort (recursive)
    bubble(A[1..n]: integer array, n : integer): {
    }

Put the largest element in its place
  [Figure: one bubbling pass over the example array. Adjacent elements are compared ("larger value?") and swapped when out of order, until the largest element reaches the last position.]
Put 2nd largest element in its place
  [Figure: a second bubbling pass over the example array, moving the 2nd largest element next to the largest.]
  Two elements done, only n-2 more to go...

Bubble Sort: Just Say No
  - Bubble elements to their proper place in the array by comparing elements i and i+1, and swapping if A[i] > A[i+1]
  - We bubblize for i = 1 to n (i.e., n times)
  - Each bubblization is a loop that makes n-i comparisons
  - This is O(n^2)

Insertion Sort
  - What if the first k elements of the array are already sorted?
    [Example: an array whose first three elements are sorted, followed by 5, 19, 16.]
  - We can shift the tail of the sorted elements list down and then insert the next element into its proper position, and we get k+1 sorted elements
    [Example: the 5 is inserted right after the 4, leaving four sorted elements followed by 19, 16.]

Insertion Sort
    InsertionSort(A[1..N]: integer array, N: integer) {
      i, j, temp: integer;
      for i = 2 to N {
        temp := A[i];
        j := i;
        while j > 1 and A[j-1] > temp {
          A[j] := A[j-1];
          j := j-1;
        }
        A[j] := temp;
      }
    }

  - Is Insertion sort in place? Stable? Running time = ?
  - Have we used something similar before?
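A runnable version of the insertion sort idea above, as a minimal Python sketch. It assumes 0-based list indices (the pseudocode uses 1-based arrays), and the function name insertion_sort is chosen here for illustration.

```python
def insertion_sort(a):
    """Sort list a in place (0-based indices) and return it."""
    for i in range(1, len(a)):
        temp = a[i]          # next element to insert into the sorted prefix
        j = i
        # Shift larger elements of the sorted prefix one slot to the right.
        while j > 0 and a[j - 1] > temp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = temp          # drop the element into the gap
    return a
```

Because the inner loop uses a strict comparison (a[j - 1] > temp), equal keys are never moved past each other, which is what keeps the sort stable.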
Example
  [Figure: step-by-step insertion sort trace on the example array, ending with the array fully sorted.]

Insertion Sort Characteristics
  - In place and stable
  - Running time
    - Worst case is O(N^2)
      - reverse order input
      - must copy every element every time
  - Good sorting algorithm for almost sorted data
    - Each item is close to where it belongs in sorted order.

Inversions
  - An inversion is a pair of elements in the wrong order
    - i < j but A[i] > A[j]
  - By definition, a sorted array has no inversions
  - So you can think of sorting as the process of removing inversions in the order of the elements
Inversions
  - A single value out of place can cause several inversions
  [Figure: example array (value and index rows) with a single value out of place.]

Reverse order
  - All values out of place (reverse order) causes numerous inversions
  [Figure: example array (value and index rows) whose trailing values are in reverse order.]

Inversions
  - Our simple sorting algorithms so far swap adjacent elements and remove just one inversion at a time
    - Their running time is proportional to the number of inversions in the array
  - Given N distinct keys, the maximum possible number of inversions is

    $(n-1) + (n-2) + \cdots + 1 = \sum_{i=1}^{n-1} i = \frac{(n-1)n}{2}$

Inversions and Adjacent Swap Sorts
  - An "average" list will contain half the maximum number of inversions = $\frac{(n-1)n}{4}$
    - So the average running time of Insertion sort is Θ(N^2)
  - Any sorting algorithm that only swaps adjacent elements requires Ω(N^2) time because each swap removes only one inversion (lower bound)
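The link between inversions and running time can be checked on small inputs. The sketch below is illustrative: count_inversions is a brute-force counter and insertion_sort_shifts is a hypothetical helper that counts how many element shifts insertion sort performs; both names are assumptions made for this example.

```python
from itertools import combinations


def count_inversions(a):
    """Count pairs i < j with a[i] > a[j] by brute force, O(N^2)."""
    return sum(1 for i, j in combinations(range(len(a)), 2) if a[i] > a[j])


def insertion_sort_shifts(a):
    """Insertion-sort a copy of a and return the number of element shifts."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        temp = a[i]
        j = i
        while j > 0 and a[j - 1] > temp:
            a[j] = a[j - 1]   # each shift removes exactly one inversion
            j -= 1
            shifts += 1
        a[j] = temp
    return shifts


reverse_order = list(range(10, 0, -1))   # N = 10 in reverse order
```

For reverse_order both functions give (n-1)n/2 = 45, the maximum stated above.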
Heap Sort

Using Binary Heaps for Sorting
  - We use a Max-Heap
  - Root node = A[1]
  - Children of A[i] = A[2i], A[2i+1]
  - Keep track of the current size N (number of nodes)
  [Figure: a max-heap drawn as a tree and as an array (value and index rows) with its current size N.]

  - Build a max-heap
  - Do N DeleteMax operations and store each Max element as it comes out of the heap
  - Data comes out in largest to smallest order
  - Where can we put the elements as they are removed from the heap?
  [Figure: Build Max-heap followed by one DeleteMax.]

1 Removal = 1 Addition
  - Every time we do a DeleteMax, the heap gets smaller by one node, and we have one more node to store
    - Store the data at the end of the heap array
    - Not "in the heap" but it is in the heap array

Repeated DeleteMax
  [Figure: the heap array after successive DeleteMax operations; N shrinks by one each time and the removed maxima accumulate at the end of the array.]
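The two phases described above (build a max-heap, then repeated DeleteMax with each maximum stored in the slot freed at the end of the array) can be sketched as follows. This is a minimal illustration that assumes 0-based indexing, so the children of a[i] are a[2i+1] and a[2i+2] rather than A[2i] and A[2i+1]; the helper names sift_down and heap_sort are chosen here.

```python
def sift_down(a, root, size):
    """Restore the max-heap property below index root within a[0:size]."""
    while True:
        child = 2 * root + 1              # left child in 0-based indexing
        if child >= size:
            return
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1                    # pick the larger of the two children
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)
    # Phase 1: build a max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Phase 2: repeated DeleteMax; the max goes into the slot just freed.
    for size in range(n - 1, 0, -1):
        a[0], a[size] = a[size], a[0]
        sift_down(a, 0, size)
    return a
```

No second array is needed: the sorted region grows from the right end while the heap shrinks from N down to 1.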
Heap Sort is In-place
  - After all the DeleteMaxs, the heap is gone but the array is full and is in sorted order
  [Figure: the heap array with N = 0, holding all values in sorted order.]

Heapsort: Analysis
  - Running time
    - time to build max-heap is O(N)
    - time for N DeleteMax operations is N O(log N)
    - total time is O(N log N)
  - Can also show that the running time is Ω(N log N) for some inputs, so the worst case is Θ(N log N)
    - Average case running time is also O(N log N)
  - Heapsort is in-place but not stable (why?)

Bucket Sort: Sorting Integers
  - The goal: sort N numbers, all between 1 and k.
  - The method: use an array of k queues. Queue j (for 1 ≤ j ≤ k) keeps the input numbers whose value is j.
  - Each queue is denoted a "bucket".
  - Scan the list and put the elements in the buckets.
  - Output the content of the buckets from 1 to k.

Bucket Sort: Sorting Integers
  [Figure: bucket sort example on 8 numbers. Step 1: scan the list and put the elements in the queues. Step 2: concatenate the queues to obtain the sorted list.]
  - Time complexity: O(n+k).
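The bucket sort method above, written out as a short Python sketch. The function name bucket_sort is chosen for illustration, and collections.deque stands in for the queues; the input in the usage note is illustrative sample data.

```python
from collections import deque


def bucket_sort(numbers, k):
    """Sort integers in the range 1..k using k queues; O(n + k) time."""
    buckets = [deque() for _ in range(k + 1)]    # index 0 is unused
    for x in numbers:
        if not 1 <= x <= k:
            raise ValueError(f"{x} is outside the range 1..{k}")
        buckets[x].append(x)                     # enqueue keeps input order
    result = []
    for j in range(1, k + 1):
        result.extend(buckets[j])                # concatenate queues 1..k
    return result
```

For example, bucket_sort([3, 6, 7, 4, 11, 3, 9, 7], 12) scans the 8 inputs once and then walks the 12 queues once, which is where the O(n+k) bound comes from.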
Radix Sort: Sorting integers
  - Historically goes back to the 1890 census.
  - Radix sort = multi-pass bucket sort of integers in the range 0 to B^P - 1
  - Bucket-sort from least significant to most significant "digit" (base B)
  - Requires P(B+N) operations, where P is the number of passes (the number of base B digits in the largest possible input number).
  - If P and B are constants then O(N) time to sort!

Radix Sort Example
  [Figure: radix sort example in three passes. The input data is bucket sorted by 1's digit, then by 10's digit, then by 100's digit, with the list shown after the 1st, 2nd, and 3rd pass.]
  - This example uses B = 10 and base 10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.
  - Invariant: after k passes the low order k digits are sorted.
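The multi-pass scheme above (bucket sort on the least significant digit first, then each more significant digit) can be sketched in Python. This is a minimal illustration that assumes non-negative integers; the function name radix_sort and its base parameter are choices made for this example, and the sample input is made up.

```python
def radix_sort(numbers, base=10):
    """LSD radix sort of non-negative integers; returns a new list."""
    result = list(numbers)
    if not result:
        return result
    if min(result) < 0:
        raise ValueError("this sketch handles non-negative integers only")
    largest = max(result)
    place = 1                          # 1, base, base^2, ...: current digit
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for x in result:
            buckets[(x // place) % base].append(x)   # order kept inside a bucket
        result = [x for bucket in buckets for x in bucket]
        place *= base
    return result
```

Each pass must be stable: after pass k the numbers are ordered by their low k digits only because ties on the current digit keep the order produced by the earlier passes.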
Properties of Radix Sort
  - Not in-place
    - needs lots of auxiliary storage.
  - Stable
    - equal keys always end up in the same bucket in the same order.
  - Fast
    - Time to sort N numbers in the range 0 to B^P - 1 is O(P(B+N)) (P iterations, B buckets in each)

"Divide and Conquer"
  - Very important strategy in computer science:
    - Divide the problem into smaller parts
    - Independently solve the parts
    - Combine these solutions to get the overall solution
  - Idea 1: Divide the array into two halves, recursively sort the left and right halves, then merge the two halves: Mergesort
  - Idea 2: Partition the array into items that are "small" and items that are "large", then recursively sort the two sets: Quicksort

Mergesort
  - Divide it in two at the midpoint
  - Conquer each side in turn (by recursively sorting)
  - Merge the two halves together

Mergesort Example
  [Figure: Mergesort example on an 8-element array. Three levels of Divide split it down to 1-element subarrays; three levels of Merge combine them back into the sorted array.]
Auxiliary Array
  - The merging requires an auxiliary array.
  [Figure: two sorted halves being merged, element by element, into the auxiliary array.]

Merging
  [Figure: the merge step. Normal case: the smaller of the elements at the "first" and "second" pointers is copied to "target". Left completed first: the merged elements are copied back from the auxiliary array.]
Merging
  [Figure: the merge step with "first", "second", and "target" pointers. Right completed first: the remaining left elements are moved to the end before the merged elements are copied back.]

Merging
    Merge(A[], T[] : integer array, left, right : integer) : {
      mid, i, j, k, l, target : integer;
      mid := (right + left)/2;
      i := left; j := mid + 1; target := left;
      while i ≤ mid and j ≤ right do
        if A[i] ≤ A[j] then T[target] := A[i]; i := i + 1;
        else T[target] := A[j]; j := j + 1;
        target := target + 1;
      if i > mid then //left completed//
        for k := left to target-1 do A[k] := T[k];
      if j > right then //right completed//
        k := mid; l := right;
        while k ≥ i do A[l] := A[k]; k := k-1; l := l-1;
        for k := left to target-1 do A[k] := T[k];
    }

Recursive Mergesort
    Mergesort(A[], T[] : integer array, left, right : integer) : {
      if left < right then
        mid := (left + right)/2;
        Mergesort(A,T,left,mid);
        Mergesort(A,T,mid+1,right);
        Merge(A,T,left,right);
    }

    MainMergesort(A[1..n]: integer array, n : integer) : {
      T[1..n]: integer array;
      Mergesort(A,T,1,n);
    }

Iterative Mergesort
  [Figure: bottom-up passes over the array: Merge by 1, Merge by 2, Merge by 4, Merge by 8.]
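The bottom-up passes (merge runs of length 1, then 2, then 4, and so on) can be sketched in Python. This is an illustration under stated assumptions rather than a transcription of the course pseudocode: it uses 0-based half-open ranges, the helper names merge and iterative_mergesort are chosen here, and run boundaries are clamped with min() so that n does not need to be a power of 2.

```python
def merge(src, dst, left, mid, right):
    """Merge sorted runs src[left:mid] and src[mid:right] into dst[left:right]."""
    i, j = left, mid
    for target in range(left, right):
        # Take from the left run on ties so that equal keys keep their order.
        if i < mid and (j >= right or src[i] <= src[j]):
            dst[target] = src[i]
            i += 1
        else:
            dst[target] = src[j]
            j += 1


def iterative_mergesort(a):
    """Return a sorted copy of a using bottom-up merging (any length n)."""
    n = len(a)
    src, dst = list(a), [None] * n
    width = 1                                   # length of each sorted run
    while width < n:
        for left in range(0, n, 2 * width):
            mid = min(left + width, n)          # clamp for a short last run
            right = min(left + 2 * width, n)
            merge(src, dst, left, mid, right)
        src, dst = dst, src                     # swap roles instead of copying
        width *= 2
    return src
```

Swapping the roles of the two arrays after each pass plays the part of the parity flag: whichever array holds the result of the last pass is the one returned.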
Iterative Mergesort
  [Figure: bottom-up passes over the array: Merge by 1, Merge by 2, Merge by 4, Merge by 8, Merge by 16. Need of a last copy.]

Iterative Mergesort
    IterativeMergesort(A[1..n]: integer array, n : integer) : {
      //precondition: n is a power of 2//
      i, m, parity : integer;
      T[1..n]: integer array;
      m := 2; parity := 0;
      while m ≤ n do
        for i = 1 to n - m + 1 by m do
          if parity = 0 then Merge(A,T,i,i+m-1);
          else Merge(T,A,i,i+m-1);
        parity := 1 - parity;
        m := 2*m;
      if parity = 1 then
        for i = 1 to n do A[i] := T[i];
    }

  - How do you handle non-powers of 2?
  - How can the final copy be avoided?

Mergesort Analysis
  - Let T(N) be the running time for an array of N elements
  - Mergesort divides the array in half and calls itself on the two halves. After returning, it merges both halves using a temporary array
  - Each recursive call takes T(N/2) and merging takes O(N)

Mergesort Recurrence Relation
  - The recurrence relation for T(N) is:
    - T(1) ≤ a
      - base case: 1 element array, constant time
    - T(N) ≤ 2T(N/2) + dN
      - Sorting N elements takes
        - the time to sort the left half
        - plus the time to sort the right half
        - plus an O(N) time to merge the two halves
  - T(N) = ?
Mergesort Analysis Upper Bound
  Assuming n is a power of 2:

  \[
  \begin{aligned}
  T(n) &\le 2T(n/2) + dn \\
       &\le 2\bigl(2T(n/4) + dn/2\bigr) + dn = 4T(n/4) + 2dn \\
       &\le 4\bigl(2T(n/8) + dn/4\bigr) + 2dn = 8T(n/8) + 3dn \\
       &\;\;\vdots \\
       &\le 2^k\,T(n/2^k) + kdn \\
       &= nT(1) + kdn \qquad \text{if } n = 2^k,\ k = \log_2 n \\
       &\le an + dn\log_2 n \\
       &= O(n \log n)
  \end{aligned}
  \]

Properties of Mergesort
  - Not in-place
    - Requires an auxiliary array (O(n) extra space)
  - Stable
    - Make sure that left is sent to target on equal values.
  - Iterative Mergesort reduces copying.

Quicksort
  - Quicksort uses a divide and conquer strategy, but does not require the O(N) extra space that MergeSort does
    - Partition the array into left and right sub-arrays
      - Choose an element of the array, called the pivot
      - the elements in the left sub-array are all less than the pivot
      - the elements in the right sub-array are all greater than the pivot
    - Recursively sort the left and right sub-arrays
    - Concatenate the left and right sub-arrays in O(1) time

"Four easy steps"
  - To sort an array S
    1. If the number of elements in S is 0 or 1, then return. The array is sorted.
    2. Pick an element v in S. This is the pivot value.
    3. Partition S-{v} into two disjoint subsets, S1 = {all values x ≤ v}, and S2 = {all values x ≥ v}.
    4. Return QuickSort(S1), v, QuickSort(S2)
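The four steps translate almost line for line into Python. The sketch below is deliberately not in-place: it builds new lists for S1 and S2 to mirror the set notation. Taking the middle element as the pivot and sending values equal to the pivot to S1 are arbitrary choices made for this illustration.

```python
def quicksort(s):
    """Return a sorted copy of list s, following the four steps literally."""
    # Step 1: zero or one element is already sorted.
    if len(s) <= 1:
        return list(s)
    # Step 2: pick a pivot v (here the middle element, an arbitrary choice).
    pivot_index = len(s) // 2
    v = s[pivot_index]
    rest = s[:pivot_index] + s[pivot_index + 1:]      # S - {v}
    # Step 3: partition into values <= v and values > v.
    s1 = [x for x in rest if x <= v]
    s2 = [x for x in rest if x > v]
    # Step 4: QuickSort(S1), v, QuickSort(S2).
    return quicksort(s1) + [v] + quicksort(s2)
```

The in-place partitioning discussed in the rest of this unit avoids the extra lists that this version allocates at every level.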
The steps of QuickSort
  [Figure, from Weiss: a set S of numbers; select a pivot value; partition S into S1 and S2; QuickSort(S1) and QuickSort(S2). Voila! S is sorted.]

Details, details
  - Implementing the actual partitioning
  - Picking the pivot
    - want a value that will cause |S1| and |S2| to be non-zero, and close to equal in size if possible
  - Dealing with cases where an element equals the pivot

Quicksort Partitioning
  - Need to partition the array into left and right sub-arrays
    - the elements in the left sub-array are ≤ pivot
    - the elements in the right sub-array are ≥ pivot
  - How do the elements get to the correct partition?
    - Choose an element from the array as the pivot
    - Make one pass through the rest of the array and swap as needed to put elements in partitions

Partitioning: Choosing the pivot
  - One implementation (there are others)
    - median3 finds the pivot and sorts left, center, right
      - median3 takes the median of the leftmost, middle, and rightmost elements
      - An alternative is to choose the pivot randomly (needs a random number generator; "expensive")
      - Another alternative is to choose the first element (but this can be very bad. Why?)
    - Swap the pivot with the next to last element
Partitioning in-place
  - Set pointers i and j to the start and end of the array
  - Increment i until you hit an element A[i] > pivot
  - Decrement j until you hit an element A[j] < pivot
  - Swap A[i] and A[j]
  - Repeat until i and j cross
  - Swap the pivot (at A[N-2]) with A[i]

Example
  [Figure: a 10-element example array (indices 0 to 9) before and after pivot selection.]
  - Choose the pivot as the median of three: the median of 0, 6, 8 is 6, so the pivot is 6.
  - Place the largest at the right and the smallest at the left.
  - Swap the pivot with the next to last element.

Example
  [Figure: successive states of the array as i and j move and elements are swapped.]
  - Move i to the right up to an A[i] larger than the pivot.
  - Move j to the left up to an A[j] smaller than the pivot.
  - Swap.

Example
  [Figure: i and j continue until cross-over (i > j); the pivot is then swapped with A[i], leaving S1 < pivot, the pivot, and S2 > pivot.]
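The in-place partitioning steps can be sketched in Python as follows. This is one possible realization, assuming 0-based indices and a sub-array of at least three elements; the helper names median3 and partition and the 10-element input in the usage note are illustrative choices. The scans here stop on elements equal to the pivot, one common way to handle duplicates.

```python
def median3(a, left, right):
    """Order a[left], a[center], a[right]; park the median at a[right - 1]."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    # Swap the pivot with the next to last element.
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def partition(a, left, right):
    """Partition a[left..right] (3+ elements); return the pivot's final index."""
    pivot = median3(a, left, right)
    i, j = left, right - 1
    while True:
        i += 1
        while a[i] < pivot:      # stops at the pivot slot at the latest
            i += 1
        j -= 1
        while a[j] > pivot:      # stops at a[left] <= pivot at the latest
            j -= 1
        if i >= j:               # i and j have crossed
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[right - 1] = a[right - 1], a[i]   # pivot goes between the halves
    return i
```

For example, with a = [8, 1, 4, 9, 0, 3, 5, 2, 7, 6], calling partition(a, 0, 9) picks 6 as the median of 8, 0, and 6 and leaves every element left of the returned index no larger than the pivot and every element right of it no smaller.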
Recursive Quicksort
    Quicksort(A[]: integer array, left, right : integer): {
      pivotindex : integer;
      if left + CUTOFF ≤ right then
        pivot := median3(A,left,right);
        pivotindex := Partition(A,left,right-1,pivot);
        Quicksort(A, left, pivotindex - 1);
        Quicksort(A, pivotindex + 1, right);
      else
        Insertionsort(A,left,right);
    }

  Don't use quicksort for small arrays. CUTOFF = 10 is reasonable.

Quicksort Best Case Performance
  - Algorithm always chooses the best pivot and splits sub-arrays in half at each recursion
    - T(0) = T(1) = O(1)
      - constant time if 0 or 1 element
    - For N > 1, 2 recursive calls plus linear time for partitioning
    - T(N) = 2T(N/2) + O(N)
      - Same recurrence relation as Mergesort
    - T(N) = O(N log N)

Quicksort Worst Case Performance
  - Algorithm always chooses the worst pivot: one sub-array is empty at each recursion

  \[
  \begin{aligned}
  T(N) &\le a \qquad \text{for } N \le C \\
  T(N) &\le T(N-1) + bN \\
       &\le T(N-2) + b(N-1) + bN \\
       &\le T(C) + b(C+1) + \cdots + bN \\
       &\le a + b\bigl(C + (C+1) + (C+2) + \cdots + N\bigr) \\
  T(N) &= O(N^2)
  \end{aligned}
  \]

  - Fortunately, average case performance is O(N log N) (see text for proof)

Properties of Quicksort
  - Not stable because of long distance swapping.
  - No iterative version (without using a stack).
  - Pure quicksort is not good for small arrays.
  - "In-place", but uses auxiliary storage because of the recursive calls (O(log n) space).
  - O(n log n) average case performance, but O(n^2) worst case performance.
How fast can we sort?
  - Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time
  - Can we do any better?
  - No, if sorting is comparison-based.
  - We saw that radix sort is O(N), but it is only for integers from a bounded range.

Sorting Model
  - Recall the basic assumption: we can only compare two elements at a time
    - we can only reduce the possible solution space by half each time we make a comparison
  - Suppose you are given N elements
    - Assume no duplicates
  - How many possible orderings can you get?
    - Example: a, b, c (N = 3)

Permutations
  - How many possible orderings can you get?
    - Example: a, b, c (N = 3)
    - (a b c), (a c b), (b a c), (b c a), (c a b), (c b a)
    - 6 orderings = 3·2·1 = 3! (i.e., "3 factorial")
    - All the possible permutations of a set of 3 elements
  - For N elements
    - N choices for the first position, (N-1) choices for the second position, ..., (2) choices, 1 choice
    - N(N-1)(N-2)...(2)(1) = N! possible orderings

Decision Tree
  [Figure: decision tree for sorting a, b, c. The root holds all six orderings (a < b < c, b < c < a, c < a < b, a < c < b, b < a < c, c < b < a). The comparison a < b versus a > b splits them into {a < b < c, c < a < b, a < c < b} and {b < c < a, b < a < c, c < b < a}; further comparisons of a with c and of b with c narrow each set down to a single ordering.]
  - The leaves contain all the possible orderings of a, b, c
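The count of orderings can be confirmed by enumeration. A small illustrative check using the standard library (itertools.permutations and math.factorial); the variable name orderings is chosen here.

```python
from itertools import permutations
from math import factorial

# Every possible ordering of the three distinct elements a, b, c.
orderings = list(permutations("abc"))

# One decision-tree leaf per ordering: N! of them for N distinct elements.
leaf_count = len(orderings)
```

Here leaf_count equals factorial(3), the number of leaves a decision tree for three distinct elements must have.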
Decision Trees
  - A Decision Tree is a Binary Tree such that:
    - Each node = a set of orderings
      - i.e., the remaining solution space
    - Each edge = 1 comparison
    - Each leaf = 1 unique ordering
    - How many leaves for N distinct elements?
      - N!, i.e., a leaf for each possible ordering
  - Only 1 leaf has the ordering that is the desired correctly sorted arrangement

Decision Trees and Sorting
  - Every comparison-based sorting algorithm corresponds to a decision tree
    - Finds the correct leaf by choosing edges to follow
      - i.e., by making comparisons
    - Each decision reduces the possible solution space by one half
  - Run time is ≥ the maximum number of comparisons
    - the maximum number of comparisons is the length of the longest path in the decision tree, i.e. the height of the tree

Decision Tree Example
  [Figure: the decision tree for a, b, c with its 3! possible orders at the root and the path of comparisons leading to the leaf for the actual order.]

How many leaves on a tree?
  - Suppose you have a binary tree of height d. How many leaves can the tree have?
    - d = 1: at most 2 leaves
    - d = 2: at most 4 leaves, etc.
Lower bound on Height
  - A binary tree of height d has at most 2^d leaves
    - depth d = 1: 2 leaves, d = 2: 4 leaves, etc.
    - Can prove by induction
  - Number of leaves, L ≤ 2^d
  - Height d ≥ log_2 L
  - The decision tree has N! leaves
  - So the decision tree has height d ≥ log_2(N!)

log(N!) is Ω(N log N)

  \[
  \begin{aligned}
  \log(N!) &= \log\bigl(N \cdot (N-1) \cdot (N-2) \cdots (2) \cdot (1)\bigr) \\
           &= \log N + \log(N-1) + \log(N-2) + \cdots + \log 2 + \log 1 \\
           &\ge \log N + \log(N-1) + \log(N-2) + \cdots + \log\frac{N}{2}
             && \text{(select just the first } N/2 \text{ terms)} \\
           &\ge \frac{N}{2}\log\frac{N}{2}
             && \text{(each of the selected terms is} \ge \log N/2\text{)} \\
           &= \frac{N}{2}(\log N - \log 2) = \frac{N}{2}\log N - \frac{N}{2} \\
           &= \Omega(N \log N)
  \end{aligned}
  \]

  Stirling's formula: $n! \approx \sqrt{2\pi n}\,(n/e)^n$

Summary of Sorting
  - Sorting choices:
    - O(N^2): Bubblesort, Insertion Sort
    - O(N log N) average case running time:
      - Heapsort: in-place, not stable.
      - Mergesort: O(N) extra space, stable.
      - Quicksort: claimed fastest in practice, but O(N^2) worst case. Needs extra storage for recursion. Not stable.
    - Run time of any comparison-based sorting algorithm is Ω(N log N)
    - O(N) Radix Sort: fast and stable. Not comparison based. Not in-place.
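The Ω(N log N) bound for comparison-based sorting rests on the chain (N/2) log(N/2) ≤ log(N!) ≤ N log N, which can be checked numerically for small N. The sketch below is illustrative; the function names min_comparisons and half_terms_bound are chosen here, and logarithms are taken base 2.

```python
import math


def min_comparisons(n):
    """Decision-tree height lower bound: ceil(log2(n!)) comparisons."""
    return math.ceil(math.log2(math.factorial(n)))


def half_terms_bound(n):
    """The (N/2) log2(N/2) bound from keeping only the first N/2 terms."""
    return (n / 2) * math.log2(n / 2)
```

For N = 3 this gives ceil(log2 6) = 3, matching the height of the decision tree for a, b, c.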