Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review
Introduction to Algorithms: Quicksort

Insertion Sort: T(n) = Θ(n^2), in-place
Merge Sort: T(n) = Θ(n lg n), not in-place
Selection Sort (from homework): T(n) = Θ(n^2), in-place
Heap Sort: T(n) = Θ(n lg n), in-place
Seems pretty good. Can we do better?

Sorting Assumptions
1. No knowledge of the keys or numbers we are sorting on.
2. Each key supports a comparison interface or operator.
3. Sorting entire records, as opposed to numbers, is an implementation detail.
4. Each key is unique (just for convenience).

Comparison Sorting
Given a set of n values, there can be n! permutations of these values.
So if we look at the behavior of the sorting algorithm over all possible n! inputs, we can determine the worst-case complexity of the algorithm.
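The idea of examining all n! inputs can be tried directly for small n. The following is a minimal sketch, not part of the original slides: the class name WorstCaseByEnumeration and its helper methods are assumptions made for illustration, and insertion sort is used only because it appears in the review list above. It counts key comparisons over every permutation of n distinct keys and reports the maximum.

```java
final class WorstCaseByEnumeration {
    // Insertion sort on a copy of the input; returns the number of key comparisons made.
    static int insertionSortComparisons(int[] input) {
        int[] a = input.clone();
        int comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;            // one comparison of key against a[j]
                if (a[j] <= key) break;   // key is already in place
                a[j + 1] = a[j];          // shift the larger element right
                j--;
            }
            a[j + 1] = key;
        }
        return comparisons;
    }

    // Tries every permutation of a[k..] (by swapping) and returns the largest comparison count.
    static int worstOver(int[] a, int k) {
        if (k == a.length) return insertionSortComparisons(a);
        int worst = 0;
        for (int i = k; i < a.length; i++) {
            swap(a, k, i);
            worst = Math.max(worst, worstOver(a, k + 1));
            swap(a, k, i);                // restore before trying the next choice
        }
        return worst;
    }

    // Worst-case comparison count of insertion sort over all n! orderings of the keys 1..n.
    static int worstCaseComparisons(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i + 1;
        return worstOver(keys, 0);
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For n = 4 the maximum is 6 = n(n-1)/2, reached by the reverse-sorted input. Enumeration is only practical for very small n, since the number of inputs grows as n!.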

Decision Tree
Decision tree model: a full binary tree.
A full binary tree (sometimes called a proper binary tree or 2-tree) is a tree in which every node other than the leaves has two children.
Each internal node represents a comparison. Ignore control, data movement, and all other operations; count only comparisons.
Each leaf represents one possible result (a permutation of the elements in sorted order).
The height of the tree (i.e., the longest path) is the lower bound.

Decision Tree Model
1:2
  ≤: 2:3
       ≤: <1,2,3>
       >: 1:3
            ≤: <1,3,2>
            >: <3,1,2>
  >: 1:3
       ≤: <2,1,3>
       >: 2:3
            ≤: <2,3,1>
            >: <3,2,1>
Internal node i:j indicates a comparison between a_i and a_j.
Suppose three elements <a1, a2, a3> with instance <6,8,5>.
Leaf node <π(1), π(2), π(3)> indicates the ordering a_π(1) ≤ a_π(2) ≤ a_π(3).
The sorting path for <6,8,5> (bold in the original figure) is 1:2 (≤), 2:3 (>), 1:3 (>), ending at leaf <3,1,2>, i.e. 5, 6, 8.
There are 3! = 6 possible permutations (paths) in total.

Decision Tree Model
The longest path is the worst-case number of comparisons.
The length of the longest path is the height of the decision tree.
Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
Proof: Suppose the height of a decision tree is h, and the number of paths (i.e., permutations) is n!. Since a binary tree of height h has at most 2^h leaves, n! ≤ 2^h, so h ≥ lg(n!) = Ω(n lg n) (by equation 3.18).
That is to say: in the worst case, any comparison sort needs a number of comparisons that grows at least in proportion to n lg n.

QuickSort Design
Follows the divide-and-conquer paradigm.
Divide: Partition (separate) the array A[p..r] into two (possibly empty) subarrays A[p..q-1] and A[q+1..r].
Each element in A[p..q-1] ≤ A[q].
A[q] < each element in A[q+1..r].
Index q is computed as part of the partitioning procedure.
Conquer: Sort the two subarrays by recursive calls to quicksort.
Combine: The subarrays are sorted in place, so no work is needed to combine them.
How do the divide and combine steps of quicksort compare with those of merge sort?
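Returning to Theorem 8.1: the bound h ≥ lg(n!) can be evaluated exactly for small n. The sketch below is an illustration only; the class name DecisionTreeBound and the method minHeight are hypothetical names introduced here. It finds the smallest h with 2^h ≥ n!, which is the least possible height of a decision tree that sorts n distinct keys.

```java
final class DecisionTreeBound {
    // Smallest h with 2^h >= n!, i.e. ceil(lg(n!)).
    // Valid for 1 <= n <= 20, the range in which n! fits in a long.
    static int minHeight(int n) {
        long factorial = 1;
        for (int k = 2; k <= n; k++) factorial *= k;
        int h = 0;
        long leaves = 1;              // a binary tree of height h has at most 2^h leaves
        while (leaves < factorial) {
            leaves *= 2;
            h++;
        }
        return h;
    }
}
```

For n = 3 this gives 3, which matches the height of the three-element decision tree shown above; for n = 10 it gives 22.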

Pseudocode
Quicksort(A, p, r)
    if p < r then
        q := Partition(A, p, r);
        Quicksort(A, p, q - 1);
        Quicksort(A, q + 1, r)

Partition(A, p, r)
    x, i := A[r], p - 1;
    for j := p to r - 1 do
        if A[j] ≤ x then
            i := i + 1;
            A[i] ↔ A[j]
    A[i + 1] ↔ A[r];
    return i + 1

[Figure: Partition rearranges A[p..r] around the pivot (5 in the slide) into A[p..q-1], the pivot, and A[q+1..r].]

Example (p = 1, r = 10; pivot x = A[r] = 6)
initially:        2 5 8 3 9 4 1 7 10 6   i = 0
after j = 1:      2 5 8 3 9 4 1 7 10 6   i = 1 (2 ≤ 6)
after j = 2:      2 5 8 3 9 4 1 7 10 6   i = 2 (5 ≤ 6)
after j = 3:      2 5 8 3 9 4 1 7 10 6   i = 2 (8 > 6)
after j = 4:      2 5 3 8 9 4 1 7 10 6   i = 3 (A[3] ↔ A[4])
after j = 5:      2 5 3 8 9 4 1 7 10 6   i = 3 (9 > 6)
after j = 6:      2 5 3 4 9 8 1 7 10 6   i = 4 (A[4] ↔ A[6])
after j = 7:      2 5 3 4 1 8 9 7 10 6   i = 5 (A[5] ↔ A[7])
after j = 8:      2 5 3 4 1 8 9 7 10 6   i = 5 (7 > 6)
after j = 9:      2 5 3 4 1 8 9 7 10 6   i = 5 (10 > 6)
after final swap: 2 5 3 4 1 6 9 7 10 8   (A[6] ↔ A[10]; returns 6)

Partitioning
Select the last element A[r] in the subarray A[p..r] as the pivot, the element around which to partition.
As the procedure executes, the array is partitioned into four (possibly empty) regions.
1. A[p..i]: All entries in this region are ≤ pivot.
2. A[i+1..j-1]: All entries in this region are > pivot.
3. A[r] = pivot.
4. A[j..r-1]: Not known how they compare to pivot.
The above hold before each iteration of the for loop, and constitute a loop invariant. (4 is not part of the loop invariant.)
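The pseudocode above translates almost line for line into Java. This is a sketch under stated assumptions rather than the course's reference implementation: the class name LomutoQuicksort is invented here, and indices are 0-based, so the pseudocode's A[1..10] becomes a[0..9].

```java
final class LomutoQuicksort {
    // Partitions a[p..r] around the pivot x = a[r] and returns the pivot's final index.
    static int partition(int[] a, int p, int r) {
        int x = a[r];                 // pivot: last element of the subarray
        int i = p - 1;                // a[p..i] holds the entries <= x found so far
        for (int j = p; j < r; j++) {
            if (a[j] <= x) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, r);            // move the pivot between the two regions
        return i + 1;
    }

    // Sorts a[p..r] in place.
    static void quicksort(int[] a, int p, int r) {
        if (p < r) {
            int q = partition(a, p, r);
            quicksort(a, p, q - 1);
            quicksort(a, q + 1, r);
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Called on the example array 2 5 8 3 9 4 1 7 10 6 with p = 0 and r = 9, partition returns 5 (the 0-based position of the pivot 6) and leaves the array as 2 5 3 4 1 6 9 7 10 8, the same result as the trace above.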

Correctness of Partition
Use the loop invariant.
Initialization: Before the first iteration,
A[p..i] and A[i+1..j-1] are empty, so conditions 1 and 2 are satisfied (trivially).
r is the index of the pivot, so condition 3 is satisfied.
Maintenance:
Case 1: A[j] > x
Increment j only.
The loop invariant is maintained.
[Figure: Case 1. The region of entries > x grows by one element; the region of entries ≤ x is unchanged.]

Correctness of Partition
Case 2: A[j] ≤ x
Increment i and swap A[i] and A[j]: condition 1 is maintained.
Increment j: condition 2 is maintained.
A[r] is unaltered: condition 3 is maintained.
[Figure: Case 2. The region of entries ≤ x grows by one element; the region of entries > x shifts right by one.]

Correctness of Partition
Termination: When the loop terminates, j = r, so all elements in A are partitioned into one of the three cases:
A[p..i] ≤ pivot
A[i+1..j-1] > pivot
A[r] = pivot
The last two lines swap A[i+1] and A[r].
The pivot moves from the end of the array to between the two subarrays.
Thus, procedure Partition correctly performs the divide step.
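The loop invariant can also be checked mechanically while the procedure runs. The sketch below is illustrative only: CheckedPartition and checkInvariant are hypothetical names, indices are 0-based, and the check costs O(n) per iteration, so it is meant for testing rather than production use.

```java
final class CheckedPartition {
    // Partition with the pivot a[r], verifying the loop invariant before every
    // iteration and once more at termination (j == r).
    static int partition(int[] a, int p, int r) {
        int x = a[r];
        int i = p - 1;
        for (int j = p; j < r; j++) {
            checkInvariant(a, p, r, i, j, x);
            if (a[j] <= x) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        checkInvariant(a, p, r, i, r, x);
        int t = a[i + 1]; a[i + 1] = a[r]; a[r] = t;
        return i + 1;
    }

    // Throws if any of conditions 1 to 3 of the invariant fails.
    static void checkInvariant(int[] a, int p, int r, int i, int j, int x) {
        for (int k = p; k <= i; k++)
            if (a[k] > x) throw new IllegalStateException("condition 1 violated at index " + k);
        for (int k = i + 1; k < j; k++)
            if (a[k] <= x) throw new IllegalStateException("condition 2 violated at index " + k);
        if (a[r] != x) throw new IllegalStateException("condition 3 violated");
    }
}
```

If the comparison in the loop were accidentally written as a[j] < x with duplicate keys present, or the swap were omitted, the check would fail on the first iteration that breaks a condition, which makes the maintenance argument above concrete.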

Complexity of Partition
PartitionTime(n) is given by the number of iterations in the for loop.
Θ(n), where n = r - p + 1.

Quicksort Overview
To sort a[left...right]:
1. if left < right:
1.1. Partition a[left...right] such that: all a[left...p-1] are less than a[p], and all a[p+1...right] are >= a[p]
1.2. Quicksort a[left...p-1]
1.3. Quicksort a[p+1...right]
2. Terminate

Partitioning in Quicksort
A key step in the Quicksort algorithm is partitioning the array.
We choose some (any) number p in the array to use as a pivot.
We partition the array into three parts:
numbers less than p
p
numbers greater than or equal to p

Alternative Partitioning
Choose an array value (say, the first) to use as the pivot.
Starting from the left end, find the first element that is greater than or equal to the pivot.
Searching backward from the right end, find the first element that is less than the pivot.
Interchange (swap) these two elements.
Repeat, searching from where we left off, until done.
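The three-part condition in step 1.1 is easy to state as a predicate, which is handy when testing any partition routine. A minimal sketch follows; the class name PartitionCheck and the method isPartitioned are assumptions made for this example, and p here denotes the pivot's index, as in the overview.

```java
final class PartitionCheck {
    // True if every element of a[left..p-1] is less than a[p]
    // and every element of a[p+1..right] is greater than or equal to a[p].
    static boolean isPartitioned(int[] a, int left, int right, int p) {
        if (p < left || p > right) return false;
        for (int k = left; k < p; k++)
            if (a[k] >= a[p]) return false;
        for (int k = p + 1; k <= right; k++)
            if (a[k] < a[p]) return false;
        return true;
    }
}
```

For example, isPartitioned(new int[]{2, 1, 3, 5, 4}, 0, 4, 2) is true, since 2 and 1 are less than 3 while 5 and 4 are not.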

Alternative Partitioning
To partition a[left...right]:
1. Set pivot = a[left], l = left + 1, r = right;
2. while l <= r, do
2.1. while l <= right & a[l] < pivot, set l = l + 1
2.2. while r > left & a[r] >= pivot, set r = r - 1
2.3. if l < r, swap a[l] and a[r]
3. Set a[left] = a[r], a[r] = pivot
4. Terminate
(The outer test is l <= r rather than l < r so that a two-element subarray such as 1 2 is still scanned before the pivot is swapped into place.)

Example of partitioning
choose pivot:    4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
search:          4 3 6 9 2 4 3 1 2 1 8 9 3 5 6
swap:            4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
search:          4 3 3 9 2 4 3 1 2 1 8 9 6 5 6
swap:            4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
search:          4 3 3 1 2 4 3 1 2 9 8 9 6 5 6
swap:            4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
search:          4 3 3 1 2 2 3 1 4 9 8 9 6 5 6
swap with pivot: 1 3 3 1 2 2 3 4 4 9 8 9 6 5 6

Partition Implementation (Java)
    static int Partition(int[] a, int left, int right) {
        int p = a[left], l = left + 1, r = right;
        while (l <= r) {
            // skip elements that are already on the correct side
            while (l <= right && a[l] < p) l++;
            while (r > left && a[r] >= p) r--;
            if (l < r) {
                int temp = a[l]; a[l] = a[r]; a[r] = temp;
            }
        }
        // a[r] is the last element smaller than the pivot (or the pivot itself)
        a[left] = a[r]; a[r] = p;
        return r;
    }

Quicksort Implementation (Java)
    static void Quicksort(int[] array, int left, int right) {
        if (left < right) {
            int p = Partition(array, left, right);
            Quicksort(array, left, p - 1);
            Quicksort(array, p + 1, right);
        }
    }
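One way to gain confidence in a partition routine of this kind is to compare the resulting sort against a library sort on many small inputs, including two-element arrays and arrays with repeated values. The sketch below is an assumption-laden illustration, not the slides' code: the class name AlternativePartitionCheck, the method agreesWithLibrarySort, and the trial parameters are invented here, and the loop bounds l <= r and l <= right are this sketch's own choice, made so that two-element subarrays are partitioned correctly.

```java
final class AlternativePartitionCheck {
    // First-element pivot, scanning inward from both ends.
    static int partition(int[] a, int left, int right) {
        int p = a[left], l = left + 1, r = right;
        while (l <= r) {
            while (l <= right && a[l] < p) l++;   // stop at an element >= pivot
            while (r > left && a[r] >= p) r--;    // stop at an element < pivot
            if (l < r) {
                int temp = a[l]; a[l] = a[r]; a[r] = temp;
            }
        }
        a[left] = a[r]; a[r] = p;                 // place the pivot
        return r;
    }

    static void quicksort(int[] a, int left, int right) {
        if (left < right) {
            int p = partition(a, left, right);
            quicksort(a, left, p - 1);
            quicksort(a, p + 1, right);
        }
    }

    // Sorts random small arrays and compares each result with java.util.Arrays.sort.
    static boolean agreesWithLibrarySort(int trials, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        for (int t = 0; t < trials; t++) {
            int[] a = new int[rng.nextInt(12)];                       // lengths 0..11
            for (int k = 0; k < a.length; k++) a[k] = rng.nextInt(6); // small range forces duplicates
            int[] expected = a.clone();
            java.util.Arrays.sort(expected);
            quicksort(a, 0, a.length - 1);
            if (!java.util.Arrays.equals(a, expected)) return false;
        }
        return true;
    }
}
```

On the example array above, partition returns index 7 (0-based) and produces 1 3 3 1 2 2 3 4 4 9 8 9 6 5 6, the same final row as the trace.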

Analysis of quicksort: best case
[Figure: partitioning at various levels]
Suppose each partition operation divides the array almost exactly in half.
Then the depth of the recursion is log2 n, because that's how many times we can halve n.
We note that:
Each partition is linear over its subarray.
All the partitions at one level cover the array.

Best Case Analysis
We cut the array size in half each time, so the depth of the recursion is log2 n.
At each level of the recursion, all the partitions at that level do work that is linear in n.
O(log2 n) * O(n) = O(n log2 n)
Hence in the best case, quicksort has time complexity O(n log2 n).
What about the worst case?

Worst case
In the worst case, partitioning always divides the size n array into these three parts:
A length one part, containing the pivot itself
A length zero part, and
A length n-1 part, containing everything else
We don't recur on the zero-length part.
Recurring on the length n-1 part requires (in the worst case) recurring to depth n-1.
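The level-by-level argument can also be written as a pair of recurrences. This is a sketch under stated assumptions: partitioning a subarray of n elements is taken to cost cn for some constant c > 0 (a symbol introduced here), n is taken to be a power of two in the best case, and the zero-length part is taken to cost nothing.

```latex
\begin{align*}
\text{Best case:}\quad T(n) &= 2\,T(n/2) + cn \\
  &= 4\,T(n/4) + 2cn \\
  &= 2^{k}\,T(n/2^{k}) + k\,cn \\
  &= n\,T(1) + cn\log_2 n = O(n \log_2 n) \qquad (k = \log_2 n) \\[4pt]
\text{Worst case:}\quad T(n) &= T(n-1) + cn \\
  &= T(1) + c\,(2 + 3 + \dots + n) \\
  &= T(1) + c\left(\tfrac{n(n+1)}{2} - 1\right) = O(n^2)
\end{align*}
```

The best-case recurrence has the same form as the one for merge sort; the worst-case recurrence has the same form as the one for insertion sort on reverse-sorted input.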

Worst case partitioning
[Figure: worst case partitioning]

Worst case for quicksort
In the worst case, recursion may be n levels deep (for an array of size n).
But the partitioning work done at each level is still O(n).
O(n) * O(n) = O(n^2)
So the worst case for Quicksort is O(n^2).
When does this happen?
There are many arrangements that could make this happen.
Here are two common cases:
When the array is already sorted
When the array is inversely sorted (sorted in the opposite order)

Typical case for quicksort
If the array is sorted to begin with, Quicksort is terrible: O(n^2).
It is possible to construct other bad cases.
However, Quicksort is usually O(n log2 n).
The constants are so good that Quicksort is generally among the fastest comparison sorts in practice.
Much real-world sorting is done by Quicksort or one of its variants.

Picking a better pivot
Before, we picked the first element of the subarray to use as a pivot.
If the array is already sorted, this results in O(n^2) behavior.
It's no better if we pick the last element.
We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half.
Such a value is called a median: half of the values in the array are larger, half are smaller.
The easiest way to find the median is to sort the array and pick the value in the middle (!)
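The quadratic behavior on sorted input can be observed by counting comparisons. The following sketch is illustrative only: QuicksortWorstCaseDemo and its methods are hypothetical names, and it uses the last element as the pivot, which the text notes is no better than the first on already-sorted data.

```java
final class QuicksortWorstCaseDemo {
    static long comparisons;   // key comparisons made by the current run

    static int partition(int[] a, int p, int r) {
        int x = a[r];
        int i = p - 1;
        for (int j = p; j < r; j++) {
            comparisons++;                 // one comparison of a[j] against the pivot
            if (a[j] <= x) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[r]; a[r] = t;
        return i + 1;
    }

    static void quicksort(int[] a, int p, int r) {
        if (p < r) {
            int q = partition(a, p, r);
            quicksort(a, p, q - 1);
            quicksort(a, q + 1, r);
        }
    }

    // Sorts a copy of the input and returns the number of comparisons used.
    static long countComparisons(int[] input) {
        int[] a = input.clone();
        comparisons = 0;
        quicksort(a, 0, a.length - 1);
        return comparisons;
    }

    // The keys 1..n in increasing order.
    static int[] sortedInput(int n) {
        int[] a = new int[n];
        for (int k = 0; k < n; k++) a[k] = k + 1;
        return a;
    }
}
```

For a sorted array of 100 keys, countComparisons returns 4950 = 100 * 99 / 2, because every partition leaves one side empty. The recursion is also about n levels deep, so very large sorted inputs can exhaust the call stack with this naive version.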

Median of three
Obviously, it doesn't make sense to sort the array in order to find the median to use as a pivot.
Instead, compare just three elements of our (sub)array: the first, the last, and the middle.
Take the median (middle value) of these three as the pivot.
It's possible (but not easy) to construct cases which will make this technique O(n^2).

Quicksort for Small Arrays
For very small arrays (N <= 20), quicksort does not perform as well as insertion sort.
A good cutoff range is N = 10.
Switching to insertion sort for small arrays can save about 15% in the running time.

Mergesort vs Quicksort
Both run in O(n lg n) time on average; Quicksort's worst case is O(n^2), while Mergesort's is O(n lg n).
Compared with Quicksort, Mergesort performs fewer comparisons but moves more elements.
In Java, an element comparison is expensive but moving elements is cheap.
Therefore, Mergesort is used in the standard Java library for generic sorting.

Mergesort vs Quicksort
In C++, copying objects can be expensive while comparing objects is often relatively cheap.
Therefore, quicksort is the sorting routine commonly used in C++ libraries.
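Median-of-three pivot selection and the small-array cutoff combine naturally in one routine. The sketch below is one possible arrangement, not the slides' implementation: the class name HybridQuicksort, the helper names, and the choice to move the median to the last position and reuse a last-element partition are all assumptions made for this example. The cutoff of 10 is the value suggested above and should be tuned by measurement.

```java
final class HybridQuicksort {
    static final int CUTOFF = 10;   // subarrays of this size or smaller use insertion sort

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    static void quicksort(int[] a, int left, int right) {
        if (right - left + 1 <= CUTOFF) {
            insertionSort(a, left, right);
            return;
        }
        medianOfThreeToEnd(a, left, right);
        int q = partition(a, left, right);
        quicksort(a, left, q - 1);
        quicksort(a, q + 1, right);
    }

    // Orders a[left], a[mid], a[right], then moves their median to a[right].
    static void medianOfThreeToEnd(int[] a, int left, int right) {
        int mid = left + (right - left) / 2;
        if (a[mid] < a[left]) swap(a, mid, left);
        if (a[right] < a[left]) swap(a, right, left);
        if (a[right] < a[mid]) swap(a, right, mid);
        // now a[left] <= a[mid] <= a[right], so a[mid] is the median of the three
        swap(a, mid, right);
    }

    // Partition around the pivot a[right]; returns the pivot's final index.
    static int partition(int[] a, int p, int r) {
        int x = a[r];
        int i = p - 1;
        for (int j = p; j < r; j++) {
            if (a[j] <= x) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, r);
        return i + 1;
    }

    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= left && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

With this arrangement an already-sorted array no longer triggers the one-sided splits described earlier, because the middle element is chosen as the first pivot. Inputs with many equal keys can still behave badly with this partition scheme.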