Sorting

Next to storing and retrieving data, sorting is one of the most common algorithmic tasks, and there are many different ways to perform it. Whenever we perform a web search or view statistics at some website, the presented data has most likely been sorted in some way. In this lecture and in the following lectures we will examine several different ways of sorting. There are several reasons for investigating many different algorithms (as opposed to one or two, or simply the best algorithm).

There exist very simple, easily understood algorithms which, although they behave poorly on large data sets, perform well for small amounts of data, or when the range of the data is sufficiently small.

There exist sorting algorithms which have been shown to be more efficient in practice.

There are still other algorithms which work better in specific situations; for example, when the data is mostly sorted, or when unsorted data needs to be merged into a sorted list (for example, adding names to a phonebook).

O(n) Sorting Algorithms: Counting Sort and Radix Sort

Counting Sort. Counting sort is primarily used on data that is sorted by integer values which fall into a relatively small range (compared to the amount of random access memory available on a computer). Without loss of generality, we can assume the range of integer values is [0 : m], for some m ≥ 0. Now given array a[0 : n-1], the idea is to define an array of lists l[0 : m], scan a, and, for i = 0, 1, ..., n-1, store element a[i] in list l[v(a[i])], where v is the function that computes an array element's sorting value. The sorted list can then be obtained by scanning the lists of l one-by-one in increasing order, and placing the encountered objects in a final list. The two steps require Θ(n) and Θ(n + m) steps respectively, which is Θ(n) overall when m = O(n).

Counting Sort is most commonly used on an array a of integers. One reason for this is that objects of a general type are often sorted on very large integers, which makes Counting Sort infeasible. For example, if an array of Employees is sorted by social security number, then these values range over nine-digit numbers. When using an array of integers, l can be replaced by an integer array f, where f[i] represents the frequency of the elements of a that are equal to i.

Example 1. Perform Counting Sort on the elements 9, 3, 3, 10, 5, 10, 3, 4, 9, 10, 1, 3, 5, 2, 4, 9, 9.
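
To make the frequency-array variant concrete, here is a minimal C sketch (the function name counting_sort and the choice to rewrite a in place are illustrative assumptions, not part of the notes). For Example 1, one would call counting_sort(a, 17, 10).

#include <stdlib.h>

//Illustrative sketch: sort a[0:n-1] in place, assuming every element lies in [0:m].
void counting_sort(int a[], int n, int m)
{
    int *f = calloc(m + 1, sizeof(int)); //f[i] = frequency of value i in a

    for (int i = 0; i < n; i++)          //tally frequencies: Theta(n)
        f[a[i]]++;

    int k = 0;
    for (int i = 0; i <= m; i++)         //scan values in increasing order: Theta(m)
        while (f[i]-- > 0)
            a[k++] = i;

    free(f);
}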

Radix Sort. Radix Sort can be applied to an array of integers for which each integer is represented by k bits, and the time needed to access a single bit is O(1). The algorithm works in k stages. At the beginning of stage i, 1 ≤ i ≤ k, it is assumed that the integers are stored in some array a. The elements are then scanned one-by-one, with elements having i-th least significant bit equal to j (j = 0, 1) being placed in array b_j (keeping the same order as in a). The stage ends by rewriting a as the elements of b_0 followed by the elements of b_1. Assuming that k is held constant, the complexity of radix sort is Θ(kn) = Θ(n).

Example 2. Perform Radix Sort on the elements 9, 13, 10, 5, 10, 3, 4, 9, 1, 5, 2.
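
A C sketch of this two-bucket scheme, under the stated assumption that each integer fits in k bits (here the constants K and MAX_N are choices made for this sketch; K = 4 suffices for the values of Example 2):

#define K 4          //bits per integer; an assumption for this sketch
#define MAX_N 64     //bucket capacity for this sketch

//Illustrative sketch: sort a[0:n-1] using K stable rounds on successive bits.
void radix_sort(int a[], int n)
{
    int b0[MAX_N], b1[MAX_N];

    for (int bit = 0; bit < K; bit++) {
        int n0 = 0, n1 = 0;

        for (int i = 0; i < n; i++)      //scan a, placing elements by their bit-th bit
            if ((a[i] >> bit) & 1)
                b1[n1++] = a[i];
            else
                b0[n0++] = a[i];

        for (int i = 0; i < n0; i++)     //rewrite a as b0 ...
            a[i] = b0[i];
        for (int i = 0; i < n1; i++)     //... followed by b1
            a[n0 + i] = b1[i];
    }
}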

Insertion Sort. Insertion Sort represents a doubly iterative way of sorting data in an array.

Step 1: sort the first item in the array.

Step i + 1: assume the first i elements of the array are sorted. Move item i + 1 left to its appropriate location, so that the first i + 1 items are sorted.

Example 3. Use insertion sort on the array 43, 6, 72, 50, 44, 36, 21, 32, 47.

Code for Insertion Sort Applied to an Array of Integers:

//Sort in place the elements a[left:right].
void insertion_sort(int a[], int left, int right)
{
    int j, tmp;

    //Attempt to move element i to the left.
    for (int i = left + 1; i <= right; i++) {
        //Move left until finding the proper location of a[i].
        for (j = i; j > left; j--) {
            if (a[j] < a[j-1]) {
                tmp = a[j-1];
                a[j-1] = a[j];
                a[j] = tmp;
            }
            else
                break; //found the right location for a[i]
        }
    }
}

Expected Running Time of an Algorithm

When analyzing the running time of an algorithm, sometimes we are interested in its expected running time or average running time. We denote this by the function T_ave(n), and it represents the running time averaged over all possible inputs of size n. A very useful tool for analyzing an algorithm's average running time is the notion of a random variable. We will usually denote random variables by capital letters, such as X, Y, Z, I, T, etc. A random variable X has a domain of elements that X can assume at any given time. For example, if we let X denote the value showing face up after rolling a six-sided die, then the domain of X is dom(X) = {1, 2, 3, 4, 5, 6}. In this case we call X a numerical random variable because its domain consists of a set of numbers. Finally, each domain value x is assigned a probability p(x), which indicates the likelihood that X will assume the value x. Moreover, we require that

∑_{x ∈ dom(X)} p(x) = 1.

We call these probabilities the probability distribution of X.

Example 4. Let random variable X be observed as the value of the two-bit binary number whose most significant bit is obtained by tossing a coin that has a probability of 1/4 of landing heads, and whose least significant bit is obtained by tossing a fair coin. Provide the domain of X and its associated probability distribution.

An important statistic of a numerical random variable X is its average value or expectation. This is denoted by E[X], and is defined by

E[X] = ∑_{x ∈ dom(X)} x p(x).

Hence, E[X] is just the sum of the domain values of X weighted by their probabilities.

Example 5. Compute E[X] for the random variable of Example 4.

Notice that we can obtain new random variables by combining existing ones. For example, if X and Y are numerical random variables, then Z_1 = X + Y, Z_2 = X - Y, Z_3 = XY, and Z_4 = X/Y are also random variables. Moreover, for sums of random variables, there is a very convenient way of computing the expectation. The proof of the following result can be found in most probability textbooks.

Theorem 1. If X and Y are numerical random variables, then E[X + Y] = E[X] + E[Y]. In general, if X_1, X_2, ..., X_n are numerical random variables, then

E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n].

Example 6. Let X be the outcome of rolling six-sided die 1, while Y is the result of rolling six-sided die 2. Verify that E[X + Y] = E[X] + E[Y].
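
For reference, here is one possible working of Examples 4 through 6, under the natural reading that heads yields a 1 bit in Example 4.

Example 4: dom(X) = {0, 1, 2, 3}, with p(0) = (3/4)(1/2) = 3/8, p(1) = 3/8, p(2) = (1/4)(1/2) = 1/8, and p(3) = 1/8.

Example 5: E[X] = (0)(3/8) + (1)(3/8) + (2)(1/8) + (3)(1/8) = 1.

Example 6: E[X] = E[Y] = (1 + 2 + ... + 6)/6 = 7/2, and averaging X + Y over the 36 equally likely outcomes of the two dice also gives 7, so E[X + Y] = 7 = E[X] + E[Y].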

A key insight to the problem of sorting is to realize that every array of n objects (which have some ordering property) represents a permutation of the integers 1, 2, ..., n. An n-permutation is simply an ordered arrangement of the first n positive integers. In other words, if σ is an n-permutation, then we write

σ = (σ(1) σ(2) ... σ(n)),

where σ(i) and σ(j) are positive integers less than n + 1, and are distinct whenever i ≠ j.

Example 7. What is the permutation associated with the unsorted array of Example 3?

An inversion of a permutation σ is a pair (i, j) such that i < j and σ(j) < σ(i). For example, the permutation (1 4 5 3 6 2) has six inversions.

Observation. Every successful comparison of insertion sort (one that leads to a swap) reduces the number of inversions by 1. Thus, the number of steps (comparisons) needed by insertion sort is directly proportional to the number of inversions possessed by the array.

Lemma 1. The average number of inversions possessed by a random permutation is n(n-1)/4.

Proof. For each i and j, with 1 ≤ i < j ≤ n, let I_ij be an indicator random variable that equals 1 iff, when an n-permutation is constructed at random, index i occurs to the right of index j. Then E[I_ij] = pr(I_ij = 1) = 1/2. Also, the number of inversions of the random permutation equals

∑_{1 ≤ i < j ≤ n} I_ij.

Therefore, the expected number of inversions is

E[∑_{1 ≤ i < j ≤ n} I_ij] = ∑_{1 ≤ i < j ≤ n} E[I_ij] = (1/2) (n choose 2) = n(n-1)/4.

Theorem 2. The average-case and worst-case running times of insertion sort are Θ(n^2).

Proof. The worst case occurs when the array is sorted in reverse order. In this case the array has the maximum number of inversions, which is Θ(n^2). Moreover, Lemma 1 established that, even in the average case (where we assume every permutation is equally likely), the number of inversions is n(n-1)/4 = Θ(n^2).
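
For experimentation, the inversion count that drives these bounds can be computed directly with a brute-force double loop. This is a Θ(n^2) sketch (a Θ(n log n) count is also possible via a modified Mergesort):

//Illustrative sketch: count the inversions of a[0:n-1] by checking every pair.
int count_inversions(int a[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[i])   //pair (i, j) is inverted
                count++;
    return count;
}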

Introduction to Divide and Conquer Algorithms

There exist many problems that can be solved using a divide-and-conquer algorithm. A divide-and-conquer algorithm A follows these general guidelines.

Divide. Algorithm A divides the original problem into one or more subproblems of smaller size.

Conquer. Each subproblem is solved by making a recursive call to A.

Combine. Finally, A combines the subproblem solutions into a final solution for the original problem.

Some problems that can be solved using a divide-and-conquer algorithm:

Binary Search: locating an element in a sorted array (sketched below).

Quicksort and Mergesort: sorting an array.

Order Statistics: finding the k-th least or greatest element of an array.

Geometric Algorithms: finding the convex hull of a set of points; finding the two closest points in a plane.

Matrix Operations: matrix inversion, Fast Fourier Transform, matrix multiplication.

Maximum Subsequence Sum: finding the maximum sum of any subsequence in a sequence of integers.

A divide-and-conquer recurrence relation is used to define the running time T(n) of a divide-and-conquer algorithm. These relations take the general form

T(n) = aT(n/b) + f(n),

where a represents the number of subproblems, n/b gives the size of each subproblem, and f(n) gives the number of steps needed to divide the input and combine the solutions. Example 9 provides one approach for solving these recurrence relations, and Theorem 3 gives a general solution.
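
As a first illustration of the three steps, here is a C sketch of Binary Search from the list above, written recursively (the interface, with inclusive bounds lo and hi, is a choice made for this sketch):

//Divide and conquer: search sorted a[lo:hi] for x.
//Divide: compare x to the middle element.
//Conquer: recurse on the half that could contain x.
//Combine: nothing to do; return the index found (or -1).
int binary_search(int a[], int lo, int hi, int x)
{
    if (lo > hi)
        return -1;                     //empty subarray: x not present

    int mid = lo + (hi - lo) / 2;
    if (a[mid] == x)
        return mid;
    else if (x < a[mid])
        return binary_search(a, lo, mid - 1, x);
    else
        return binary_search(a, mid + 1, hi, x);
}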

Mergesort

Mergesort is an algorithm that sorts an array of elements by splitting it into two halves, a_left[] and a_right[], recursively sorting both halves (again using Mergesort), and then merging the two sorted halves into one sorted array.

Example 8. Demonstrate Mergesort using the array 5, 8, 6, 2, 7, 1, 0, 9, 3, 4, 6.
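
A C sketch of Mergesort as just described (the scratch array tmp, assumed to have the same length as a, is one common way to implement the merge; the names are choices made for this sketch):

#include <string.h>

//Merge the sorted halves a[left:mid] and a[mid+1:right] using scratch array tmp.
static void merge(int a[], int tmp[], int left, int mid, int right)
{
    int i = left, j = mid + 1, k = left;

    while (i <= mid && j <= right)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid)
        tmp[k++] = a[i++];
    while (j <= right)
        tmp[k++] = a[j++];

    memcpy(a + left, tmp + left, (right - left + 1) * sizeof(int));
}

//Recursively sort a[left:right].
void merge_sort(int a[], int tmp[], int left, int right)
{
    if (left >= right)
        return;                         //0 or 1 elements: already sorted

    int mid = left + (right - left) / 2;
    merge_sort(a, tmp, left, mid);      //sort a_left
    merge_sort(a, tmp, mid + 1, right); //sort a_right
    merge(a, tmp, left, mid, right);    //merge the two sorted halves
}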

Example 9. Provide a divide-and-conquer recurrence relation for Mergesort. Make a recursion tree for the relation and use it to analyze the growth of T(n).
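
One possible analysis, sketched: sorting n elements requires two recursive calls on n/2 elements plus Θ(n) steps to merge, giving T(n) = 2T(n/2) + Θ(n). The recursion tree has log_2 n levels; level j contains 2^j nodes, each doing Θ(n/2^j) merging work, for Θ(n) total work per level. Summing over the levels gives T(n) = Θ(n log n).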

Quicksort

Quicksort is considered in practice to be the most efficient sorting algorithm for arrays of data stored in local memory. Quicksort is a divide-and-conquer algorithm which works in the following manner. Let a[] be the array to be sorted.

1. If a[] is an array with 5 or fewer elements, then sort the array using Insertion Sort.

2. Find an array element M (called the pivot) which is a good candidate for splitting a[] into two subarrays, a_left[] and a_right[], such that x ≤ M for every x ∈ a_left[] and x ≥ M for every x ∈ a_right[].

3. Swap the elements of a[] so that the elements x ≤ M move to the left side of a[] to form a_left[], and the elements x ≥ M move to the right side of a[] to form a_right[].

4. a[] is now of the form

a_0, a_1, ..., a_{i-1}, a_i = M, a_{i+1}, a_{i+2}, ..., a_{n-1},

where a_j ≤ M for every j ≤ i-1, and a_j ≥ M for every j ≥ i.

5. Let a_left = a[0 : (i-1)] and a_right = a[(i+1) : (n-1)]. After both a_left and a_right have been sorted using Quicksort, the entire array a[] will be sorted.

Note that in the future we refer to the algorithm that partitions a[] into a_left[] and a_right[] (relative to some pivot M) as the Partition algorithm. A sketch of the full procedure appears after this description.

Finding an element to split the array. A median for an array a[] is an element M ∈ a[] which splits the array into two equal pieces, where piece 1 (respectively, piece 2) consists of elements all of which are less than or equal to (respectively, greater than or equal to) M. Although finding the median M of a[] would satisfy step 2 of Quicksort, in practice finding the median of the entire array seems too costly. On the other hand, a compromise between speed and accuracy that seems to work in practice is the median-of-three rule: take the median of the three elements a[0], a[(n-1)/2], and a[n-1].

Swapping elements of a[]. Once a pivot M has been selected, it can be swapped with the last element of the array (of course, in the median-of-three strategy, it might already be the last element). The remaining elements a[0] through a[n-2] can then be swapped using two markers, left and right, which respectively begin on the left and right sides of the array. Both markers move toward the center of the array. A marker stops when it encounters an element which should be on the other side of the array. For example, marker left will stop when it encounters an element x for which x ≥ M. When both markers have stopped, the elements at those points are swapped, unless the markers have crossed one another, in which case the process terminates.
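
Here is a C sketch that follows the five steps above, reusing the insertion_sort routine given earlier. Hiding the median-of-three pivot at position right-1 (rather than at the very end) is a common refinement assumed in this sketch, since after ordering the three sample elements, a[left] and a[right] already serve as sentinels for the two markers.

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

//Order a[left], a[center], a[right], then hide the pivot at a[right-1].
static int median3(int a[], int left, int right)
{
    int center = (left + right) / 2;
    if (a[center] < a[left])   swap(&a[center], &a[left]);
    if (a[right]  < a[left])   swap(&a[right],  &a[left]);
    if (a[right]  < a[center]) swap(&a[right],  &a[center]);
    swap(&a[center], &a[right - 1]);   //place pivot at right-1
    return a[right - 1];
}

void insertion_sort(int a[], int left, int right); //from earlier in the notes

void quicksort(int a[], int left, int right)
{
    if (right - left + 1 <= 5) {       //step 1: small array
        insertion_sort(a, left, right);
        return;
    }

    int pivot = median3(a, left, right); //step 2: choose pivot M
    int i = left, j = right - 1;         //step 3: the two markers

    for (;;) {
        while (a[++i] < pivot) ;       //left marker stops at x >= M
        while (a[--j] > pivot) ;       //right marker stops at x <= M
        if (i < j)
            swap(&a[i], &a[j]);        //markers have not crossed: swap
        else
            break;                     //markers crossed: partition done
    }
    swap(&a[i], &a[right - 1]);        //step 4: restore pivot between halves

    quicksort(a, left, i - 1);         //step 5: sort a_left
    quicksort(a, i + 1, right);        //        and a_right
}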

Example 10. Demonstrate the Quicksort algorithm using the array 5, 8, 6, 2, 7, 1, 0, 9, 3, 4, 6.

Complexity of Quicksort. The time complexity (i.e. the number of steps T(n) for an array of n comparables) of Quicksort depends on how the pivot is chosen. Later in this lecture we demonstrate how to find an exact median in O(n) steps. Using this approach, Quicksort has a worst-case complexity of O(n log n). On the other hand, if the pivot is chosen randomly, then it can be shown that Quicksort has O(n log n) average-case complexity, but O(n^2) worst-case complexity. In practice, the median-of-three approach gives empirically faster running times than both the exact and random approaches.

Example 11. Verify that, if an exact median for an array of n comparables can be found in O(n) steps, then Quicksort has a worst-case complexity of O(n log n).

Example 12. Show that the average size of a_left[] is (n-1)/2 when the input to Quicksort is n distinct elements, and the pivot M is randomly chosen from one of the elements.
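
One way to see the claim of Example 11, sketched: finding the exact median in O(n) steps and partitioning in Θ(n) steps splits the array into two halves of size at most n/2, at a total cost of O(n) per level of recursion. The worst case therefore satisfies T(n) ≤ 2T(n/2) + O(n), the same recurrence analyzed for Mergesort in Example 9, whose solution is O(n log n).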

General Solution to the Divide and Conquer Recurrence Relation

Theorem 3. Let n equal a positive power of b. Then the asymptotic growth of the general solution to the recurrence relation T(n) = aT(n/b) + f(n) is given by

T(n) = Θ(n^{log_b a}) + ∑_{j=0}^{log_b n - 1} a^j f(n/b^j).
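
As a quick sanity check, applying Theorem 3 to the Mergesort recurrence T(n) = 2T(n/2) + n (so a = b = 2 and f(n) = n) gives

T(n) = Θ(n^{log_2 2}) + ∑_{j=0}^{log_2 n - 1} 2^j (n/2^j) = Θ(n) + n log_2 n = Θ(n log n),

agreeing with the recursion-tree analysis of Example 9.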

Example 13. Use Theorem 3 to determine the running time of a divide-and-conquer algorithm that requires three recursive calls (each with input size n/2) and 4n^2 steps to divide the input and combine the three solutions into the final solution to the original problem.

Other sorting algorithms:

Shellsort: similar to Insertion Sort, but the sorting is done iteratively on subarrays of the original array; the elements of the subarrays are equidistant from one another, with that distance converging to one on the final sort.

Bubblesort: similar to Insertion Sort, but the swapping of elements begins at the end of the array, rather than the front of the array.

Exercises.

1. If Counting Sort is used to sort an array of integers that fall within the interval [-5000, 10000], then how large an auxiliary array should one use? Explain.

2. What is the running time of Insertion Sort if all elements are equal? Explain.

3. Sort 13,1,4,5,8,7,9,11,3 using Insertion Sort.

4. Sort 13,1,4,5,8,7,9,11,3 using Radix Sort.

5. Suppose we generate a three-bit binary number in the following manner. The least significant bit is generated by tossing a fair coin (e.g. heads yields a 1 bit, while tails yields a 0 bit). The middle bit is obtained by tossing a coin that has probability 0.60 of landing heads (again, heads yields a 1 bit). Finally, the leading bit is obtained by multiplying the first two bits. Let X denote the decimal value of the generated number. State the domain of X, and provide a probability distribution for the domain. Verify that your distribution adds to one.

6. Compute E[X] for the random variable X of the previous problem.

7. Let X be a random bit that has a probability of 0.5 of equaling 1, and Y a random bit that has a probability of 0.75 of equaling 1. Then let Z = XY. Finally, let W = X + Y + Z. Compute E[W] in two different ways: i) by determining the domain and probability distribution for W, and ii) by using Theorem 1.

8. Consider the linear search algorithm, which scans through an array a to determine if a contains an element x. Assuming a has size n, we say that the algorithm requires i steps if x is located at index i, i.e. a[i] = x, for i = 0, 1, ..., n-1. Furthermore, we say the algorithm requires n steps if x is not found in a. Suppose that we know the following statistics about linear search: 60% of all searches fail to locate the element x in a; moreover, when x is found in a, it is equally likely to be in any of the array locations. Letting S denote the number of steps needed for a linear search over an array of size n, use the above facts to determine i) the domain of S, ii) a probability distribution for the domain of S, and iii) E[S].

9. An n-permutation is an ordering of the numbers 0, 1, 2, ..., n-1, in which each number occurs exactly once. For example, 4, 3, 1, 2, 0 is a 5-permutation. Assume there is a function called rand_int(i,j) which, on inputs i ≤ j, returns a randomly chosen integer from the set {i, i+1, ..., j}. Now consider the following algorithm which, on input n, generates a random n-permutation within an array a[0], ..., a[n-1]. To assign a[i] it calls rand_int(0,n-1) until rand_int returns a value that was not assigned to a[0], a[1], ..., a[i-1]. The code for this algorithm is provided as follows.

int a[n];

//Initialize a[]
for(i=0; i < n; i++)
    a[i] = UNDEFINED;

for(i=0; i < n; i++)
{
    while(a[i] == UNDEFINED)
    {
        m = rand_int(0,n-1);

        //Check if m has already been used
        for(j=0; j < i; j++)
            if(a[j] == m)
                break;

        if(i == j) //m has not previously been used
            a[i] = m;
    }
}

What is the expected running time of this algorithm? Explain and show work.

10. Repeat the previous problem, but now assume that, rather than checking each of a[0], ..., a[i-1] to see if the current random value m has already been used, an array called used is provided, so that m has been used iff used[m] evaluates to true. In other words, the used array is initialized so that each of its values is set false, and then used[m] is set to true when m is first returned by rand_int(0,n-1). Re-write the code of the previous problem to adapt it to this new algorithm, and analyze its expected running time.

11. This problem provides an even better approach to generating a random permutation. The algorithm starts by assigning a[i] the value i, for i = 0, 1, ..., n-1. It then iterates n times so that, on iteration i, i = 0, 1, ..., n-1, it swaps a[i] with a[k], where k is randomly chosen from the set {i, i+1, ..., n-1}. Prove that this algorithm yields a random n-permutation written in a[]. What is its expected running time?

12. Sort 13,1,4,5,8,7,9,11,3 using Mergesort. Assume the base case is an array of size 2 or less.

13. Perform the partitioning algorithm on 13,1,4,5,8,7,9,11,3 using the median-of-three heuristic to find the pivot.

14. Why is the asymptotic running time of Mergesort independent of the initial ordering of the array elements? Is the same true for Quicksort if the selected pivot is an exact median? What if the selected pivot is obtained using the median-of-three heuristic?

15. Suppose we swap elements a[i] and a[i+k] which were originally out of order (inverted). Under what conditions will only one inversion be removed? Under what conditions will 2k-1 inversions be removed? Argue that 2k-1 is the greatest number of inversions that can ever be removed by such a swap.

16. What is the worst-case running time of Quicksort if the pivot is chosen as the first element in the array? Explain.

17. List all the inversions that occur in the array 13,1,4,5,8,7,9,11,3.

18. Use the formula

T(n) = ∑_{j=0}^{log_b n - 1} a^j f(n/b^j) + Θ(n^{log_b a})

to determine the asymptotic growth of T(n) for each of the following.

T(n) = T(n/2) + 1

T(n) = 3T(n/3) + n

Exercise Hints and Answers.

1. Since there are 15,001 possible values for an array element, an auxiliary array of size 15,001 is needed.

2. Linear, since it requires zero swaps.

3. 1,3,4,5,7,8,9,11,13

4. After Round 1, the numbers should be ordered as 00100, 01000, 01101, 00001, 00101, 00111, 01001, 01011, 00011. After Round 2: 00100, 01000, 01101, 00001, 00101, 01001, 00111, 01011, 00011. Continue with Rounds 3, 4, and 5, using the 3rd, 4th, and 5th least significant bits of the numbers.

5. Domain of X = {0, 1, 2, 7}. p(0) = (0.5)(0.4) = 0.2, p(1) = (0.5)(0.4) = 0.2, p(2) = (0.5)(0.6) = 0.3, p(7) = (0.5)(0.6) = 0.3.

6. E[X] = (0)(0.2) + (1)(0.2) + (2)(0.3) + (7)(0.3) = 2.9.

7. Domain of W = {0, 1, 2, 3}. p(0) = (0.5)(0.25) = 1/8, p(1) = (0.5)(0.25) + (0.5)(0.75) = 1/2, p(2) = 0, p(3) = (0.5)(0.75) = 3/8. E[W] = (0)(1/8) + (1)(1/2) + (2)(0) + (3)(3/8) = 13/8. Also, using the linearity of expectation, we have E[W] = E[X] + E[Y] + E[Z], where E[X] = 1/2, E[Y] = 3/4, and E[Z] = 3/8. Thus, E[W] = 1/2 + 3/4 + 3/8 = 13/8.

8. Domain of S = {0, 1, ..., n}. The probability distribution is p(i) = 0.4/n for i = 0, ..., n-1, and p(n) = 0.6. Using the definition of expectation, we have

E[S] = (0.4/n) ∑_{i=0}^{n-1} i + (n)(0.6) = (0.2)(n-1) + (0.6)n = 0.8n - 0.2.

Therefore, on average about 80% of the array needs to be scanned.

9. Let T be the running time of the algorithm. Let S_i denote the number of calls to rand_int() that are needed in order to generate the i-th number of the permutation. Then T = (1)S_1 + 2S_2 + ... + nS_n. This is true since each call to rand_int() when generating the i-th permutation number will require an average of Θ(i) steps within the inner-most loop to check if the generated number has been used. Now when generating the i-th permutation number using rand_int(), there is a probability of p_i = (i-1)/n that a number will be generated which is already in the permutation. The probability of not generating a repeat number is thus 1 - (i-1)/n = (n-i+1)/n, and the expected number of calls that will be needed to obtain a non-repeat is E[S_i] = n/(n-i+1). Thus, by linearity of expectation,

E[T] = n ∑_{i=1}^{n} i/(n-i+1).

The asymptotic growth of this sum can be obtained by evaluating n ∫_1^n x/(n-x+1) dx, which yields E[T] = Θ(n^2 log n).

10. Same analysis as the previous problem, but now we have T = S_1 + ... + S_n, since the returned value of a call to rand_int() can be checked in O(1) steps as to whether or not it is a repeat. This yields the simplified expectation

E[T] = n ∑_{i=1}^{n} 1/(n-i+1).

But the sum in this expression is the harmonic series. Therefore E[T] = Θ(n log n).

11. First notice that the only operations performed on a are swaps between two different array entries. Thus, if the array begins with 0, ..., n-1, then it will end with a permutation of 0, ..., n-1. Moreover, all values are equally likely to be placed at position i, for all i = 0, ..., n-1. The expected running time is Θ(n), since the algorithm is accomplished by performing n swaps and n calls to a random-number generator.

12. 1, 3, 4, 5, 7, 8, 9, 11, 13

13. The pivot is the median of (13, 8, 3), which equals 8. a_left = 7, 1, 4, 5, 3 and a_right = 9, 11, 13. The final pivot location is at index 5.

14. Regardless of the ordering, all the merge steps must still be performed in the Mergesort algorithm. The merging step will always take Θ(n_1 + n_2) steps, where n_1 is the length of the left sorted array, and n_2 is the length of the right sorted array. The same is true for Quicksort if an exact median is used as the pivot: the partitioning step will always require Θ(n) steps, where n is the size of the subarray that is to be partitioned. If the median-of-three heuristic is used, then it is possible to obtain unfavorable pivots that lead to an O(n^2) worst-case running time. Verify this for an array of size eight.

15. Only one inversion is removed when a[i] > a[i+1] < a[i+2] < ... < a[i+k-1] > a[i+k]. 2k-1 inversions are removed when a[i], ..., a[i+k] is in reversed order.

16. O(n^2). Consider what happens if the array is already sorted.

17. (13, 1), ..., (13, 3), (4, 3), (5, 3), ..., (11, 3), (8, 7).

18. Using the formula

T(n) = ∑_{j=0}^{log_b n - 1} a^j f(n/b^j) + Θ(n^{log_b a}):

For T(n) = T(n/2) + 1 (a = 1, b = 2, f(n) = 1):

T(n) = ∑_{j=0}^{log_2 n - 1} 1^j (1) + Θ(n^{log_2 1}) = log_2 n + Θ(1) = Θ(log n).

For T(n) = 3T(n/3) + n (a = b = 3, f(n) = n):

T(n) = ∑_{j=0}^{log_3 n - 1} 3^j (n/3^j) + Θ(n^{log_3 3}) = n log_3 n + Θ(n) = Θ(n log n).