Sorting

In this class we will consider three sorting algorithms, that is, algorithms that take as input an array of items and then rearrange (sort) those items in increasing order within the array. This of course means that for the items stored in the array there must be some notion of order (one item is less than or equal to another). For numbers (int, short, long, float, double, etc.) this is the standard <. The algorithms we will use are perhaps the easiest to program; however, they are not the most efficient. All require work which, in the worst case, grows as the square of the size of the input array. That is, if you increase the size of the array by a factor of k, the amount of work increases by a factor of k^2. Think of what this means for some array sizes such as 10, 100, 1000, 10000, 100000, etc.

Bubble Sort

For this sorting algorithm, successive pairs of elements are compared starting at the first element and moving toward the last element. If two elements are out of order, they are swapped. Going through the array once moves the largest element to the last position, but not much can be said about the others. So we do it again, moving the second largest element to the next-to-last position. Repeating this enough times will place each element in its proper position in a sorted array. How many times is enough? If the array has length 2, you would need to do it only once. With length 3, you would have to do it twice. For length m you would need to do it m-1 times. Following is the pseudo code for doing this on an array of length m.

    Repeat m-1 times:                       //Outer loop
        For each index i from 0 to m-2 do   //Inner loop
            if A[i] > A[i+1]
                swap A[i] and A[i+1]

This can easily be implemented with two nested for statements. Once implemented you will see that with a slight variation in the code, the work can be cut roughly in half; however, the total work will still grow as the square of the size of the array.
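The claim that one pass moves the largest element to the last position can be checked with a short Java sketch (the class and method names here are ours, not part of the course code):

```java
import java.util.Arrays;

public class BubblePassDemo {
    // One bubble pass: compare successive pairs, swapping any that are
    // out of order. After the pass, the largest element is last.
    static void onePass(int[] a) {
        for (int i = 0; i <= a.length - 2; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 9, 3, 7};
        onePass(a);
        // The largest value, 9, is now last; the rest are not yet sorted.
        System.out.println(Arrays.toString(a));  // prints [1, 5, 3, 7, 9]
    }
}
```

Note that after one pass the array is not sorted; only the last position is guaranteed correct, which is why the pass must be repeated m-1 times.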
Selection Sort

This algorithm can work in one of two ways, looking for maximum (minimum) elements in the array and placing them starting at the right end (left end). Once an array element is placed in its proper position, it is not considered again. Each time we look for a maximum (minimum) element, the size of the portion of the array to be considered is one less than in the previous consideration. Note: Because we will need to swap maximum (minimum) elements, it is the index of those elements that we must find, and not the actual values of the array elements. Following is the pseudo code for doing this on an array of length m, using minimum values.

    For each index i from 0 to m-2 do
        min = index of minimum value from A[i] to A[m-1]
        swap A[min] and A[i]

Even though it may not appear so, there are two loops here: the loop over i, and a hidden loop that searches for the index of the minimum value.

Insertion Sort

Insertion sort is based on the assumption that A[0] through A[i-1] is already sorted. This being the case, we can insert A[i] into its proper location (indexed at x) by shifting elements A[x] through A[i-1] one location to the right and inserting the value of A[i] into A[x]. How do we find x? We look to the left of A[i] and find the first element that is less than or equal to A[i]. If this occurs at A[y], then x = y + 1. If we go off the array (y = -1), then the new element goes in A[0]. How do we get this started, since the assumption is that A[0] through A[i-1] is already sorted? This is an inductive method. A[0] through A[0] is certainly sorted. So, we insert A[1]. Then A[0] through A[1] is sorted, so that we can insert A[2]. Repeat this one step at a time until A[m-1] (the last element in the array) is inserted, and the entire array is sorted. Following is the pseudo code for accomplishing this.

    For each i from 1 to m-1 do
        key = A[i]                      //save the element we want to insert
        j = i-1                         //start looking to the left
        while j >= 0 and A[j] > key     //What stops this loop?
            A[j+1] = A[j]               //move A[j] one to the right
            j = j-1                     //next j
        A[j+1] = key

Unlike the previous two algorithms, the work required here depends heavily on how nearly sorted the original array is to start with. For example, if A is already sorted, this will look at each element once, and not have to shift elements to the right. The term nearly sorted is measured by the number of pairs of elements that can be found which are out of order with respect to each other.
For example, 2, 3, 4, 8, 7, 6, 12 has pairs (8,7), (8,6), and (7,6) that are out of order, and the required shifting is very small. If an array of length m is in decreasing order to start with, then the elements in every pair are out of order with each other. How many pairs are there? There are m(m-1)/2. This would be a worst-case scenario.
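The pair counts above can be checked mechanically with a short Java sketch (the counter method name is ours):

```java
public class PairCount {
    // Count pairs (i, j) with i < j and a[i] > a[j], i.e. pairs of
    // elements that are out of order with respect to each other.
    static int countOutOfOrderPairs(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) count++;
        return count;
    }

    public static void main(String[] args) {
        // The example from the text: exactly the 3 pairs (8,7), (8,6), (7,6).
        System.out.println(countOutOfOrderPairs(new int[]{2, 3, 4, 8, 7, 6, 12}));  // prints 3
        // A decreasing array of length 5: every pair is out of order,
        // giving 5*4/2 = 10 pairs, the worst case.
        System.out.println(countOutOfOrderPairs(new int[]{5, 4, 3, 2, 1}));         // prints 10
    }
}
```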
Big Oh Notation for Complexity

We have considered three algorithms for sorting an array of items. In our case we looked at integers and discussed how this might be adapted to strings using one of the String methods, compareToIgnoreCase. These sorting techniques can be done on arrays whose elements are comparable with either an operator > or a method such as compareToIgnoreCase, which returns a value that can be used to determine relative size. Following is code for the three algorithms, written as Java methods: Bubble Sort, Selection Sort, and Insertion Sort.

Bubble Sort

    public static void bubbleSort(int[] A)
    {
        int i, j;
        for(i = 1; i < A.length; i++)
            for(j = 0; j < A.length - i; j++)
                if(A[j] > A[j+1])
                    swapArrayElements(A, j, j+1);
    }

Note that this has been slightly modified, as we indicated could be done, by changing A.length-1 to A.length-i in the inner loop. This reduces the number of comparisons required in the loop by one for each iteration of the outer loop. This is okay since each time the next largest element is moved to its proper place in a sorted list, and so there is no need to compare it again to elements with higher indices. We will see that this cuts the work roughly in half. There is another change that can be made to this algorithm by observing the following. If in the processing of the inner loop no swap is done, there is really no reason to continue the outer loop further. So, we can add a boolean variable to the method and break from the outer loop if, after any completion of the inner loop, no swap has been done. The code for this is as follows:
    public static void bubbleSort(int[] A)
    {
        boolean swap = true;
        int i, j;
        for(i = 1; i < A.length; i++)
        {
            swap = false;                        //no swaps have been done
            for(j = 0; j < A.length - i; j++)
                if(A[j] > A[j+1])
                {
                    swapArrayElements(A, j, j+1);
                    swap = true;                 //must do outer loop again
                }
            if(!swap)
                break;                           //break from outer loop
        }
    }

What would this do if the array is already sorted when the process began? What if it is in reverse order to begin with? At this point we shall consider the cost of running Bubble Sort on an array of size n by counting the number of element comparisons that are done. Let's consider the version that does not check for swaps. Note that the outer loop iterates n-1 times. The first time, the element comparison is done n-1 times. During the second iteration of the outer loop, the element comparison is done n-2 times. Each time the outer loop iterates, the number of element comparisons in the inner loop goes down by 1. So, the total number of element comparisons is 1 + 2 + 3 + ... + (n-1). It is well known that 1 + 2 + ... + k = k(k+1)/2, a fact that can be shown using mathematical induction. Thus we see that the total number of element comparisons is n(n-1)/2 = n^2/2 - n/2. As n gets large, which of these two terms dominates the growth? If we divide this by n^2, we obtain the expression 1/2 - 1/(2n). As n gets larger and larger, this number gets closer to 1/2. So the total number of element comparisons can be bounded above by k*n^2, where k is some constant. In this case, since we are subtracting n/2, we could actually use 1/2 for k. Review the analysis above and then consider what happens if we use the version of Bubble Sort that takes into account whether or not one or more swaps occurred during one complete execution of the inner loop. How does the analysis change? If the array is sorted to start with, then the inner loop will execute only once, resulting in n-1 element comparisons.
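The helper swapArrayElements is used above but not shown in these notes; a minimal sketch of what it presumably does (the class name here is ours):

```java
import java.util.Arrays;

public class SortHelpers {
    // Swap the elements of A at indices i and j.
    public static void swapArrayElements(int[] A, int i, int j) {
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    public static void main(String[] args) {
        int[] A = {10, 20, 30};
        swapArrayElements(A, 0, 2);
        System.out.println(Arrays.toString(A));  // prints [30, 20, 10]
    }
}
```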
If the array is in reverse order to start with, then the outer loop will run n-1 times and for the corresponding runs of the inner loop, there will be n-1, n-2, ..., 1 element comparisons, leading to the same analysis above for the case when swaps aren't considered.
Selection Sort

The code for Selection Sort is:

    //This method should use selection sort to sort the
    //array A.
    public static void selectionSort(int[] A)
    {
        int i;
        int indexOfMin = 0;               //set minimum to first element
        for(i = 0; i < A.length - 1; i++)
        {
            indexOfMin = minIndex(A, i, A.length - 1);
            swapArrayElements(A, i, indexOfMin);
        }
    }

And the code for minIndex is:

    public static int minIndex(int[] A, int f, int l)
    {
        int i;
        int min = f;                      //assume A[f] is minimum
        for(i = f + 1; i <= l; i++)
            if(A[i] < A[min])
                min = i;
        return min;
    }

Again, this involves an outer loop, which runs for n-1 iterations. Within this loop is a call to a method that finds the index of the minimum value over a specific portion of the array. In particular, if i is the current index of the outer loop, then the call to minIndex searches the array from index i to A.length-1, which results in n-1-i element comparisons. Again, there are n-1, n-2, n-3, ..., 1 comparisons each time respectively, and the analysis results in n(n-1)/2 comparisons.

Insertion Sort

For the last method, the code for Insertion Sort is:
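The count n(n-1)/2 can be checked empirically by instrumenting the comparison in minIndex; a sketch with an added counter (the comparisons field and class name are ours):

```java
public class SelectionCount {
    static int comparisons = 0;

    // Same as minIndex in the notes, but counting each element comparison.
    static int minIndex(int[] A, int f, int l) {
        int min = f;
        for (int i = f + 1; i <= l; i++) {
            comparisons++;                   // one A[i] < A[min] comparison
            if (A[i] < A[min]) min = i;
        }
        return min;
    }

    static void selectionSort(int[] A) {
        for (int i = 0; i < A.length - 1; i++) {
            int indexOfMin = minIndex(A, i, A.length - 1);
            int temp = A[i];                 // inline swap of A[i] and A[indexOfMin]
            A[i] = A[indexOfMin];
            A[indexOfMin] = temp;
        }
    }

    public static void main(String[] args) {
        int[] A = {5, 3, 8, 1, 9, 2, 7, 4};  // n = 8
        selectionSort(A);
        // Selection sort always makes n(n-1)/2 comparisons: 8*7/2 = 28.
        System.out.println(comparisons);     // prints 28
    }
}
```

Note that the count is 28 regardless of the initial order of the array, which is why Selection Sort has no "best case" in terms of comparisons.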
    public static void insertionSort(int[] A)
    {
        int key;
        int i, j;
        for(i = 1; i < A.length; i++)
        {
            key = A[i];
            j = i - 1;
            while(j >= 0 && A[j] > key)
            {
                A[j+1] = A[j];
                j--;
            }
            A[j+1] = key;
        }
    }

The number of element comparisons done in this method is heavily dependent on how nearly sorted the array is to start with. There are two loops, the outer of which runs n-1 times, where n is the length of the array. Within that loop are three assignment instructions and one while loop. If the array is sorted to start with, the while loop, which is pre-test, will make at most one element comparison, A[j] > key, and terminate. Note that if j < 0, it will also terminate and not make an element comparison. So, for each iteration of the outer loop there is no looping that takes place inside, and we end with only n-1 element comparisons total, one for each iteration of the outer loop. If the array is in reverse order to begin with, then for each iteration of the outer loop, the inner loop will necessarily run until j < 0, shifting elements to the right each time. If the index of the outer loop is i, then there will be i element comparisons, resulting in a total of 1 + 2 + 3 + ... + (n-1) element comparisons and shifts, which gives the same result as Bubble Sort (without the swap variable) and Selection Sort. For this sorting routine we can be more specific regarding the amount of work. The above analysis showed the best case (sorted to start with) and the worst case (reverse order to start with). We can say more. In an array A, let's call the pair (A[i], A[j]), with i < j, a transposition if A[i] > A[j], that is, if the two elements are out of order with each other. The amount of work required by Insertion Sort to shift elements is proportional to the number of transpositions in the array. For example, if the array is sorted already, then there are no transpositions; however, we must go through it once to establish this. So the work is 0 + (n-1) = n-1. If the array is in reverse order to begin with, then every pair of elements is a transposition.
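The best- and worst-case comparison counts can be checked by instrumenting the A[j] > key test; a sketch with an added counter (the class name, counter field, and compare helper are ours):

```java
public class InsertionCount {
    static int comparisons = 0;

    // Wraps the element comparison A[j] > key so each one is counted.
    static boolean greater(int x, int key) {
        comparisons++;
        return x > key;
    }

    static void insertionSort(int[] A) {
        for (int i = 1; i < A.length; i++) {
            int key = A[i];
            int j = i - 1;
            while (j >= 0 && greater(A[j], key)) {
                A[j + 1] = A[j];     // shift A[j] one to the right
                j--;
            }
            A[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        comparisons = 0;
        insertionSort(new int[]{1, 2, 3, 4, 5, 6});  // already sorted, n = 6
        System.out.println(comparisons);             // prints 5, i.e. n-1

        comparisons = 0;
        insertionSort(new int[]{6, 5, 4, 3, 2, 1});  // reverse order, n = 6
        System.out.println(comparisons);             // prints 15, i.e. n(n-1)/2
    }
}
```

The sorted input makes exactly one (failed) comparison per outer iteration; the reversed input makes i comparisons on iteration i, and the final exit happens on j < 0 with no element comparison at all.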
The number of pairs in an array of length n is C(n,2) = n(n-1)/2. The symbol C(n,2) (the notation (n,2) is also used) stands for the number of ways 2 things can be chosen from n things.
Big Oh Notation

To generalize this a bit, suppose you have a polynomial in the variable n, say p(n) = a_k*n^k + a_(k-1)*n^(k-1) + ... + a_1*n + a_0, with a_k > 0. If you look at p(n)/n^k, then as n increases, the ratio approaches a_k, the coefficient of the highest order term. How big does n need to be so that p(n)/n^k is close to a_k? It depends on the sizes of the other coefficients, but remember they are all divided by a positive power of n, so those terms will eventually get close to zero. (We aren't defining here what we mean by close to. That comes up in studying limits.) In this case we say that p(n) = O(n^k) (read p(n) is Big Oh of n^k).

Definition. Let f and g be two functions defined on the non-negative integers. Then f(n) = O(g(n)) if there is a constant c > 0 and a positive integer N such that if n > N, then 0 <= f(n) <= c*g(n).

Since we stipulate that n > N in order for this inequality to hold, we theoretically are implying that what happens with some fixed finite number of beginning values doesn't matter; however, from a computing point of view, they could matter, if the total amount of work for the first finitely many is significantly large. Now we will relate this to our sorting routines and the analysis described above.

Bubble Sort (without the swap variable): n(n-1)/2 comparisons, or O(n^2).
Bubble Sort (with the swap variable): Best case is O(n) and worst case is O(n^2).
Selection Sort: Always O(n^2).
Insertion Sort: Best case is O(n) and worst case is O(n^2).
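The definition can be illustrated numerically with f(n) = n(n-1)/2 (the bubble sort comparison count) and g(n) = n^2: the constant c = 1/2 works for every n >= 1, since n(n-1)/2 <= n^2/2. A small sketch that checks the inequality over a range (the class and method names are ours):

```java
public class BigOhDemo {
    // Check the Big Oh definition numerically for f(n) = n(n-1)/2
    // and g(n) = n^2 with constant c = 0.5: 0 <= f(n) <= c*g(n).
    static boolean boundHolds(long maxN) {
        for (long n = 1; n <= maxN; n++) {
            long f = n * (n - 1) / 2;      // element comparisons
            double bound = 0.5 * n * n;    // c * g(n) with c = 1/2
            if (f < 0 || f > bound) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundHolds(1000));  // prints true
    }
}
```

Checking a finite range of n is, of course, only an illustration; the definition itself is established algebraically, since n(n-1)/2 = n^2/2 - n/2 <= n^2/2 for all n >= 0.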