KF5008 Algorithm Efficiency; Sorting and Searching Algorithms;
Efficiency: Principles An algorithm is a step-by-step procedure for solving a stated problem. The algorithm will be performed by a processor (which may be human, mechanical, or electronic). The algorithm must be expressed in steps that the processor is capable of performing. The algorithm must eventually terminate.
Efficiency: time and space complexity Given several algorithms to solve the same problem, which algorithm is best? Given an algorithm, is it feasible to use it at all? In other words, is it efficient enough to be usable in practice? How much time does the algorithm require? How much space (memory) does the algorithm require? In general, both time and space requirements depend on the algorithm's input (typically the size of the input).
Example: efficiency [Graph: CPU time (milliseconds) against number of items to be processed, for two hypothetical sorting algorithms] Hypothetically compare two sorting algorithms: the time taken by Algorithm B grows more slowly than that taken by Algorithm A.
Efficiency: measuring time Should we measure time in seconds? + is useful in practice - depends on language, compiler, and processor. Should we count algorithm steps? + does not depend on compiler or processor - depends on granularity of steps. Should we count characteristic operations? (e.g., arithmetic ops in mathematical algorithms, comparisons in searching algorithms) + depends only on the algorithm itself + measures the algorithm's intrinsic efficiency.
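Counting characteristic operations can be sketched in code. Here is a minimal, illustrative example (the class and method names are our own, not from the lecture code): a linear search instrumented with a counter, so the cost is measured in comparisons rather than seconds.

```java
// Illustrative sketch: count characteristic operations (element comparisons)
// in a linear search, instead of measuring wall-clock time.
public class ComparisonCount {
    static int comparisons; // characteristic-operation counter

    public static int linearSearch(int[] arr, int item) {
        for (int i = 0; i < arr.length; i++) {
            comparisons++;          // count each element comparison
            if (arr[i] == item)
                return i;
        }
        return -1;                  // not found
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};
        comparisons = 0;
        linearSearch(data, 42);     // worst case: item is last
        System.out.println(comparisons); // 6 comparisons for n = 6 items
    }
}
```

The counter depends only on the algorithm and its input, not on the machine it runs on.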
Algorithm efficiency: effect of halving problem at each step [Graph: n versus log(n), for n up to 40 - log(n) grows far more slowly than n]
Effect of halving problem at each step Linear search among n items takes O(n) steps; binary search among n items takes O(log2 n) steps.
n      log2(n)
2      1
4      2
8      3
16     4
32     5
64     6
128    7
256    8
512    9
1024   10
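The table above can be reproduced by simply counting halvings. A small sketch (the class name is our own, for illustration only):

```java
// Count how many times n items can be halved before one item remains.
// For n a power of two, this is exactly log2(n) - the table above.
public class HalvingSteps {
    public static int steps(int n) {
        int count = 0;
        while (n > 1) {
            n /= 2;     // halve the problem
            count++;    // one more step
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 1024; n *= 2)
            System.out.println(n + " -> " + steps(n)); // e.g. 1024 -> 10
    }
}
```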
Searching and Sorting Why do we talk about searching and sorting together? It's because it's easier to search for something in a list that is already sorted. How would you search for a word in an unsorted dictionary? You would have to check every single word! Sorting methods: Bubble Sort Selection Sort Insertion Sort Merge Sort Others...
Bubble Sort Conceptually the simplest sorting algorithm Also the least efficient Work through the list, looking at each pair of items in turn Swap them if they are the wrong way round At the end of the first pass, the largest item will be in its correct place, so we then need to repeat the process for all but the last element of the list...and so on, until the list has been sorted.
Bubble Sort Code
public static void bubblesort(int arr[]) {
    int out, in;
    for (out = arr.length - 1; out >= 1; out--) {  // outer loop (backward)
        for (in = 0; in < out; in++) {             // inner loop (forward)
            if (arr[in] > arr[in + 1])             // out of order?
                swap(arr, in, in + 1);             // swap them
        }
    }
}

public static void swap(int[] arr, int i, int j) {
    int tmp = arr[i];
    arr[i] = arr[j];
    arr[j] = tmp;
}
Bubble Sort Results
Input:  33 76 28 30 22 65 71 30 97 87
step 1: 33 28 30 22 65 71 30 76 87 97
step 2: 28 30 22 33 65 30 71 76 87 97
step 3: 28 22 30 33 30 65 71 76 87 97
step 4: 22 28 30 30 33 65 71 76 87 97
step 5: 22 28 30 30 33 65 71 76 87 97
step 6: 22 28 30 30 33 65 71 76 87 97
step 7: 22 28 30 30 33 65 71 76 87 97
step 8: 22 28 30 30 33 65 71 76 87 97
sorted: 22 28 30 30 33 65 71 76 87 97
Bubble Sort Efficiency If there are N items in the array, then N-1 comparisons are made on the first pass, N-2 on the second pass, and so on. This gives a total of (N-1) + (N-2) + ... + 1 = N(N-1)/2, which is about N²/2 comparisons. What about swaps? If the data is random, a swap is needed for about half of the comparisons, so there will be about N²/4 swaps. So the time the algorithm takes is proportional to N². We say the algorithm is O(N²).
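The N(N-1)/2 figure can be checked empirically. This sketch (class name ours, swap inlined for self-containment) instruments the bubble sort above with a comparison counter; note the count is the same whatever the input order, since every pair position is always compared.

```java
// Count the comparisons made by bubble sort: always N(N-1)/2.
public class BubbleCount {
    public static long countComparisons(int[] arr) {
        long comparisons = 0;
        for (int out = arr.length - 1; out >= 1; out--) {
            for (int in = 0; in < out; in++) {
                comparisons++;                     // one comparison per pair
                if (arr[in] > arr[in + 1]) {       // out of order? swap them
                    int tmp = arr[in];
                    arr[in] = arr[in + 1];
                    arr[in + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 10;
        int[] arr = new int[n];
        for (int i = 0; i < n; i++)
            arr[i] = n - i;                        // reverse order (worst case)
        System.out.println(countComparisons(arr)); // 45
        System.out.println((long) n * (n - 1) / 2); // N(N-1)/2 = 45
    }
}
```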
Selection Sort The next simplest algorithm after bubble sort Work through the list, finding the smallest item Swap it with the item in the first position Next time, we can start with the second item and repeat the process on the rest of the list...and so on.
Selection Sort Code
public static void selectionsort(int arr[]) {
    int out, in, min;
    for (out = 0; out < arr.length - 1; out++) {    // outer loop
        min = out;                                  // current minimum
        for (in = out + 1; in < arr.length; in++) { // inner loop
            if (arr[in] < arr[min])                 // if smaller than current min,
                min = in;                           // we have a new min
        }                                           // end for (in)
        swap(arr, out, min);                        // see Bubble Sort code
    }                                               // end for (out)
}                                                   // end selectionsort()
Selection Sort Results
input:  68 56 47 88 50 44 68 85 74 11
step 1: 11 56 47 88 50 44 68 85 74 68
step 2: 11 44 47 88 50 56 68 85 74 68
step 3: 11 44 47 88 50 56 68 85 74 68
step 4: 11 44 47 50 88 56 68 85 74 68
step 5: 11 44 47 50 56 88 68 85 74 68
step 6: 11 44 47 50 56 68 88 85 74 68
step 7: 11 44 47 50 56 68 68 85 74 88
step 8: 11 44 47 50 56 68 68 74 85 88
step 9: 11 44 47 50 56 68 68 74 85 88
sorted: 11 44 47 50 56 68 68 74 85 88
Selection Sort Efficiency The number of comparisons is, once again, N(N-1)/2 This is O(N²), just as in bubble sort But the number of swaps required is less than the number of items in the list This is O(N) If N is large, the comparisons will dominate, so the algorithm as a whole is still O(N²) But it is faster than bubble sort...and may be much faster, if swaps take much longer than comparisons do
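The O(N) swap count is easy to confirm: the selection sort above performs exactly one swap per pass, and there are N-1 passes. A small sketch (class name ours, swap inlined):

```java
// Count the swaps made by selection sort: one per outer pass, so N - 1 total.
public class SelectionCount {
    public static long countSwaps(int[] arr) {
        long swaps = 0;
        for (int out = 0; out < arr.length - 1; out++) {
            int min = out;
            for (int in = out + 1; in < arr.length; in++)
                if (arr[in] < arr[min])
                    min = in;                  // remember new minimum
            int tmp = arr[out];                // swap minimum into place
            arr[out] = arr[min];
            arr[min] = tmp;
            swaps++;                           // exactly one swap per pass
        }
        return swaps;
    }

    public static void main(String[] args) {
        int[] arr = {68, 56, 47, 88, 50, 44, 68, 85, 74, 11};
        System.out.println(countSwaps(arr));   // 9 swaps for N = 10: O(N)
    }
}
```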
Insertion Sort The next most complex sort Imagine that the list is partially sorted; that is, the first few items in the list are sorted among themselves, though they may need to be moved in order to insert items from the unsorted part of the list Consider the first unsorted item Store it outside of the list Compare it with each of the items in the sorted part of the list, starting with the highest Move each larger item one place to the right, until the correct place for the stored item is found...and so on.
Insertion Sort Code
public static void insertionsort(int arr[]) {
    int in, out;
    for (out = 1; out < arr.length; out++) {    // out is dividing line
        int temp = arr[out];                    // remove marked item
        in = out;                               // start shifts at out
        while (in > 0 && arr[in - 1] >= temp) { // until one is smaller,
            arr[in] = arr[in - 1];              // shift item right,
            in--;                               // go left one position
        }
        arr[in] = temp;                         // insert marked item
    }                                           // end for
}                                               // end insertionsort()
Insertion Sort Results
input:  40 53 54 2 11 35 33 40 70 49
step 1: 40 53 54 2 11 35 33 40 70 49
step 2: 40 53 54 2 11 35 33 40 70 49
step 3: 2 40 53 54 11 35 33 40 70 49
step 4: 2 11 40 53 54 35 33 40 70 49
step 5: 2 11 35 40 53 54 33 40 70 49
step 6: 2 11 33 35 40 53 54 40 70 49
step 7: 2 11 33 35 40 40 53 54 70 49
step 8: 2 11 33 35 40 40 53 54 70 49
step 9: 2 11 33 35 40 40 49 53 54 70
sorted: 2 11 33 35 40 40 49 53 54 70
Insertion Sort Efficiency This algorithm copies rather than swaps. On the first pass, a maximum of 1 comparison is required; on the second, 2; and so on. This is 1 + 2 + 3 + ... + (N-1) Once again this is equal to N(N-1)/2 However, for random data the number of comparisons needed will be about half the maximum: N(N-1)/4 The number of copies is approximately equal to the number of comparisons. Again, O(N²)
Insertion Sort Efficiency Often significantly faster than Selection Sort. However, it performs particularly poorly if the list is sorted in reverse order (then it is no better than Bubble Sort).
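The best-case/worst-case gap can be demonstrated by counting copies. This sketch (class name ours) runs the insertion sort above on an already-sorted list and on a reverse-sorted list of the same size:

```java
// Count the copies (right-shifts) made by insertion sort.
// Already sorted: 0 copies. Reverse sorted: N(N-1)/2 copies - the worst case.
public class InsertionCount {
    public static long countCopies(int[] arr) {
        long copies = 0;
        for (int out = 1; out < arr.length; out++) {
            int temp = arr[out];               // remove marked item
            int in = out;
            while (in > 0 && arr[in - 1] > temp) {
                arr[in] = arr[in - 1];         // shift item right: one copy
                copies++;
                in--;
            }
            arr[in] = temp;                    // insert marked item
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 10;
        int[] sorted = new int[n], reversed = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - i;
        }
        System.out.println(countCopies(sorted));   // 0: best case
        System.out.println(countCopies(reversed)); // 45 = N(N-1)/2: worst case
    }
}
```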
Can We Do Better? Yes! Other (cleverer) algorithms Merge sort Quick sort
Merge Sort Conceptually very simple: Divide the unsorted list into two sublists of about half its size. Sort each of the two sublists. Merge the two lists together. Devised by John von Neumann, 1945. Inherently recursive (it calls itself).
Recursion, Recursive Functions Definition (joke): recursion (n.) see recursion. A method which is defined in terms of itself. The classic example is factorial: 0! = 1 If N > 0, N! = N × (N-1)!
Recursive Factorial in Java
public static long factorial(int n) {
    if (n == 0)
        return 1;
    else
        return n * factorial(n - 1);
}
Recursion Brief Guide The solution consists of: A stopping case (which doesn't involve recursion) A recursive call to a reduced version of the problem. That means it's closer to the stopping case. The stopping case must be reachable. What do you think happens if it's not? [The computer will (eventually) run out of stack space - in Java, a StackOverflowError]
Merge Sort Code (1)
// Mergesort algorithm.
// parameter a is an array of int items.
public static void mergesort(int[] a) {
    int[] tmparray = new int[a.length];
    mergesort(a, tmparray, 0, a.length - 1);
}
Merge Sort Code (2)
// Helper method that makes recursive calls.
// a is an array of int items.
// tmparray is an array in which to place the merged result.
// left is the left-most index of the subarray.
// right is the right-most index of the subarray.
private static void mergesort(int[] a, int[] tmparray, int left, int right) {
    if (left < right) {
        int center = (left + right) / 2;
        mergesort(a, tmparray, left, center);
        mergesort(a, tmparray, center + 1, right);
        merge(a, tmparray, left, center + 1, right);
    }
}
Merge Sort Code (3)
// Helper method to merge two sorted halves of a subarray.
// a is an array of int items.
// tmparray is an array in which to place the merged result.
// leftpos is the left-most index of the subarray.
// rightpos is the index of the start of the second half.
// rightend is the right-most index of the subarray.
private static void merge(int[] a, int[] tmparray, int leftpos, int rightpos, int rightend) {
    int leftend = rightpos - 1;
    int tmppos = leftpos;
    int numelements = rightend - leftpos + 1;
    // ctd...
Merge Sort Code (4)
    // Main loop
    while (leftpos <= leftend && rightpos <= rightend)
        if (a[leftpos] <= a[rightpos])
            tmparray[tmppos++] = a[leftpos++];
        else
            tmparray[tmppos++] = a[rightpos++];
    while (leftpos <= leftend)              // Copy rest of first half
        tmparray[tmppos++] = a[leftpos++];
    while (rightpos <= rightend)            // Copy rest of right half
        tmparray[tmppos++] = a[rightpos++];
    // Copy tmparray back
    for (int i = 0; i < numelements; i++, rightend--)
        a[rightend] = tmparray[rightend];
}
Merge Sort Efficiency If the time taken to sort N items is T(N), it follows from the recursive nature of the algorithm that: T(N) = 2T(N/2) + N Merge Sort is significantly more efficient than the other methods we have seen, as it is O(N log N). This grows more slowly than N². But it does have one disadvantage It takes up more memory: it needs a working copy of the array.
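The recurrence can be checked numerically. This sketch (class name ours) evaluates T(N) = 2T(N/2) + N with T(1) = 0 for powers of two, and compares it with N log2 N; for these sizes the two agree exactly, which is where the O(N log N) bound comes from.

```java
// Evaluate the merge sort recurrence T(N) = 2 T(N/2) + N, T(1) = 0,
// and compare it with N log2(N) for N a power of two.
public class MergeRecurrence {
    public static long t(long n) {
        if (n == 1)
            return 0;                // base case: one item, no merging
        return 2 * t(n / 2) + n;     // two half-size sorts plus a merge
    }

    public static void main(String[] args) {
        for (long n = 2; n <= 1024; n *= 2) {
            long log2n = Long.numberOfTrailingZeros(n); // log2(n) for powers of 2
            System.out.println(n + ": T(N) = " + t(n)
                    + ", N log2 N = " + (n * log2n));   // the two columns match
        }
    }
}
```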
N² versus N log2(N) [Graph: N² against N log2(N), for N from 5 to 20 - N² grows much faster]
N     N^2    N log2(N)
5     25     11.61
6     36     15.51
7     49     19.65
8     64     24.00
9     81     28.53
10    100    33.22
11    121    38.05
12    144    43.02
13    169    48.11
14    196    53.30
15    225    58.60
16    256    64.00
17    289    69.49
18    324    75.06
19    361    80.71
20    400    86.44
Quick Sort Devised by Tony Hoare, c. 1960. Like merge sort, it divides the list into two, but in a different way. Pick an element from the list: it is called the pivot. Reorder the list so that those less than the pivot come before it, and those greater than the pivot come after it. Sort the two sublists either side of the pivot (the pivot is already in the correct place). Depends on good selection of the pivot: ideally it should be the median value in the list.
Quick Sort Code (1)
public static void quicksort(int arr[]) {
    quicksort(arr, 0, arr.length - 1);
}
Quick Sort Code (2)
public static void quicksort(int arr[], int start, int end) {
    int i = start;  // index of left-to-right scan
    int k = end;    // index of right-to-left scan
    if (end - start >= 1) {       // check that there are at least two elements
        int pivot = arr[start];   // set the pivot as the first
                                  // element in the partition
Quick Sort Code (3)
        while (k > i) {  // the scan indices from left and right have not met
            while (arr[i] <= pivot && i <= end && k > i)   // from the left, look for the first
                i++;                                       // element greater than the pivot
            while (arr[k] > pivot && k >= start && k >= i) // from the right, look for the first
                k--;                                       // element not greater than the pivot
            if (k > i)            // if the left scan index is still smaller
                swap(arr, i, k);  // than the right index, swap the
                                  // corresponding elements
        }
Quick Sort Code (4)
        swap(arr, start, k);          // after the indices have crossed, swap
                                      // the last element in the left
                                      // partition with the pivot
        quicksort(arr, start, k - 1); // recursively quicksort the left partition
        quicksort(arr, k + 1, end);   // recursively quicksort the right partition
    } else {                          // if only one element in the partition,
        return;                       // do no sorting: it is sorted, so exit
    }
}
Comparison of sorting algorithms
Algorithm        Num comparisons         Time complexity        Space complexity
Selection sort   ~ N²/2                  O(N²)                  O(1)
Insertion sort   ~ N²/4                  O(N²)                  O(1)
Merge sort       ~ N log2 N              O(N log N)             O(N)
Quick sort       ~ N log2 N to ~ N²/2    O(N log N) to O(N²)    O(log N) to O(N)
See also http://www.sorting-algorithms.com for animations of sorting algorithms. Google 'sorting algorithm dances' for some interesting and informative YouTube clips!
Built-in Sorting Methods Java includes some sorting methods in its API: Collections.sort(List list) Arrays.sort(array) For lists and arrays of objects, these use a slightly modified version of merge sort (TimSort); they sort Comparable items. (Arrays.sort on arrays of primitives uses a dual-pivot quicksort instead.)
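A short usage sketch of both library methods:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BuiltInSortDemo {
    public static void main(String[] args) {
        int[] numbers = {33, 76, 28, 30, 22};
        Arrays.sort(numbers);                         // sorts the array in place
        System.out.println(Arrays.toString(numbers)); // [22, 28, 30, 33, 76]

        List<String> words = new ArrayList<>(List.of("pear", "apple", "orange"));
        Collections.sort(words);     // works because String implements Comparable
        System.out.println(words);   // [apple, orange, pear]
    }
}
```

In practice these library methods should almost always be preferred over hand-written sorts.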
Searching In an unsorted array, we would potentially have to check each element until we found the value we were looking for. This is clearly proportional to the size of the list: O(N) But, if the array is sorted, we can use a binary search. First, look at the middle element. If it's the one we're looking for, fine! If it's greater than the one we're looking for, repeat the process on the sublist containing the smaller elements. If it's less, repeat the process on the sublist containing the bigger elements. And so on until it's found.
Binary Search Code (1)
public static int binarysearch(int[] array, int item) {
    int first = 0;
    int last = array.length - 1;
    int middle = 0;
    boolean found = false;
    // Loop until item found or end of list.
    while (first <= last && !found) {
        // Find the middle item.
        middle = (first + last) / 2;
Binary Search Code (2)
        // Compare the middle item to the search item.
        if (array[middle] == item)
            found = true;
        else {  // repeat on the appropriate part of the list
            if (array[middle] > item)
                last = middle - 1;
            else
                first = middle + 1;
        }
    }  // end while
    if (found)
        return middle;
    else
        return -1;
}
Binary Search Efficiency As the solution space is being halved at each pass, this algorithm executes in logarithmic time: O(log N) BUT the array must be sorted first. Worth it? Remember the dictionary: we typically need to search far more often than we need to sort.
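Java's API provides this pattern directly: sort once, then search many times. A short sketch using Arrays.sort and Arrays.binarySearch (which returns a negative value when the item is absent):

```java
import java.util.Arrays;

public class BinarySearchDemo {
    public static void main(String[] args) {
        int[] data = {40, 53, 54, 2, 11, 35};
        Arrays.sort(data);  // binary search requires sorted input
        // data is now {2, 11, 35, 40, 53, 54}

        System.out.println(Arrays.binarySearch(data, 40));     // 3: index of 40
        System.out.println(Arrays.binarySearch(data, 99) < 0); // true: not found
    }
}
```

Calling Arrays.binarySearch on an unsorted array gives undefined results, which is exactly the point of this slide.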