ITP21 - Foundations of IT

Analysis of Algorithms

Analysis of algorithms

Analysis of algorithms is concerned with quantifying the efficiency of algorithms.

The analysis may consider a variety of situations: best-case performance, average-case performance, worst-case performance.

The analysis may quantify different aspects of performance:
- time complexity (e.g. number of comparisons, as in the previous lecture)
- space complexity (the amount of storage required)

The word "complexity" on its own usually refers to worst-case time complexity.
Analysis of algorithms

A measure of the efficiency of an algorithm can be used:
- to compare different algorithms. Given different algorithmic solutions to a problem, one may wish to adopt the most efficient with respect to a given kind of complexity, e.g. the most efficient in time, in space, or in a trade-off of the two.
- to evaluate how good a proposed algorithmic solution is. Where it is possible to determine the lower-bound complexity of a problem (in the best/average/worst case), e.g. "sorting cannot be done in fewer than n log n comparisons", one can tell whether the solution at hand is optimal or can be improved, and by how much.

Analysis of algorithms

What measure of efficiency?
- What we are interested in is an abstract measure capturing the efficiency of the algorithm with respect to the dimension of the problem, i.e. a function. E.g. how many operations / how much memory do I need to search for a name in a phone book with 100, 1 000, 100 000, ... entries?
- A unit of measure for time consumption / memory requirements is needed. This is straightforward for space, in terms of memory occupancy; for time, it could be in terms of elementary operations, say comparisons.
- We want the measure to abstract from details due, for instance, to technological issues, such as the use of a faster processor, and to provide a general, simple description of the efficiency of the algorithm at hand.
Analysis of algorithms

We will study the order-of-magnitude growth of the efficiency function as the dimension of the problem grows. The order of magnitude classifies growth functions into different complexity classes, e.g.:
- linear growth: the complexity grows as fast as the dimension of the problem; linear search needs "about" n operations to search a list of n items.
- polynomial growth: the complexity grows as a polynomial in the dimension of the problem; "about" n^2 operations are needed to sort n items by a specific sorting algorithm.

We will also talk about asymptotic complexity, i.e. the behaviour of the complexity function as the dimension of the problem grows without bound.

Order of Magnitude - Order n

Sequential Search

If we have a list of 2 items, the worst case is that we will take 2 comparisons to find the item we are looking for: the item is not in the list, or it is the last item. Given the list [7, 8], suppose we are looking for 8:
- comparison 1: compare 8 against the 1st number in the list (7)
- comparison 2: compare 8 against the 2nd number in the list (8)
The number 8 is then found in 2 comparisons. If we have a list of 3 items, 4 items, or 5 items, our worst cases are 3, 4, and 5 comparisons respectively.
Order of Magnitude - Order n

Sequential Search (continued)

This indicates that as the length of the list grows, so does the amount of work involved in finding a value: the work grows at the same rate as the size of the list. Anything that varies as k * n + w (with k, w constants; in our case 1 and 0 respectively), where n is the size of our list, is said to be of order of magnitude n, written Θ(n) ("Theta of n"). Note: an algorithm requiring 3 comparisons for each item would also belong to the Θ(n) complexity class. Sequential search is therefore a Θ(n) algorithm (an order-n algorithm) in terms of complexity. This is called linear complexity.

Order of Magnitude - Order ln(n)

Binary Search

A slight aside: the number x of times a number n can be halved without going below 1 is called the logarithm of n to the base 2; in these notes we write ln(n) = x. If n is a power of 2, this means that 2^x = n. E.g. with n = 16, we can do 4 such divisions: 16/2 = 8, 8/2 = 4, 4/2 = 2, 2/2 = 1. That is, ln(16) = 4, i.e. 16 = 2x2x2x2, i.e. 2^4 = 16.

Suppose we are doing a binary search on n items. What is our worst-case scenario? The item is not in the list, or it is found when the list has been reduced to a single item; i.e. the worst case is the number of times a list of length n can be halved, i.e. ln(n).
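The worst-case behaviour described above can be checked with a short sketch; the function name and the comparison counter are illustrative additions, not part of the lecture:

```python
def sequential_search(items, target):
    """Scan the list left to right; return (index or None, comparisons made)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return None, comparisons  # not found: n comparisons, the worst case

# The worst case grows linearly with the list length:
for n in (2, 3, 4, 5):
    _, c = sequential_search(list(range(n)), -1)  # -1 is never in the list
    print(n, c)  # c equals n
```

As on the slide, finding 8 in [7, 8] takes exactly 2 comparisons.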
Order of Magnitude - Order ln(n)

Binary Search (continued)

So given the following list lengths, we have the following worst-case numbers of comparisons required:

n      times the list can be split
8      3     (2^3 = 8)
16     4     (2^4 = 16)
32     5     (2^5 = 32)
64     6     (2^6 = 64)
128    7     (2^7 = 128)

We see that the number of times the list can be split grows much more slowly than n. Since the number of times a list of length n can be split equals ln(n), a binary search is said to have an order of magnitude Θ(ln(n)) (also written Θ(log2(n))). This is called logarithmic complexity.

Linear vs. logarithmic complexity

[Figure: plot of f(n) against n for n up to 150, comparing the line f(n) = n with the curve f(n) = ln(n); at n = 128 the former reaches 128 while the latter reaches only 7.]
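A minimal binary search sketch that counts the halving steps (names are ours, not from the slides); searching for a value smaller than every entry forces the worst case shown in the table:

```python
def binary_search(sorted_items, target):
    """Repeatedly halve the search range; return (index or None, halving steps)."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if sorted_items[mid] == target:
            return mid, steps
        elif sorted_items[mid] < target:
            low = mid + 1   # discard the left half
        else:
            high = mid - 1  # discard the right half
    return None, steps

# Doubling n adds only about one step:
for n in (8, 16, 32, 64, 128):
    _, steps = binary_search(list(range(n)), -1)  # -1 is never in the list
    print(n, steps)  # 3, 4, 5, 6, 7 -- matching the table
```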
Big theta (Θ) notation

What does all this mean?
- Θ(n) encompasses a class of algorithms which require time/space that grows proportionally with the dimension of the problem. E.g. 1 000 000 x n and n both have linear complexity, but they can behave quite differently (e.g. in terms of memory requirements). However, as explained, we are interested here in a "general" measure of complexity, also called asymptotic complexity.
- Θ(ln(n)) encompasses algorithms that require time/space which grows much more slowly than the dimension of the problem, e.g. from 1 to 7 when the dimension goes from 2 to 128. This is an ideal condition.
- Θ(n ln(n)) is another quite popular class. It contains, for instance, several sorting algorithms. These algorithms grow a bit faster than Θ(n): e.g. when n = 128, n x ln(n) = 896.

Now we're all clued up on algorithms and complexity, let's go on to look at another algorithm, and its complexity...
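The growth figures quoted above are easy to verify with a tiny sketch (Python's `math.log2` computes what these slides write as ln):

```python
import math

# Compare the three classes at a few problem sizes.
for n in (2, 8, 128, 1024):
    print(n, math.log2(n), n * math.log2(n))

# At n = 128: ln(128) = 7, and n * ln(n) = 128 * 7 = 896, as quoted above.
```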
Insert sort

The problem: given a list of items, sort it into ascending order.
Constraint: we can swap items around within the list, but we cannot copy the list elsewhere: we must sort in place.

The algorithm (insert sort):
- Divide the list into two portions: the part that has been sorted, and the part that remains to be sorted.
- Go through the unsorted portion, inserting each item into its correct location within the sorted portion. This may require other items in the list to be shuffled around to make space for the item being inserted.

Insert sort in action

Start of algorithm. The sorted portion of the list will be marked in colour.

15 3 91 68 2 25 31 32 16 4 21 15 62

Assign 2 to N, the starting position of the unordered portion. Select the N-th list entry as the pivot. (next slide)
Insert sort in action: inner loop

Pivot: 3    List: 15 91 68 2 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes! Move this value into the gap. (next slide)

Insert sort in action: inner loop

Pivot: 3    List: 15 91 68 2 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? No. Move the pivot into the remaining gap. Add 1 to N, then select the N-th value as the pivot. (next slide)
Insert sort in action: outer loop

Pivot: 91    List: 3 15 68 2 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? No. Move the pivot into the remaining gap. Add 1 to N, then select the N-th value as the pivot. (next slide)

Insert sort in action: outer loop

Pivot: 68    List: 3 15 91 2 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes. Move this value into the gap. (next slide)
Insert sort in action: inner loop

Pivot: 68    List: 3 15 91 2 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? No. Move the pivot into the remaining gap. Add 1 to N, then select the N-th value as the pivot. (next slide)

Insert sort in action: outer loop

Pivot: 2    List: 3 15 68 91 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes. Move this value into the gap. (next slide)
Insert sort in action: inner loop

Pivot: 2    List: 3 15 68 91 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes. Move this value into the gap. (next slide)

Insert sort in action: inner loop

Pivot: 2    List: 3 15 68 91 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes. Move this value into the gap. (next slide)
Insert sort in action: inner loop

Pivot: 2    List: 3 15 68 91 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? Yes. Move this value into the gap. (next slide)

Insert sort in action: inner loop

Pivot: 2    List: 3 15 68 91 25 31 32 16 4 21 15 62

Is there a value greater than the pivot before the gap? No. Move the pivot into the remaining gap. Add 1 to N, then select the N-th value as the pivot. (next slide)
Insert sort in action: outer loop

Pivot: 25    List: 2 3 15 68 91 31 32 16 4 21 15 62

Insert sort in action: inner loop

And so on until...
Insert sort in action: final outcome

2 3 4 15 15 16 21 25 31 32 62 68 91

Insert sort

Algorithm Insert_Sort(List):
    N := 2                                  // N marks the start of the unsorted portion
    while (N <= length of List) do
        Pivot_Item := N-th item in List     // position N is now a gap!
        while (there is an item I before the gap that is > Pivot_Item) do
            Swap the item I one position ahead, into the gap
        Copy Pivot_Item back into the list, into the remaining gap
        N := N + 1
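The pseudocode above can be rendered as a short Python sketch; variable names are ours, and we use 0-based indices where the slides count positions from 1:

```python
def insert_sort(items):
    """In-place insertion sort following the pseudocode above."""
    for n in range(1, len(items)):  # index 1 = the slides' position 2
        pivot = items[n]            # lifting the pivot leaves a gap at n
        gap = n
        # Shift each greater item one place ahead, moving the gap left.
        while gap > 0 and items[gap - 1] > pivot:
            items[gap] = items[gap - 1]
            gap -= 1
        items[gap] = pivot          # drop the pivot into the remaining gap
    return items

print(insert_sort([15, 3, 91, 68, 2, 25, 31, 32, 16, 4, 21, 15, 62]))
# -> [2, 3, 4, 15, 15, 16, 21, 25, 31, 32, 62, 68, 91], the final outcome above
```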
Complexity of insert sort

Insert sort needs to do a lot of comparisons and shuffling around of list items (many items are moved several times). There are two nested cycles:
- all the items have to be considered as pivot positions;
- for each pivot position N, it may need to shift ahead all the N-1 preceding items.

Indeed, the worst case is the inversely ordered (descending) list: every pivot has to go through all of the so-far ordered portion of the list. The preceding items number 1 the first time, 2 the second time, ..., n-1 the last time, i.e. about n/2 on average. So n times n/2 operations, i.e. about n^2/2 operations. Another interesting way of computing it is the young Gauss's sum up to n:

1 + 2 + 3 + ... + (n-1) + n = ((n+1) * n)/2 = n^2/2 + n/2

Insert sort belongs to Θ(n^2), i.e. a higher complexity than any of the other algorithms seen so far.

Complexity of insert sort

Complexity Θ(n^2) means the time (number of operations) or space required grows as n^2 for a list of length n. With n = 128, about 16 384 operations are required. Θ(n^k) (with k a positive integer) is the polynomial complexity class; it encompasses algorithms with complexity Θ(n^2). Often algorithms in this class can be useful in practice, although they may rapidly become too demanding as k grows (check n^10 with n = 128). Note: there is no point in comparing insert sort with the algorithms we looked at earlier: it is solving a different problem (sorting, instead of searching).
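The worst-case count can be verified by instrumenting the sort with a shift counter (an illustrative addition, not part of the slides); on a descending list the number of shifts is exactly 1 + 2 + ... + (n-1), Gauss's sum up to n-1:

```python
def insert_sort_count(items):
    """Insertion sort, returning the number of item shifts performed."""
    shifts = 0
    for n in range(1, len(items)):
        pivot, gap = items[n], n
        while gap > 0 and items[gap - 1] > pivot:
            items[gap] = items[gap - 1]
            gap -= 1
            shifts += 1  # count each one-position shift
        items[gap] = pivot
    return shifts

# Worst case: a descending list. Shifts = n(n-1)/2, which grows like n^2/2.
for n in (10, 100, 128):
    assert insert_sort_count(list(range(n, 0, -1))) == n * (n - 1) // 2
print("matches Gauss's formula")
```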
Polynomial vs. linear complexity

[Figure: plot of f(n) against n for n up to 20, comparing the line f(n) = n with the curve f(n) = n^2, which reaches about 400.]

Note the different scale used for the two axes!

The complexity of a problem

Different algorithms for solving a problem may have different complexities. Is there a "target" complexity? Are there cases in which one cannot hope to find a better algorithm?

The complexity of a PROBLEM is defined as the best (smallest) complexity, i.e. the minimal number of operations / the minimal amount of memory, that can be proved to be necessary for solving that problem. For instance, it is possible to prove that the problem of sorting in place has a complexity of Θ(n ln(n)), i.e. the fastest sorting algorithm cannot have a complexity lower than Θ(n ln(n)). Note: the complexity of a problem can be determined even without knowing any solving algorithm.
The complexity of a problem

It follows that insert sort is not optimal and other sorting algorithms can perform better. For example, merge sort is a faster algorithm with complexity Θ(n ln(n)). https://en.wikipedia.org/wiki/merge_sort

Aside: if we were allowed to copy parts of the list, we could get a faster sorting algorithm ("Postman's sort").

The complexity of the problem of searching in a (sorted) list is Θ(ln(n)), hence binary search is optimal in time complexity.

Big O notation

The complexity of the sorting and searching problems has been determined by researchers who have demonstrated the lowest possible complexity of any algorithm that could solve each of these problems. The (lowest possible) complexity of many problems is not known; for those problems, a faster solution than the ones we know could in principle be invented. In fact, for most problems, the best we can say about their complexity is to give an upper bound. The upper bound is the complexity of the fastest known algorithm for solving the given problem. The notation used for this is O(f(n)), where f(n) is some mathematical function. If a problem is in O(f(n)), that means that the complexity of the fastest known algorithm for solving that problem is in Θ(f(n)).
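As a sketch of the Θ(n ln(n)) alternative mentioned above, here is a standard merge sort (not from the slides; note that it copies sublists, so unlike insert sort it does not sort in place):

```python
def merge_sort(items):
    """Sort by splitting the list in half, sorting each half recursively,
    and merging the two sorted halves."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    # Merge: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([15, 3, 91, 68, 2, 25, 31, 32, 16, 4, 21, 15, 62]))
# -> [2, 3, 4, 15, 15, 16, 21, 25, 31, 32, 62, 68, 91]
```

The list is halved ln(n) times and each level of merging touches all n items, which is where the n ln(n) cost comes from.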
Just to summarise what we've seen so far...

Searching:
- sequential search: complexity Θ(n)
- binary search: complexity Θ(ln(n))
Sorting:
- insert sort: complexity Θ(n^2)

Θ notation: used when we have exact knowledge of the complexity level.
Big O notation: used when we don't know the exact complexity, but only an upper bound (e.g. the fastest known algorithm for a problem).

Tractable problems: complexity low enough to be feasible (polynomial or better).
Intractable problems: complexity too high to be feasible.
P problems: tractable, polynomial complexity or better.
NP problems: intractable, no polynomial solutions found so far.