Extended Introduction to Computer Science CS1001.py Lecture 7: Basic Algorithms: Sorting, Merge; Time Complexity and the O( ) notation Instructors: Daniel Deutch, Amir Rubinstein Teaching Assistants: Michal Kleinbort, Amir Gilad School of Computer Science Tel-Aviv University Winter Semester 2017-8 http://tau-cs1001-py.wikidot.com
Lecture 6: Highlights

Integer exponentiation:
- Naïve method vs. (fast) iterated squaring
Basic algorithms:
- Sequential vs. binary search
Lecture 7: Plan

Basic algorithms (cont.):
- Sorting using selection sort
- Merging sorted lists
Complexity of algorithms:
- The O( ) notation: a formal definition for complexity
- Worst / best case analysis
- Tractable and intractable algorithms
Binary Search - the code (reminder)

def binary_search(lst, key):
    """ iterative binary search. lst must be sorted """
    n = len(lst)
    left = 0
    right = n-1
    while left <= right:
        middle = (right + left)//2  # middle rounded down
        if key == lst[middle]:      # item found
            return middle
        elif key < lst[middle]:     # item not in top half
            right = middle-1
        else:                       # item not in bottom half
            left = middle+1
    #print(key, "not found")
    return None
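A quick usage check of the function (binary_search is repeated here so the snippet is self-contained):

```python
def binary_search(lst, key):
    """ iterative binary search. lst must be sorted """
    left, right = 0, len(lst) - 1
    while left <= right:
        middle = (right + left) // 2   # middle, rounded down
        if key == lst[middle]:         # item found
            return middle
        elif key < lst[middle]:        # item not in top half
            right = middle - 1
        else:                          # item not in bottom half
            left = middle + 1
    return None                        # key not in lst

print(binary_search([1, 3, 5, 7, 9], 7))   # → 3
print(binary_search([1, 3, 5, 7, 9], 4))   # → None
```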
Binary Search - Real Time Measurements

import time

repeat = 20  # repeat execution several times, for more significant results
for n in [10**6, 2*10**6, 4*10**6]:
    print("n=", n)
    L = [i for i in range(n)]
    key = -1  # why?
    t0 = time.clock()
    for i in range(repeat):
        res = sequential_search(L, key)
    t1 = time.clock()
    print("sequential search:", t1 - t0)
    t0 = time.clock()
    for i in range(repeat):
        res = binary_search(L, key)
    t1 = time.clock()
    print("binary search:", t1 - t0)
Binary Search - Real Time Measurements n= 1000000 sequential search: 2.9088399801573757 binary search: 0.0005532383071873426 n= 2000000 sequential search: 5.7504573815927795 binary search: 0.0005583582503900786 n= 4000000 sequential search: 11.536035866908783 binary search: 0.0005953356179659863 How would the results change if we searched an element that does exist in the list? Does it depend on where in the list the element is found? 18
Logarithmic vs. Linear Time Algorithms

[plot of log(n) and n, created using https://graphsketch.com/]

Log: input x2, time + constant
Linear: input x2, time x2 (approximately)
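The two growth rules can be checked directly with math.log2 (a small sketch, not from the slides): doubling n doubles the linear cost, but adds only a constant (exactly 1) to the logarithmic cost.

```python
import math

for n in [10**6, 2*10**6, 4*10**6]:
    # linear cost grows with n itself; logarithmic cost grows with log2(n)
    print(n, math.log2(n))
```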
Sorting
The Sorting Problem

The computational problem:
Input: a set of elements
Output: a sequence of the same elements, ordered by size

Note that a computational problem is described in abstract terms, and is merely a desired relation between legal inputs and their outputs. Technically, we will represent a sequence as a Python list.

Possible algorithms? We will see at least 3 in this course, one today. These solutions employ different strategies, which have consequences in terms of efficiency, as we will see.
Selection sort

The (abstract) algorithm:

Selection-Sort(lst of size n)
    for i = 0 to n-1:
        find the minimum of lst[i:]
        swap it with the element at index i

Implementation in code:

def selection_sort(lst):
    ''' sort lst (in-place) '''
    n = len(lst)
    for i in range(n):
        m_index = i  # index of minimum
        for j in range(i+1, n):
            if lst[m_index] > lst[j]:
                m_index = j
        swap(lst, i, m_index)
    return None  # no need to return lst??

def swap(lst, i, j):
    tmp = lst[i]
    lst[i] = lst[j]
    lst[j] = tmp
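A quick sanity check of selection_sort on a shuffled list (the functions are repeated so the snippet is self-contained):

```python
import random

def swap(lst, i, j):
    lst[i], lst[j] = lst[j], lst[i]

def selection_sort(lst):
    ''' sort lst (in-place) '''
    n = len(lst)
    for i in range(n):
        m_index = i  # index of current minimum
        for j in range(i + 1, n):
            if lst[m_index] > lst[j]:
                m_index = j
        swap(lst, i, m_index)

lst = list(range(20))
random.shuffle(lst)
selection_sort(lst)
print(lst == sorted(lst))  # → True
```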
Selection Sort - Efficiency

We will analyze in class the total number of iterations as a function of the list size, n. Then we will measure actual running time and see if it fits the analysis.

def selection_sort(lst):
    ''' sort lst (in-place) '''
    n = len(lst)
    for i in range(n):
        m_index = i  # index of minimum
        for j in range(i+1, n):
            if lst[m_index] > lst[j]:
                m_index = j
        swap(lst, i, m_index)
    return None  # no need to return lst??
Selection Sort - Analysis

As a measure for efficiency, we will look at the number of iterations. An underlying assumption: the time needed for all the (basic) operations in each iteration is bounded by some constant. Therefore each iteration takes a constant amount of time.

What is considered a basic operation? This is context dependent (discussion in class). Note that this assumption does not always hold (recall integer exponentiation as an example).

So, how many iterations are needed, as a function of the input size? Input size in this case is the list's length, denoted n. Does the result depend on the content of the list, or on its length only? Answers: in class and on board.
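To make the analysis concrete, here is a counting sketch (not from the lecture): instrumenting the inner loop shows that for a list of length n it runs (n-1) + (n-2) + ... + 1 + 0 = n(n-1)/2 times, regardless of the list's content.

```python
def selection_sort_count(lst):
    ''' selection sort that also counts inner-loop iterations '''
    n = len(lst)
    count = 0
    for i in range(n):
        m_index = i
        for j in range(i + 1, n):
            count += 1           # one comparison per inner-loop iteration
            if lst[m_index] > lst[j]:
                m_index = j
        lst[i], lst[m_index] = lst[m_index], lst[i]
    return count

for n in [10, 100, 1000]:
    cnt = selection_sort_count(list(range(n, 0, -1)))  # reversed list
    print(n, cnt, n * (n - 1) // 2)  # the count matches n(n-1)/2 exactly
```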
Selection Sort (another version)

Another, more Pythonic implementation:

def selection_sort2(lst):
    n = len(lst)
    for i in range(n):
        m = min(lst[i:n])
        m_index = lst.index(m, i)  # index of the minimum; search from i, so equal elements in the sorted prefix are not picked up
        lst[i], lst[m_index] = lst[m_index], lst[i]
    return None

Is this version expected to be more efficient?
Selection Sort - Actual Running Time

import time
import random

for n in [1000, 2000, 4000]:
    lst = list(range(n))  # [0,1,2,...,n-1]
    random.shuffle(lst)   # shuffle into random order
    t0 = time.clock()     # stopper go!
    selection_sort(lst)
    t1 = time.clock()     # stopper end
    print("n=", n, t1-t0)

Output:
n= 1000 0.06043896640494811
n= 2000 0.27381915858021066
n= 4000 1.0055912134084082

How does running time seem to change with input size?
Selection Sort - Efficiency

[plot comparing n^2 and n·log(n)]

Quadratic: input x2, time x2^2 = x4 (approximately)
Merge
The computational problem: Merge Input: two sorted sequences of elements Output: one sorted sequence containing all elements in both sequences Possible algorithms? 20
Merge - possible algorithms

1) Simply concatenate both lists and sort them all:

def merge_by_sort(A, B):
    """ merging two sorted lists """
    C = A + B
    selection_sort(C)
    return C

However, this solution does not take advantage of the input lists being sorted already.

2) 3 running indices, for the input lists (A, B) and the output (C). At each iteration, select the minimal element from A or B and copy it to C. What happens when one of the lists is completed?
Example A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 22
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 23
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 24
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 25
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 26
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 27
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 12 28
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 12 15 29
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 12 15 16 30
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 12 15 16 20 31
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22 C 1 2 5 6 7 12 15 16 20 21 32
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22? C 1 2 5 6 7 12 15 16 20 21 22 33
A 2 5 7 12 15 20 23 25 B 1 6 16 21 22? C 1 2 5 6 7 12 15 16 20 21 22 23 34
A 2 5 7 12 15 20 23 25? B 1 6 16 21 22? C 1 2 5 6 7 12 15 16 20 21 22 23 25 35
Merge - the code (skeleton, to be completed in class)

def merge(A, B):
    ''' Merge list A of size n and list B of size m
        A and B must be sorted! '''
    n = len(A)
    m = len(B)
    C = ____
    a = 0; b = 0; c = 0
    while a < n and b < m:  # more elements in both A and B
        if A[a] < B[b]:
            C[c] ____
            a = a+1
        else:
            C[c] = B[b]
            ____
        c = c+1
    if ____:  # A was completed
        while ____:
            C[c] = B[b]
            b = b+1
            c = c+1
    else:  # B was completed
        ____
    return ____
Merge - the code

def merge(A, B):
    ''' Merge list A of size n and list B of size m
        A and B must be sorted! '''
    n = len(A)
    m = len(B)
    C = [0 for i in range(n+m)]
    a = 0; b = 0; c = 0
    while a < n and b < m:  # more elements in both A and B
        if A[a] < B[b]:
            C[c] = A[a]
            a += 1
        else:
            C[c] = B[b]
            b += 1
        c += 1
    if a == n:  # A was completed
        while b < m:
            C[c] = B[b]
            b += 1
            c += 1
    else:  # B was completed
        while a < n:
            C[c] = A[a]
            a += 1
            c += 1
    return C

Note: the two trailing loops can be replaced by a single line, since exactly one of the lists A or B is non-empty at that point:
C[c:] = A[a:] + B[b:]  # append remaining elements
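For instance, running merge on the lists from the walkthrough above. This self-contained sketch uses the one-line tail assignment mentioned on the slide instead of the two trailing while loops:

```python
def merge(A, B):
    ''' merge sorted lists A and B into a new sorted list '''
    n, m = len(A), len(B)
    C = [0] * (n + m)
    a = b = c = 0
    while a < n and b < m:          # more elements in both A and B
        if A[a] < B[b]:
            C[c] = A[a]; a += 1
        else:
            C[c] = B[b]; b += 1
        c += 1
    C[c:] = A[a:] + B[b:]           # append the leftovers of the non-exhausted list
    return C

A = [2, 5, 7, 12, 15, 20, 23, 25]
B = [1, 6, 16, 21, 22]
print(merge(A, B))  # → [1, 2, 5, 6, 7, 12, 15, 16, 20, 21, 22, 23, 25]
```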
Merge - analysis

Again, we will look at the number of iterations. So, how many iterations are needed, as a function of the input size? Denote: |A| = n, |B| = m. Does the answer depend on the content of the lists, or on their length only?

Compare to merge_by_sort we saw earlier:

def merge_by_sort(A, B):
    """ merging two sorted lists """
    C = A + B
    selection_sort(C)
    return C
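The analysis can be made concrete with a counting sketch (not the lecture's code): every loop iteration copies exactly one element into C, so merging lists of sizes n and m takes exactly n + m iterations, whatever the contents.

```python
def merge_count(A, B):
    ''' merge two sorted lists, counting loop iterations '''
    n, m = len(A), len(B)
    C = [0] * (n + m)
    a = b = c = count = 0
    while a < n and b < m:
        count += 1
        if A[a] < B[b]:
            C[c] = A[a]; a += 1
        else:
            C[c] = B[b]; b += 1
        c += 1
    while a < n:            # B was exhausted
        count += 1
        C[c] = A[a]; a += 1; c += 1
    while b < m:            # A was exhausted
        count += 1
        C[c] = B[b]; b += 1; c += 1
    return C, count

C, cnt = merge_count([2, 5, 7], [1, 6, 16, 21])
print(C, cnt)  # → [1, 2, 5, 6, 7, 16, 21] 7, i.e. n + m iterations
```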
Merge - Actual Running Times

for merge_func in [merge_by_sort, merge]:
    print(merge_func.__name__)
    for n in [1000, 2000, 4000]:
        lst1 = [random.choice(range(10000)) for i in range(n)]
        lst1 = sorted(lst1)
        lst2 = [random.choice(range(10000)) for i in range(n)]
        lst2 = sorted(lst2)
        t0 = time.clock()  # stopper go!
        merge_func(lst1, lst2)
        t1 = time.clock()  # stopper end
        print("n=", n, t1-t0)

Note: we chose n=m for simplicity.

Output:
merge_by_sort
n= 1000 0.23752907143226304
n= 2000 1.019045996069333
n= 4000 3.790595452774026
merge
n= 1000 0.0009043048767365391
n= 2000 0.0017817907024495483
n= 4000 0.0037136004715678794

Consistent with the theoretical analysis:
- merge_by_sort is quadratic
- merge is linear
Merge_by_python_sort

def merge_by_python_sort(A, B):
    return sorted(A + B)  # uses Python's built-in sort

merge_by_sort
n= 1000 0.23752907143226304
n= 2000 1.019045996069333
n= 4000 3.790595452774026
merge_by_python_sort
n= 1000 0.00017488256188968876
n= 2000 0.00042435560944209527
n= 4000 0.0007657397797764531
merge
n= 1000 0.0009043048767365391
n= 2000 0.0017817907024495483
n= 4000 0.0037136004715678794

So, Python's sorted does a fairly nice job! However, we will not get into the internals of sorted (at least not now).
Crash Intro to Complexity
Time Complexity: A Crash Intro

A problem is a general computational question:
- description of parameters (input)
- description of solution (output)

An algorithm is a step-by-step procedure, a recipe:
- can be represented e.g. as a computer program (but also in natural languages, diagrams, animations, etc.)
- an abstract notion

Efficient algorithms are usually preferred:
- fastest: time complexity
- most economical in terms of memory: space complexity

Time complexity analysis:
- measured in terms of operations, not actual timings: we want to say something about the algorithm, not about a specific machine/execution/programming-language implementation
- expressed as a function of the problem size: we will be interested in how the number of operations changes with input size
Growth Rate We will be interested in how the number of operations changes with input size. In most cases, we will not care about the exact function, but in its order, or growth rate (e.g., logarithmic, linear, quadratic, etc.) Sometimes we will only be interested/able to give an upper bound for this growth rate. We will, however, strive to make this upper bound as tight (=low) as we can. In this course, we will almost always be able to give tight upper bounds. We need some formal definition for growth rate upper bound. 43
Big O Notation

We say that a function f(n) is O(g(n)) if there is a constant c such that for large enough n, f(n) ≤ c·g(n). We denote this as f(n) = O(g(n)).

In our context, f(n) will usually denote the number of operations an algorithm performs on an input of size n (a number with n bits, a list with n elements, etc.). Sometimes f(n) will denote the number of memory cells required by the algorithm.
Big O Notation Visualized

[plot: f(n) = O(g(n)), with c·g(n) eventually upper-bounding f(n)]
Big O Notation - Examples

3n + 7 = O(n)
3n + 7 = O(n^2) *
3n + 7 ≠ O(√n)
5n·log2(n) = O(n log n) [where did the log base disappear?]
6·log2(n) = O(n) *
2·log2(n) + 12 = O(n) *
1000·n·log2(n) = O(n^2) *
3^n ≠ O(2^n)
2^(n/100) ≠ O(n^100)

* not the tightest possible bound
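A big-O claim can be sanity-checked numerically by exhibiting concrete witnesses c and n0 from the definition (a finite check, so evidence rather than proof). For example, 3n + 7 ≤ 4n for all n ≥ 7, and 3n + 7 ≤ n^2 for all n ≥ 5. A sketch:

```python
def check_big_O(f, g, c, n0, limit=10**5):
    """Check f(n) <= c*g(n) for all n in [n0, limit) — evidence, not a proof."""
    return all(f(n) <= c * g(n) for n in range(n0, limit))

print(check_big_O(lambda n: 3*n + 7, lambda n: n, c=4, n0=7))      # → True
print(check_big_O(lambda n: 3*n + 7, lambda n: n**2, c=1, n0=5))   # → True
```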
The Asymptotic Nature of Big O

Consider the two functions f(n) = 10n·log2(n) + 1 and g(n) = n^2·(2 + sin(n)/3) + 2. It is not hard to verify that f(n) = O(g(n)). Yet, for small values of n, f(n) > g(n), as can be seen in the following plot:

[plot: f(n) above g(n) for small n]
The Asymptotic Nature of Big O (cont.)

But for large enough n, indeed f(n) < g(n), as can be seen in the next plot:

[plot: g(n) above f(n) for large n]

Also, remember that for big O, f(n) may be larger than g(n), as long as there is a constant c such that f(n) ≤ c·g(n) for large enough n.
Complexity Hierarchy

Polynomial:
- constant: O(1)
- logarithmic: O(log n)
- poly-logarithmic: O(log^2 n)
- linear: O(n)
- O(n log n)
- quadratic: O(n^2)
Exponential: O(2^n), O(3^n)

Unless asked to prove formally, you can use this hierarchical ordering as fact. We'll meet this guy later in the course.
O(1) - Do you understand the meaning of this?
Worst / Best Case Complexity

In many cases, for the same size of input, the content of the input itself affects the complexity. We may separate between worst case and best case complexity:

T_worst(n) = max { time(I) : |I| = n }
T_best(n)  = min { time(I) : |I| = n }

Examples we have seen? Examples in which this is not the case?
- binary search
- selection sort

Note that this statement is complete nonsense: "The best time complexity is when n is very small".
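The distinction can be demonstrated with an instrumented version of the iterative binary search from earlier (a sketch, not the lecture's code): the best case (key found at the first middle) takes a single iteration, while the worst case (key absent) takes about log2(n) iterations.

```python
def binary_search_count(lst, key):
    ''' binary search that returns the number of loop iterations '''
    left, right, count = 0, len(lst) - 1, 0
    while left <= right:
        count += 1
        middle = (right + left) // 2
        if key == lst[middle]:
            return count
        elif key < lst[middle]:
            right = middle - 1
        else:
            left = middle + 1
    return count  # key not found

n = 2**20
lst = list(range(n))
print(binary_search_count(lst, lst[(n - 1) // 2]))  # best case: 1 iteration
print(binary_search_count(lst, -1))                 # worst case: about log2(n) iterations
```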
Average Complexity

Often the average complexity is more informative (e.g. when the worst case is rather rare). However, analyzing it is usually more complicated, and requires some knowledge of the input distribution. Assuming the distribution is uniform:

T_average(n) = ( Σ_{I : |I| = n} time(I) ) / |{ I : |I| = n }|

Examples from our course you will encounter soon:
- Quicksort runs on average in O(n log n)
- Hash table chains are of average length O(n/m)
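As a small illustration (not one of the slide's examples): the average cost of sequential search, when each key in the list is equally likely to be searched, works out to (n+1)/2 inspections, i.e. O(n), half the worst case.

```python
def sequential_search_count(lst, key):
    ''' number of elements inspected before finding key '''
    for i in range(len(lst)):
        if lst[i] == key:
            return i + 1
    return len(lst)

n = 1000
lst = list(range(n))
# uniform distribution: every key is searched with equal probability
avg = sum(sequential_search_count(lst, key) for key in lst) / n
print(avg)  # → 500.5, i.e. (n+1)/2
```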
Some Previous Results

All these results refer to worst case scenarios.

Algorithms on integers:
- Addition of two n-bit integers takes O(n) iterations
- Multiplication of two n-bit integers takes O(n^2) iterations
- Naïve integer exponentiation a^b, where b has n bits, takes O(2^n) multiplications (the time per multiplication depends on the size of the multiplied numbers)
- Iterated squaring for a^b, where b has n bits, takes O(n) multiplications (again, the time per multiplication depends on the size of the multiplied numbers)

Algorithms on lists:
- Binary search on a sorted list of length n takes O(log n) iterations
- Selection sort on a list of length n takes O(n^2) iterations
- Merging of 2 sorted lists of sizes n and m takes O(n + m) iterations

Algorithms on strings:
- Palindrome checking on a string of length n takes O(n) iterations
Input Size - Clarifications

We measure running time (or computational complexity) as a function of the input size.

For integers, input size is the number of bits in the representation of the number in the computer. We normally count the number of "simple" bit operations (such as adding or multiplying two bits).

For lists/strings/dictionaries/other collections, the input size is typically the number of elements in the collection. We normally count the number of "simple" list element operations (such as comparisons and assignments), and often ignore the size of each element.

When the number of relevant operations in each iteration is bounded by some constant, we can count iterations instead.
Tractability - Basic Distinction

How would execution time for a very fast, modern processor (10^10 ops per second, say) vary for a task with the following time complexities and input sizes n?

        n=10        n=20        n=30        n=40         n=50         n=60
n       1.0E-09 s   2.0E-09 s   3.0E-09 s   4.0E-09 s    5.0E-09 s    6.0E-09 s
n^2     1.0E-08 s   4.0E-08 s   9.0E-08 s   1.6E-07 s    2.5E-07 s    3.6E-07 s
n^3     1.0E-07 s   8.0E-07 s   2.7E-06 s   6.4E-06 s    1.3E-05 s    2.2E-05 s
n^5     1.0E-05 s   0.00032 s   0.00243 s   0.01024 s    0.03125 s    0.07776 s
2^n     1.02E-07 s  1.05E-04 s  0.107 s     1.833 min    1.303 days   3.66 years
3^n     5.9E-06 s   0.35 s      5.72 hours  38.55 years  22764 cent.  1.34E+09 cent.

Modified from Garey and Johnson's classical book.

Polynomial time = tractable. Exponential time = intractable.
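The entries in the table can be reproduced with a few lines of Python (a hypothetical helper, assuming the stated 10^10 operations per second):

```python
# A very fast machine: 10**10 basic operations per second, as in the table
OPS_PER_SEC = 10**10

def runtime_seconds(ops):
    """Seconds needed to perform the given number of operations."""
    return ops / OPS_PER_SEC

for n in [10, 20, 30, 40, 50, 60]:
    print("n =", n)
    print("  n:   ", runtime_seconds(n), "seconds")      # 1e-9 .. 6e-9 seconds
    print("  n^2: ", runtime_seconds(n**2), "seconds")   # 1e-8 .. 3.6e-7 seconds
    print("  2^n: ", runtime_seconds(2**n), "seconds")   # from ~1e-7 seconds up to years
```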
Time Complexity - What is Tractable in Practice?

A polynomial-time algorithm is good. n^100 is polynomial, hence good. An exponential-time algorithm is bad. 2^(n/100) is exponential, hence bad.

Yet for input of size n = 4000, the n^100-time algorithm takes more than 10^35 centuries on the above-mentioned machine, while the 2^(n/100) algorithm runs in just under two minutes.
Time Complexity - Advice Trust, but check! Don't just mumble "polynomial-time algorithms are good", "exponential-time algorithms are bad" because the lecturer told you so. Asymptotic run time and the O notation are important, and in most cases help clarify and simplify the analysis. But when faced with a concrete task on a specific problem size, you may be far away from "the asymptotic". In addition, constants hidden in the O notation may have unexpected impact on actual running time. 57
Time Complexity - Advice (cont.)

We will employ both asymptotic analysis and direct measurements of the actual running time. For direct measurements, we will use either the time package and the time.clock() function, or the timeit package and the timeit.timeit() function. Both have some deficiencies, yet are highly useful for our needs.
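A minimal sketch of both measurement styles. Note that time.clock() was removed in Python 3.8; time.perf_counter() plays the same role in modern Python:

```python
import timeit
import time

# timeit runs the statement many times and returns the total elapsed seconds
t = timeit.timeit("sorted(range(1000))", number=1000)
print("timeit:", t)

# equivalent manual measurement with perf_counter
t0 = time.perf_counter()
for _ in range(1000):
    sorted(range(1000))
t1 = time.perf_counter()
print("perf_counter:", t1 - t0)
```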