Searching Algorithms/Time Analysis CSE21 Fall 2017, Day 8 Oct 16, 2017 https://sites.google.com/a/eng.ucsd.edu/cse21-fall-2017-miles-jones/
(MinSort) loop invariant induction Loop invariant: After the t th iteration of the loop, the first t elements are: 1. The smallest t elements of the original list. 2. In sorted order. Inductive Hypothesis: Suppose that for some t 0, the loop invariant is true after t iterations. Show that it is true after t + 1 iterations:
(MinSort) loop invariant induction Loop invariant: After the t th iteration of the loop, the first t elements are: 1. The smallest t elements of the original list. 2. In sorted order. Inductive Hypothesis: Suppose that for some t 0, the loop invariant is true after t iterations. Show that it is true after t + 1 iterations: During the (t + 1)st iteration, the algorithm sets a m to be the minimum value of a t+1,, a n. Swap a m and a t+1. So, after the (t + 1)st iteration, a t+1 is the minimum value of a t+1,, a n.
(MinSort) loop invariant induction Show that it is true after t + 1 iterations: During the (t + 1)st iteration, the algorithm sets a m to be the minimum value of a t+1,, a n. Swap a m and a t+1.so, after the (t + 1)st iteration, a t+1 is the minimum value of a t+1,, a n. By the inductive hypothesis, after t iterations, a 1,, a t are: 1. The smallest t elements of the original list. 2. In sorted order. So, we have that a 1,, a t are all less than a t+1 and a t+1 is less than all of a t+1,, a n : 1. a 1,, a t+1 are the smallest t + 1 elements of the original list. Since a 1,, a t are in sorted order and a 1,, a t are all less than a t+1 : 2. a 1,, a t+1 are in sorted order. Therefore: for any 0 t n, the loop invariant is true after t iterations.
(MinSort) loop invariant conclusion Therefore: for any 0 t n, the loop invariant is true after t iterations. In particular: the loop invariant is true after n iterations and so after the algorithm terminates, the first n elements are: 1. The smallest n elements 2. In sorted order. This is exactly the algorithm specification!!!!
2 Proving Loop Invariants Induction variable (k): the number of times through the loop. Base case: Need to show the statement holds for k=0, before the loop Inductive step: Let k be a positive integer. Need help? See website for extra practice problems on Induction hypothesis: Suppose the statement invariants and holds induction. after k-1 times through the loop. Need to show that the statement holds after k times through the loop.
Why sort? A TA facing a stack of exams needs to input all 400 scores into a spreadsheet where the students are listed in alphabetical order. OR You want to find all the duplicate values in a long list.
Sorting helps with searching Two searching algorithms: One that works for any data, sorted or not One that is much faster, but relies on the data being sorted
The search problem Given an array A[1], A[2], A[3],..., A[n] and a target value x, find an index j where x = A[j] or determine that no such index exists because x is not in the array; Return j=0 if no index exists
Example Suppose our array is A[1]=2, A[2]=5, A[3]=0, A[4]=1. If x = 1, what j would the search problem find? A) j=2 B) j=4 C) j=0 D) no such j exists
Searching an unsorted array If you needed to find a DVD on this shelf, where the DVDs are in no particular order, how would you proceed?
Linear Search Search through the array one index at a time, asking at each position. ''Is this my target value?'' If you answer yes, return the position where you found it.
Searching: Specification WHAT Rosen page 194 Given a list a 1, a 2,..., a n and a target value x find an index j where x=a j or determine that there is no such index. Values can be any type. For simplicity, use integers.
HOW reverse engineering Pseudocode: procedure mystery (x: integer, a 1, a 2,..., a n : distinct integers ) i := 1 while (i <= n and x a i ) i := i+1 if i <=n then location := i else location := 0 return location { location is the subscript of the term that equals x, or is 0 if x is not found } In groups, describe in English (high-level), what the algorithm is doing.
Correctness of Linear Search: WHY What's the loop invariant? Try to write it down yourself, first.
Proof of correctness Why is this algorithm correct? Easy case: if the algorithm returns some i in the range 1 through n, it must be because x= A[i] Remaining case:?
Invariant for main loop Invariant: If after the loop, i=k, then????
Invariant for main loop Invariant: If after the loop, i=k, then x is not among A[1],,...A[k-1] Base case: k=1; x is not among the empty set Induction step. Assume i=k+1. Then in the previous iteration, i was k. Thus, x is not among the first k-1 elements of the array. In the current iteration, we compared x to A[k]. Since we incremented i, x was also not A[k].
If Linear Search returns 0 Linear search only returns 0 if i = n+1 Then the invariant says x is not in the first n positions of the array But this means x is not in the array, so 0 is the correct answer
How fast is linear search? Suppose you have an unsorted array and you are searching for a target value x that you know is in the array somewhere. What position for x will cause linear search to take the longest amount of time? A) x is the first element of the array B) x is the last element of the array C) x is somewhere in the middle of the array
How fast is linear search? Suppose you have an unsorted array and you are searching for a target value x that you know is in the array somewhere. What position for x will cause linear search to take the longest amount of time? A) x is the first element of the array B) x is the last element of the array C) x is somewhere in the middle of the array
How fast is linear search? Say our array A has n elements and we are searching for x. The time it takes to find x (or determine it is not present) depends on the number of probes, that is the number of entries we have to retrieve and compare to x.
How many probes? Worst-case scenario: Best-case scenario: On average:
How many probes? Worst-case scenario: The target is the last element in the array or not present at all. n probes Best-case scenario: The target is the first element in the array. 1 probe On average: We'd expect to have to search about half of the array until we find the target. about n/2 probes
Searching a sorted array How would you search through a pile of alphabetized papers to find the one with your name on it?
Using order in search Suppose our list is sorted in increasing order. If we probe the list at position m, what do we learn in each of these three cases? x = a m x < a m A.if x is in the list, it occurs BEFORE position m B.if x is in the list, it occurs AFTER position m C. x occurs at position m D. x is not in the list x > a m
How does having a sorted array help us? Suppose we probe A at position i. What do we learn? If x=a[m], we are done. If x<a[m], then if x is in the array, it is in position p for p<m. m-1 positions remain to be checked If x>a[m], then if x is in the array, it is in position p for p>m. n-m positions remain to be checked
How do we choose where to probe? If our array has 100 elements and we probe at position 5, it is most likely that the target is in the second half, so we'll have to search 95 entries in the worst case. How do we make the worst case as favorable as possible?
How do we choose where to probe? If our array has 100 elements and we probe at position 5, it is most likely that the target is in the second half, so we'll have to search 95 entries in the worst case. How do we make the worst case as favorable as possible? Probe in the middle. If we are searching the subarray A[i],, A[j], probe at index i + j 2
Binary Search: The Approach Probe in the middle of the array. Based on what you find there, determine which half to search in next. Continue until the target is found or you can be sure the target is not in the array. (When can you be sure the target is not in the array?) Return the location of the target, or 0 if it's not in the array
Binary Search: HOW Starting at the middle of the list and based on what you find, determine which half to search next. Continue until target is found or we are sure that it isn t there. procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) { location is the subscript of the term that equals x, or is 0 if x is not found }
Binary Search: HOW Starting at the middle of the list and based on what you find, determine which half to search next. Continue until target is found or we are sure that it isn t there. procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) i := 1 j := n Location:=0 while i<j A. i<n B. j<n C. i<j D. j<i E.None of the above m := floor( (i+j)/2 ) if x = a m then location:= m if x > a m then i:=m+1 if x < a m then j:=m-1 return location { location is the subscript of the term that equals x, or is 0 if x is not found }
Binary Search: WHY procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) i := 1 j := n Location:=0 while i<j m := floor( (i+j)/2 ) if x = a m then location:= m if x > a m then i:=m+1 if x < a m then j:=m-1 return location
Binary Search is Telling the Truth? (correctness) We must show two things: 1) If binary search returns some nonzero position p, then a p = x. 2) If binary search returns 0, then there is no position p for which a p = x.
1) If it returns nonzero p, then a p = x. procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) i := 1 j := n Location:=0 while i<j m := floor( (i+j)/2 ) if x = a m then location:= m if x > a m then i:=m+1 if x < a m then j:=m-1 return location
What is the contrapositive of (2)? 2) If binary search returns 0, then there is no position p for which a p =x.
What is the contrapositive? 2) If binary search returns 0, then there is no position p for which a p =x. Contrapositive: If there is a position p for which a p =x, then binary search doesn't return 0.
Loop Invariant Want to show: If there is a position p for which a p =x, then binary search doesn't return 0. Loop invariant: Suppose there is a position p for which a p =x, then: i p j
Invariant: procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) i := 1 j := n Location:=0 while i<j m := floor( (i+j)/2 ) if x = a m then location:= m if x > a m then i:=m+1 if x < a m then j:=m-1 return location
Invariant: Proof
Binary Search Terminates With each probe, you reduce the size of the subarray you are searching to at most half of its previous size. Number of probes Max size of subarray 1 n/2 2 n/4 3 n/8 4 n/16......
Binary Search Terminates With each probe, you reduce the size of the subarray you are searching to at most half of its previous size. Number of probes Max size of subarray 1 n/2 2 n/4 3 n/8 4 n/16...... w
Binary Search Terminates How many probes w do we need to make sure our search has terminated? That is, what value of w is sure to make the subarray size less than 1? n < 1 2 w n < 2 w log n < w The smallest integer w that is sure to be greater than log(n) is w = log n + 1
Comparison with Linear Search Therefore, we have shown that it takes at most log n + 1 probes for binary search to terminate. This is the worst case. By comparison, linear search takes as many as n probes.even in the average case, linear search takes about n/2 probes.
The cost of binary search While binary search is faster than linear search, it depends upon your array being sorted. What if you have an unsorted array you need to search? Should you sort it first so you can use the better search?
If you need to search the same array many times, it becomes more worthwhile to invest the initial time in sorting your array. The cost of binary search There's a tradeoff between the cost of sorting an array and the benefit of being able to search faster.
General questions to ask about algorithms 1) What problem are we solving? SPECIFICATION 2) How do we solve the problem? ALGORITHM DESCRIPTION 3) Why do these steps solve the problem? CORRECTNESS 4) When do we get an answer? RUNNING TIME PERFORMANCE
Counting comparisons: WHEN Measure Time Number of operations Comparisons of list elements! For selection sort (MinSort), how many times do we have to compare the values of some pair of list elements?
Selection Sort (MinSort) Pseudocode Rosen page 203, exercises 41-42 procedure selection sort(a 1, a 2,..., a n : real numbers with n >=2 ) for i := 1 to n-1 m := i for j:= i+1 to n if ( a j < a m ) then m := j interchange a i and a m { a 1,..., a n is in increasing order} How many times do we compare pairs of elements? A. n B. log n C. n! n 2 D. E. n 2
Selection Sort (MinSort) Pseudocode Rosen page 203, exercises 41-42 procedure selection sort(a 1, a 2,..., a n : real numbers with n >=2 ) for i := 1 to n-1 m := i for j:= i+1 to n if ( a j < a m ) then m := j interchange a i and a m { a 1,..., a n is in increasing order} In the body of the outer loop, when i=1, how many times do we compare pairs of elements? A. 1 B. n C. n-1 D. n+1 E. None of the above.
Selection Sort (MinSort) Pseudocode Rosen page 203, exercises 41-42 procedure selection sort(a 1, a 2,..., a n : real numbers with n >=2 ) for i := 1 to n-1 m := i for j:= i+1 to n if ( a j < a m ) then m := j interchange a i and a m { a 1,..., a n is in increasing order} i = 1: n-1 comparisons i = 2: n-2 comparisons... i = n-1: 1 comparison Total number of comparisons: 1+2+...+(n-1)
Sum of consecutive integers X = 1 + 2 + 3 + + (n-3) + (n-2) + (n-1) X = (n-1) + (n-2) + (n-3) + + 3 + 2 + 1 2X = n + n + n + + n + n + n 2X = n(n-1) X = n(n-1)/2
Counting operations When do we get an answer? RUNNING TIME PERFORMANCE Counting number of times list elements are compared
Runtime performance Algorithm: problem solving strategy as a sequence of steps Examples of steps - Comparing list elements (which is larger?) - Accessing a position in a list (probe for value) - Arithmetic operation (+, -, *, ) - etc. "Single step" depends on context
Runtime performance How long does a single step take? Some factors - Hardware - Software Let s discuss and list factors that could impact how long a single step takes
Runtime performance How long does a single step take? Some factors - Hardware (CPU, climate (!), cache ) - Software (programming language, compiler)
Runtime performance The time our program takes will depend on Input size Number of steps the algorithm requires Time for each of these steps on our system
Runtime performance Ignore what we can't control Goal: Estimate time as a function of the size of the input, n Focus on how time scales for large inputs
Ignore what we can't control Rate of growth Focus on how time scales for large inputs
Selection Sort (MinSort) Performance (Again) Rosen page 210, example 5 Number of comparisons of list elements (n-1) + (n-2) + + (1) = n(n-1)/2 Sum of positive integers up to (n-1) Rewrite this expression using big-o notation: A. O(n) B. O(n(n-1)) C. O(n 2 ) D. O(1/2) E. None of the above
Exercise: Why is 1+2+3+4+ n = n 2?????
Linear Search: HOW Starting at the beginning of the list, compare items one by one with x until find it or reach the end procedure linear search (x: integer, a 1, a 2,..., a n : distinct integers ) i := 1 while (i <= n and x a i ) i := i+1 if i <=n then location := i else location := 0 return location { location is the subscript of the term that equals x, or is 0 if x is not found }
How fast is Linear Search: WHEN Rosen page 220, part of example 2 The time it takes to find x (or determine it is not present) depends on the number of probes, that is the number of list entries we have to retrieve and compare to x How many probes do we make when doing Linear Search on a list of size n if x happens to equal the first element in the list? if x happens to equal the last element in the list? if x happens to equal an element somewhere in the middle of the list? if x doesn't equal any element in the list?
How fast is Linear Search: WHEN Best case: 1 probe target appears first Worst case: n probes target appears last or not at all Average case: n/2 probes target appears in the middle, Running time (expect to have to search about depends on more than size array... of more the input! on half of the expected value later in the Rosen p. 220
Binary Search: WHEN procedure binary search (x: integer, a 1, a 2,..., a n : increasing integers ) i := 1 j := n while i<j m := floor( (i+j)/2 ) if x > a m then i := m+1 else j := m if x=a i else location := 0 return location then location := i Each time divide size of "list in play" by 2 { location is the subscript of the term that equals x, or is 0 if x is not found }
Rosen page 220, example 3 How fast is Binary Search: WHEN Number of comparisons (probes) depends on number of iterations of loop, hence size of set. After iterations of loop (Max) size of list "in play" 0 n 1 n/2 2 n/4 3 n/8?? 1
Rosen page 220, example 3 How fast is Binary Search: WHEN Number of comparisons (probes) depends on number of iterations of loop, hence size of set. After iterations of loop (Max) size of list "in play" 0 n 1 n/2 2 n/4 3 n/8 ceil(log 2 n) 1 Rewrite this formula in order notation: A. B. C. D. E. None of the above
Comparing linear search and binary search Rosen pages 220-221 Linear Search Binary Search Assumptions None Sorted list # probes in * best case or * worst case * average case?? Best case analysis depends on whether we check if midpoint agrees with target right away or wait until list size gets to 1 (as in Rosen)
Comparing linear search and binary search Rosen pages 220-221 Linear Search Binary Search Assumptions None Sorted list # probes in * best case or * worst case * average case??
Comparing linear search and binary search Rosen pages 220-221 Linear Search Binary Search Assumptions None Sorted list # probes in * best case or * worst case * average case?? Is it worth it to sort our list first?