ENGI 4892: Data Structures
Assignment 2 SOLUTIONS
Due May 30 in class at 1:00
Total marks: 50

Notes: You will lose marks for giving only the final answer for any question on this assignment. Some steps of justification are required for all questions. Marks will be deducted for lack of neatness and for lack of clarity in your explanations.

1. [6] Determine the asymptotic complexity of the following functions. You must provide justification for each step. Valid justifications include:

- The definition of big-o
- The 6 big-o facts, including fact 2a (summation of a constant number of terms) and the observation that an order $k$ polynomial is $O(n^k)$.
- Ordering of the common complexity classes (e.g. if $f(n) = O(n)$ then $f(n)$ is also $O(2^n)$).

You should provide answers in the smallest and simplest possible terms.

(a) $f(n) = 103(n^3 - 2)$

$$
\begin{aligned}
f(n) &= 103n^3 - 206 \\
     &= O(n^3) && \text{(Complexity of polynomial)}
\end{aligned}
$$

(b) $f(n) = \log_2 n^2 + \ln n + 21$

$$
\begin{aligned}
f(n) &= 2\log_2 n + \ln n + 21 \\
     &= O(\lg n) + O(\lg n) + 21 && \text{(Fact 6)} \\
     &= O(\lg n) + O(\lg n) + O(1) && \text{(Fact 5)} \\
     &= O(\lg n) + O(1) && \text{(Fact 2)} \\
     &= O(\lg n) + O(\lg n) && \text{($O(1)$ is $O(\lg n)$)} \\
     &= O(\lg n) && \text{(Fact 2)}
\end{aligned}
$$

(c) $f(n) = 4n^2 + n \lg n + n$

$$
\begin{aligned}
f(n) &= 4n^2 + n \lg n + n \\
     &= O(n^2) + O(n \lg n) + O(n) && \text{(Fact 5)} \\
     &= O(n^2) + O(n^2) + O(n^2) && \text{($n \lg n$ and $n$ are both $O(n^2)$)} \\
     &= O(n^2) && \text{(Fact 2a)}
\end{aligned}
$$
2. [15] Prove the following statements. You must provide justification for each step. You may utilize the technique described on slide 10 of the first set of notes on complexity, or any other mathematically valid argument. Note that $k$ is a constant.

(a) $\sum_{i=1}^{n} i^k$ is $O(n^{k+1})$

Since $i \le n$ for this summation, we can say the following,

$$\sum_{i=1}^{n} i^k \le \sum_{i=1}^{n} n^k$$

We can simplify the RHS,

$$\sum_{i=1}^{n} n^k = n^k \sum_{i=1}^{n} 1 = n^k \cdot n = n^{k+1}$$

Thus,

$$\sum_{i=1}^{n} i^k \le n^{k+1}$$

This satisfies the definition of big-o notation for constants $c = N = 1$.

(b) $\dfrac{n^k}{(n+1)^{2k}}$ is $O(n^k)$

We must find constants $c$ and $N$ such that,

$$\frac{n^k}{(n+1)^{2k}} \le c\,n^k \quad \text{for } n \ge N$$

Divide both sides by $n^k$,

$$\frac{1}{(n+1)^{2k}} \le c$$

Since $(n+1)^{2k}$ is monotonically increasing with $n$, we know the LHS will always be at most $\frac{1}{2^{2k}}$, its value at $n = 1$. Thus, the definition of big-o is satisfied for all $n \ge 1$ by choosing $c = \frac{1}{2^{2k}}$ and $N = 1$.

(c) $2^{(\lg n)^{0.3}}$ is $O(n^2)$

$$2^{(\lg n)^{0.3}} \le 2^{\lg n} = n$$

Therefore $2^{(\lg n)^{0.3}}$ is $O(n)$. Since $n$ is $O(n^2)$ we can say that $2^{(\lg n)^{0.3}}$ is also $O(n^2)$ by Fact 1.

(d) $n!$ is $O(n^n)$

$$n! = 1 \cdot 2 \cdot 3 \cdots n \le n \cdot n \cdot n \cdots n = n^n$$

In the expression above we have replaced all terms with $n$. This creates a function $g(n)$ such that $n! \le g(n)$. Since we have $n$ such terms, this function is $g(n) = n^n$. Thus, the definition of big-o is satisfied directly for $c = 1$ and $N = 1$.
(e) If $f_1(n)$ is $O(g_1(n))$ and $f_2(n)$ is $O(g_2(n))$ then $f_1(n) f_2(n)$ is $O(g_1(n) g_2(n))$.

To prove this statement we may assume the following (the antecedent):

$$f_1(n) \le c_1 g_1(n) \quad \text{for } n \ge N_1$$
$$f_2(n) \le c_2 g_2(n) \quad \text{for } n \ge N_2$$

We can unify the conditions under which these two statements are true by choosing the larger of $N_1$ and $N_2$ for both.

$$f_1(n) \le c_1 g_1(n) \quad \text{for } n \ge \max(N_1, N_2)$$
$$f_2(n) \le c_2 g_2(n) \quad \text{for } n \ge \max(N_1, N_2)$$

Since the above two inequalities hold for all the same values of $n$, and the functions involved are nonnegative, we can multiply them together.

$$f_1(n) f_2(n) \le c_1 c_2\, g_1(n) g_2(n) \quad \text{for } n \ge \max(N_1, N_2)$$

This satisfies the definition of big-o with constants $c = c_1 c_2$ and $N = \max(N_1, N_2)$, allowing us to state that $f_1(n) f_2(n)$ is $O(g_1(n) g_2(n))$.
3. [9] Determine the exact number of assignment statements executed for the following three code snippets as a function of $n$. Also, give the complexity of each in big-o notation.

(a)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[i][j] = b[i][j] + c[i][j];

assignments: $2n^2 + 2n + 1$, asymptotic complexity: $O(n^2)$

(b)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = a[i][j] = 0; k < n; k++)
                a[i][j] += b[i][k] * c[k][j];

assignments: $2n^3 + 3n^2 + 2n + 1$, asymptotic complexity: $O(n^3)$

(c)
    for (i = 0; i < n - 1; i++)
        for (j = i+1; j < n; j++) {
            tmp = a[i][j];
            a[i][j] = a[j][i];
            a[j][i] = tmp;
        }

assignments: $2n^2 - 1$, asymptotic complexity: $O(n^2)$
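One way to sanity-check these totals is to replay each loop nest with a counter that is incremented wherever the original snippet performs an assignment (`=`, `+=`, `++`). The sketch below does this in C. The function names `count_a`, `count_b`, and `count_c` are hypothetical, and the array operations are replaced by counter increments on the assumption that only the number of assignments matters.

```c
/* Hypothetical helpers: each replays one loop nest from question 3 and
   bumps `count` once for every assignment the original code would execute. */

/* Snippet (a): i = 0 (1), i++ (n), j = 0 (n), j++ (n^2), body (n^2). */
long count_a(long n) {
    long count = 0, i, j;
    for (i = 0, count++; i < n; i++, count++)
        for (j = 0, count++; j < n; j++, count++)
            count++;                 /* a[i][j] = b[i][j] + c[i][j]; */
    return count;
}

/* Snippet (b): the initializer k = a[i][j] = 0 counts as two assignments. */
long count_b(long n) {
    long count = 0, i, j, k;
    for (i = 0, count++; i < n; i++, count++)
        for (j = 0, count++; j < n; j++, count++)
            for (k = 0, count += 2; k < n; k++, count++)
                count++;             /* a[i][j] += b[i][k] * c[k][j]; */
    return count;
}

/* Snippet (c): the swap body contributes three assignments per iteration. */
long count_c(long n) {
    long count = 0, i, j;
    for (i = 0, count++; i < n - 1; i++, count++)
        for (j = i + 1, count++; j < n; j++, count++)
            count += 3;              /* tmp = ...; a[i][j] = ...; a[j][i] = ...; */
    return count;
}
```

For n = 5 these functions return 61, 336, and 49, which agree with $2n^2 + 2n + 1$, $2n^3 + 3n^2 + 2n + 1$, and $2n^2 - 1$ respectively.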
4. [8] Determine the final value of count as a function of $n$ for the following three code snippets.

(a) [2]
    for (count = 0, i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            count++;

We need only count the number of times that count++ is executed. The outer loop executes $n$ times. For each execution of the outer loop, the inner loop is also executed $n$ times. Therefore, count++ is executed $n^2$ times.

(b) [2]
    for (count = 0, i = 1; i <= n; i++)
        for (j = 1; j <= i; j++)
            count++;

Here the outer loop executes $n$ times, but the inner loop executes $i$ times. We need to sum up the number of executions of the inner loop (which is also the number of times count++ is executed).

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = n^2/2 + n/2$$

(c) [4] To analyze the code below you may assume that $n$ is a power of 2. Also, you may find it useful to utilize the formula for a geometric series:

$$\sum_{k=m}^{n} a r^k = \frac{a(r^m - r^{n+1})}{1 - r}$$

    for (count = 0, i = 1; i <= n; i *= 2)
        for (j = 1; j <= i; j++)
            count++;

The straightforward approach used in part (b) will not work here because $i$ doubles for each execution of the outer loop. The progression of $i$ is $1, 2, 4, 8, \ldots, n$. We can also describe this progression as $2^0, 2^1, 2^2, 2^3, \ldots, 2^m = n$. We introduce a quantity called $k$ which represents the increasing exponent values in this progression (i.e. $k$ follows the sequence $0, 1, 2, \ldots, m$). The point of introducing $k$ is to have a quantity that increases by one at each step. This will allow us to use the familiar summation identities.

Just like in part (b) we need to sum the number of executions of the inner loop, which is $i$. In this case $i = 2^k$ and we sum over all values of $k$. The only remaining question is the final value of $k$. This is defined as $m$. Since $2^m = n$, we know that $m = \log_2 n$.

$$\sum_{k=0}^{\log_2 n} 2^k$$

This is a geometric series with $a = 1$ and $r = 2$. We utilize the given formula, simplify and obtain $2n - 1$.
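The three closed forms can be checked by running the loops directly. The following is a minimal sketch in C; the function names `count_square`, `count_triangle`, and `count_doubling` are hypothetical labels for snippets (a), (b), and (c), and `long` is assumed to be wide enough for the values of n being tried.

```c
/* Hypothetical wrappers that run each snippet from question 4
   and return the final value of count. */

/* Snippet (a): expected result n^2. */
long count_square(long n) {
    long count, i, j;
    for (count = 0, i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
            count++;
    return count;
}

/* Snippet (b): expected result n(n+1)/2. */
long count_triangle(long n) {
    long count, i, j;
    for (count = 0, i = 1; i <= n; i++)
        for (j = 1; j <= i; j++)
            count++;
    return count;
}

/* Snippet (c): expected result 2n - 1 when n is a power of 2. */
long count_doubling(long n) {
    long count, i, j;
    for (count = 0, i = 1; i <= n; i *= 2)
        for (j = 1; j <= i; j++)
            count++;
    return count;
}
```

With n = 8 the three functions return 64, 36, and 15, matching $n^2$, $n(n+1)/2$, and $2n - 1$.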
5. [10] Recall the code snippet discussed in class for finding the length of the longest increasing sequence within an array. The following is a variation on this code which differs in several ways. Firstly, the variable i will be used to select different possibilities for the end of an increasing subsequence rather than the beginning (k is used to indicate the beginning). Also, both i and k decrease within their respective loops, as opposed to increasing. In particular, i will decrease so as to jump one past the beginning of the increasing subsequence just found. This makes this code more efficient than the version presented in class.

    1. for (i = n-1, length = 1; i >= 0;) {
    2.     for (k = i; k >= 1 && a[k-1] < a[k]; k--)
    3.         ;
    4.     int curlength = i - k + 1;
    5.     if (length < curlength)
    6.         length = curlength;
    7.     i = k - 1;
    8. }

In answering the following questions you may assume that $n > 1$.

(a) [4] Determine the exact number of assignment statements executed by this code as a function of $n$ when the input array is sorted in increasing order.

The two initialization statements in line 1 will always be executed. We then enter the body of the outer loop, where we have an additional assignment in the initialization of the inner for loop. The condition a[k-1] < a[k] will always be true. Hence, the inner loop will decrement k until k >= 1 is no longer true. This requires $i$ decrement operations on the first (and only) execution of the outer loop, when $i = n - 1$. Considering only those assignment statements mentioned so far, the total count would be $3 + (n - 1) = n + 2$.

We have two additional assignments in the body of the outer loop that will be executed (lines 4 and 7). The assignment statement on line 6 will also be executed since the if condition is satisfied on the first pass. Thus our total becomes as follows:

$$n + 2 + 3 = n + 5$$

Since the inner loop has been completed, we know that $k = 0$. On line 7, i will be set to $-1$.
Thus, the outer loop condition will not be satisfied and the program will terminate. Thus, the total number of assignments is $n + 5$.

(b) [4] Determine the exact number of assignment statements executed by this code as a function of $n$ when the input array is sorted in decreasing order (or if all entries are the same).

In this case there are no increasing subsequences of length greater than 1. Thus, the condition a[k-1] < a[k] will always be false. This means that the inner loop contributes only one assignment statement (the initialization) for every execution of the outer loop. Also, the condition on line 5 will never be satisfied since we will never find a subsequence with length greater than 1.

Therefore we have two initial assignment statements on line 1 and then the body of the loop is entered. The loop will be entered $n$ times and on each iteration there will be three assignment statements executed:
    k = i
    int curlength = i - k + 1;
    i = k - 1;

Hence the total is $2 + 3n$.

(c) [1] What is the worst-case value of the input array for large values of $n$?

The worst case is an input array in decreasing order (or with all elements equal), since it executes the greatest number of assignment statements. As shown above, this number, $3n + 2$, is greater than the total of $n + 5$ for an increasing order whenever $n \ge 2$. If there are some increasing subsequences (e.g. [10, 1, 2, 3, 2, 1]) then the performance will be better (i.e. fewer statements executed) than for a pure decreasing order, since line 7 essentially allows us to skip over a subsequence after finding it.

6. [3] One algorithm able to solve the subset-sum problem works as follows. Each possible combination of entries from the input set (treated as a list) is denoted as a binary number. For example, for the input set {1, −2, 3, 4, 2} we would utilize 5-digit binary numbers and the number 01001 would indicate a combination involving the −2 and the 2 (upon finding such a combination the algorithm would stop and return a positive answer). For an input set of size $n$, give the asymptotic complexity of this algorithm and justify your answer.

This algorithm will test all combinations of these numbers. We will assume that testing whether the sum of $n$ numbers is 0 has a cost of $O(n)$ (see note below). So the question then becomes: how many combinations are there? Since there is a unique binary number for every combination, we simply have to count the number of binary numbers. With $n$ bits we can specify $2^n$ numbers. Thus, the overall complexity of this algorithm is $O(n 2^n)$.

Note: In the analysis above we have not considered the precision of these numbers. A more careful analysis would introduce another variable giving the number of bits required to represent each number of the input set.
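The enumeration described in question 6 can be sketched in C as follows. This is an illustration under stated assumptions rather than the course's reference implementation: the function name `subset_sum_zero` is hypothetical, the target sum is taken to be 0, the empty combination is skipped, and n is assumed to be at most 31 so that every mask fits in an `unsigned long`. Bit b of the mask selects element b, so the bits are read from the low-order end rather than left to right as in the 5-digit example, but the same $2^n$ combinations are covered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical brute-force subset-sum check: returns true if some
   nonempty combination of set[0..n-1] sums to 0.
   Assumes n <= 31 so that (1UL << n) does not overflow. */
bool subset_sum_zero(const int *set, size_t n) {
    unsigned long total = 1UL << n;          /* 2^n possible masks */

    /* Start at 1 to skip the empty combination. */
    for (unsigned long mask = 1; mask < total; mask++) {
        long sum = 0;

        /* O(n) work per mask: add every element whose bit is set. */
        for (size_t bit = 0; bit < n; bit++) {
            if (mask & (1UL << bit))
                sum += set[bit];
        }

        if (sum == 0)
            return true;                     /* stop at the first hit */
    }
    return false;                            /* all 2^n - 1 masks failed */
}
```

In the worst case no combination sums to 0, so the outer loop runs $2^n - 1$ times with $O(n)$ work each, which is where the $O(n 2^n)$ bound comes from.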