CS10b Data Structures and Algorithms
Due: Thursday, January 0th
Assignment 1 (concept): Solutions

Note: throughout Exercises 1 to 4, $n$ denotes the input size of a problem.

1. (10%) Rank the following functions by asymptotic growth rate in non-decreasing order:

$(3/2)^n$, $2^{64}-1$, $n^3$, $0.0001n^2$, $10000n$, $\log n$, $\log n^2$, $n\log n$, $n2^n$, $1000$, $n$, $n^2\log n$, $2^n$, $2^{\log n}$, $n^{100}$, $4^n$, $\log n^3$, $n^n$, $n^3\log n$

Answer: $2^{64}-1$, $1000$, $\log n$, $\log n^2$, $\log n^3$, $n$, $2^{\log n}$, $10000n$, $n\log n$, $0.0001n^2$, $n^2\log n$, $n^3$, $n^3\log n$, $n^{100}$, $(3/2)^n$, $2^n$, $n2^n$, $4^n$, $n^n$.

Remark: In the above ranking, if a function $f(n)$ precedes another function $g(n)$, then $f(n) \in O(g(n))$ holds. There are several valid answers: several of the ranked functions $f(n)$, $g(n)$ satisfy $f(n) \in \Theta(g(n))$, which implies that both $f(n) \in O(g(n))$ and $g(n) \in O(f(n))$ hold, so such functions may be listed in either order.

Note:
(1) $\log n^2 = 2\log n$, $\log n^3 = 3\log n$.
(2) Grouping these functions into the following classes helps to clarify their order.
(a) Constant: $2^{64}-1$, $1000$
(b) Logarithmic: $\log n$, $\log n^2$, $\log n^3$
(c) Linear: $n$, $2^{\log n}$, $10000n$
(d) $n\log n$
(e) Polynomial: $0.0001n^2$, $n^3$, $n^{100}$
(f) $n^2\log n$
(g) $n^3\log n$
(h) Exponential: $(3/2)^n$, $2^n$, $n2^n$, $4^n$
(i) $n^n$

Observe that the functions in class (e) are not equivalent to one another under the relation $f(n) \in \Theta(g(n))$. The same is true of the functions in class (h).

2. (15%) Use the definition of Big-Oh to prove that $0.001n^3 - 1000n^2\log n - 100n + 5$ is $O(n^3)$.

To show that $0.001n^3 - 1000n^2\log n - 100n + 5$ is $O(n^3)$, we must find constants $c > 0$ and $n_0 \ge 1$ such that

\[ 0.001n^3 - 1000n^2\log n - 100n + 5 \le cn^3, \quad \forall n \ge n_0. \qquad (1) \]
Inequality (1) is equivalent to the inequality below.

\[ cn^3 - 0.001n^3 + 1000n^2\log n + 100n - 5 \ge 0 \qquad (2) \]

Inequality (2) must be true if the following inequality holds (the terms $1000n^2\log n$ and $100n$ are dropped since they are non-negative for $n \ge 1$, and $5$ is replaced by $5n^3$, which is at least $5$ for $n \ge 1$).

\[ cn^3 - 0.001n^3 - 5n^3 \ge 0 \qquad (3) \]

Inequality (3) is equivalent to $(c - 0.001 - 5)n^3 \ge 0$, which is true for all $n \ge n_0$ when $c = 5.001$ and $n_0 = 1$.

We have thus found $c = 5.001$ and $n_0 = 1$ such that $0.001n^3 - 1000n^2\log n - 100n + 5 \le cn^3$ for all $n \ge n_0$. Therefore, $0.001n^3 - 1000n^2\log n - 100n + 5$ is $O(n^3)$.

Remark: The pair of $c$ and $n_0$ is not unique. For instance, with $c = 1$ and $n_0 = 2$, Inequality (1) holds for all $n \ge n_0$.

3. (15%) Use the definition of Big-Oh to prove that $n^{1+0.001}$ is not $O(n)$.

We prove that $n^{1+0.001}$ is not $O(n)$ by contradiction. Suppose that $n^{1+0.001}$ is $O(n)$, which means that we can find $c > 0$ and $n_0 \ge 1$ such that

\[ n^{1+0.001} \le cn, \quad \forall n \ge n_0. \qquad (4) \]

Dividing both sides of Inequality (4) by $n$ (with $n \ge 1$), we obtain the following:

\[ c \ge n^{0.001} \qquad (5) \]

Inequality (5) cannot hold for all $n \ge n_0$, since $c$ is a constant whereas $n^{0.001}$ is unbounded. In fact, as soon as $n > e^{1000\log(c)}$ (where $\log$ denotes the natural logarithm, so this threshold equals $c^{1000}$) we have $c < n^{0.001}$; this happens for every $n$ larger than both $n_0$ and this threshold. This contradicts the assumption that such a constant $c$ exists. Therefore, $n^{1+0.001}$ is not $O(n)$.

4. (10%) Use the definition of Big-Oh to prove that if $f(n)$ is $O(g(n))$, then $af(n)$ is $O(g(n))$, for any positive constant $a$.

Since $f(n)$ is $O(g(n))$, we can find constants $c_f > 0$ and $n_{0f} \ge 1$ such that

\[ f(n) \le c_f\, g(n), \quad \forall n \ge n_{0f}. \qquad (6) \]

Multiplying both sides of Inequality (6) by $a$ (with $a > 0$), we get

\[ af(n) \le a c_f\, g(n), \quad \forall n \ge n_{0f}. \qquad (7) \]

Let $c = ac_f$ and $n_0 = n_{0f}$. We have thus found constants $c > 0$ and $n_0 \ge 1$ such that $af(n) \le cg(n)$ for all $n \ge n_0$. By the definition of Big-Oh, this proves that if $f(n)$ is $O(g(n))$, then $af(n)$ is $O(g(n))$, for any positive constant $a$.
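The witness constants used in Exercises 2 and 3 can be sanity-checked numerically. The sketch below is only an illustration and not part of the proofs: a finite check cannot establish a statement for all $n$. The function names are hypothetical, and the logarithm is assumed to be base 2 (the choice of base does not affect the conclusions).

```python
import math


def f(n: int) -> float:
    """Exercise 2 function: 0.001 n^3 - 1000 n^2 log n - 100 n + 5 (log base 2 assumed)."""
    return 0.001 * n**3 - 1000 * n**2 * math.log2(n) - 100 * n + 5


def big_oh_witness_holds(c: float, n0: int, n_max: int) -> bool:
    """Check f(n) <= c * n^3 for every integer n with n0 <= n <= n_max."""
    return all(f(n) <= c * n**3 for n in range(n0, n_max + 1))


def linear_bound_fails(c: float, n: int) -> bool:
    """Exercise 3: True when n^(1+0.001) > c * n, i.e. the bound c * n is violated."""
    return n**1.001 > c * n


if __name__ == "__main__":
    print(big_oh_witness_holds(5.001, 1, 10_000))  # pair (c, n0) from the proof
    print(big_oh_witness_holds(1.0, 2, 10_000))    # pair (c, n0) from the remark
    print(linear_bound_fails(1.01, 21_000))        # n beyond the threshold c^1000
```

For Exercise 3 with the illustrative value $c = 1.01$, the threshold $c^{1000}$ is roughly $2.1 \times 10^4$, so the bound $n^{1.001} \le 1.01n$ still holds at $n = 20000$ but fails at $n = 21000$.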
5. (25%) We want to know how many students are taking both CS10 and CS11 this term. Let A and B be the class lists of CS10 and CS11. Each of A and B consists of the unique student IDs of the corresponding class. To keep it simple, we assume that the two classes have the same number of students, denoted by $n$.

5.1 Write an algorithm in pseudocode to count the number of students who are taking both CS10 and CS11 this term.
5.2 Compute the worst case running time $T(n)$ of your algorithm with respect to the class size $n$.
5.3 Give the best Big-Oh complexity characterization of $T(n)$.

Solution 1:

5.1
Algorithm countcommon(A, B, n)
  Input: Two integer arrays A and B, each of size n
  Output: Number of common elements in A and B

  e ← 0    // number of common elements
  for i ← 0 to n - 1 do
    for j ← 0 to n - 1 do
      if B[j] = A[i] then
        e ← e + 1
        break
  return e

5.2 The worst case occurs when there are no common elements in A and B. In that case, every element of A must be compared with every element of B.

This algorithm involves a nested for loop. We analyze the inner-most for loop first. In each iteration of the inner for loop, only a constant number $c$ of operations is performed (mainly one comparison). The number of iterations of the inner for loop is $n$. Thus, the total number of operations performed in this loop is $cn$.

As for the outer (or first) for loop, the number of iterations is again $n$. In each iteration of the outer for loop, it performs the work of the inner loop. Therefore, the total work done by the outer for loop is $n \cdot cn = cn^2$.

Consequently, $T(n) = cn^2 + c'$, where $c'$ is the number of operations for initializing e and returning e at the end.

5.3 $T(n) = cn^2 + c'$ is $O(n^2)$. (The proof is straightforward.)
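As an illustration of Solution 1, the nested-loop algorithm can be sketched in Python as follows. The function name and the extra comparison counter are additions made here to make the worst case visible; they are not part of the original pseudocode.

```python
def count_common_with_cost(a: list[int], b: list[int]) -> tuple[int, int]:
    """Return (number of common IDs, number of comparisons performed).

    Assumes each list contains unique student IDs, as stated in the exercise.
    """
    common = 0
    comparisons = 0
    for x in a:
        for y in b:
            comparisons += 1
            if y == x:
                common += 1
                break  # x is found, no need to scan the rest of b
    return common, comparisons


if __name__ == "__main__":
    # Worst case: no common elements, so all n * n comparisons are made.
    print(count_common_with_cost([1, 2, 3, 4], [5, 6, 7, 8]))
    # Some common elements: the break cuts the inner loop short.
    print(count_common_with_cost([1, 2, 3], [3, 1, 5]))
```

With two disjoint lists of size 4 the counter reaches $4 \cdot 4 = 16$ comparisons, matching the $cn^2$ term in the analysis above.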
Solution 2:

5.1
Algorithm countcommon(A, B, n)
  Input: Two integer arrays A and B, each of size n
  Output: Number of common elements in A and B

  Sort B by merge-sort or quick-sort
  e ← 0    // number of common elements
  for i ← 0 to n - 1 do
    b ← binarysearch(A[i], B)
    if b ≠ null then
      e ← e + 1
  return e

5.2 The worst case happens when there are no common elements in A and B. First, sorting the elements of B can be done in $cn\log n$ operations by merge-sort (quick-sort attains this bound on average, but its worst case is quadratic). Then, we search in B for each element of A via binary search, which takes $n \cdot c'\log n = c'n\log n$ operations, since the cost of each binary search is $c'\log n$. In total, $T(n) = cn\log n + c'n\log n$.

5.3 $T(n) = cn\log n + c'n\log n$ is $O(n\log n)$.

Remark: If both of the input arrays A and B are already sorted, one can achieve $O(n)$ for Algorithm countcommon by comparing the elements of A and B in sequence, akin to the merge operation in merge-sort.

6. (25%) In the mathematical discipline of linear algebra, a matrix of the form

\[
L = \begin{bmatrix}
l_{1,1} & & & & 0 \\
l_{2,1} & l_{2,2} & & & \\
l_{3,1} & l_{3,2} & \ddots & & \\
\vdots & \vdots & \ddots & \ddots & \\
l_{n,1} & l_{n,2} & \cdots & l_{n,n-1} & l_{n,n}
\end{bmatrix}
\qquad (8)
\]

is called a lower triangular matrix. For example, the following matrix is lower triangular.

\[
\begin{bmatrix}
1 & 0 & 0 \\
2 & 8 & 0 \\
4 & 9 & 7
\end{bmatrix}
\qquad (9)
\]

Provided that $l_{i,i} \neq 0$ for $1 \le i \le n$, a matrix equation of the form $Lx = b$, where $x = [x_1, \ldots, x_n]^T$ and $b = [b_1, \ldots, b_n]^T$, is very easy to solve by an iterative process
called forward substitution, described as follows. The matrix equation $Lx = b$ can be written as a system of linear equations

\[
\begin{array}{rcl}
l_{1,1}x_1 & = & b_1 \\
l_{2,1}x_1 + l_{2,2}x_2 & = & b_2 \\
\vdots & & \vdots \\
l_{n,1}x_1 + l_{n,2}x_2 + \cdots + l_{n,n}x_n & = & b_n
\end{array}
\qquad (10)
\]

Observe that the first equation ($l_{1,1}x_1 = b_1$) only involves $x_1$, and thus one can solve for $x_1$ directly. The second equation only involves $x_1$ and $x_2$, and thus can be solved once one substitutes in the already solved value for $x_1$. Continuing in this way, the $k$-th equation only involves $x_1, \ldots, x_k$, and one can solve for $x_k$ using the previously solved values for $x_1, \ldots, x_{k-1}$. The resulting formulas are:

\[
x_1 = \frac{b_1}{l_{1,1}}, \quad
x_2 = \frac{b_2 - l_{2,1}x_1}{l_{2,2}}, \quad
\ldots, \quad
x_n = \frac{b_n - \sum_{i=1}^{n-1} l_{n,i}x_i}{l_{n,n}}
\qquad (11)
\]

Please refer to http://en.wikipedia.org/wiki/triangular_matrix for a general description of triangular matrices and the forward substitution process.

The following algorithm, ForwardSubstitution 1, is a straightforward realization of the forward substitution process described above.

Algorithm ForwardSubstitution 1(L, b)
  Input: 2-dimensional array L[1,...,n][1,...,n] encoding the triangular matrix in Expression (8) such that L[i][j] = $l_{i,j}$; an array b[1,...,n] where b[i] = $b_i$.
  Output: An array x[1,...,n] such that x[i] = $x_i$ as given in Expressions (11).

  x[1] ← b[1]/L[1][1]
  for i ← 2 to n do
    v ← 0
    for j ← 1 to i - 1 do
      v ← v + L[i][j] · x[j]
    x[i] ← (b[i] - v)/L[i][i]

6.1 Prove that the time complexity of Algorithm ForwardSubstitution 1 is $O(n^2)$.

For this algorithm, there is no difference between the best case and the worst case. It uses a nested for loop. We analyze the inner-most for loop first. In every iteration of this loop a constant number $c_1$ of operations is performed, and the loop is repeated $i - 1$ times. Thus, the total number of operations performed by this loop is $c_1(i-1)$. As for the first for loop, in every iteration it performs a constant number of operations $c_2$ to reset v to 0, $c_1(i-1)$ operations in the second for block, and a constant number of operations $c_3$ for calculating the
value of $x_i$. The first loop changes the value of $i$ from 2 to $n$, thus the total number of operations performed in this loop is

\[ \sum_{i=2}^{n} \bigl(c_2 + c_1(i-1) + c_3\bigr). \]

In total, the running time $T(n)$ of Algorithm ForwardSubstitution 1 is

\[ \sum_{i=2}^{n} \bigl(c_2 + c_1(i-1) + c_3\bigr) + c_0, \]

where $c_0$ is the number of operations for calculating $x_1$. $T(n)$ can be further simplified as follows.

\[
\begin{array}{rcl}
T(n) & = & c_2(n-1) + c_1\bigl(1 + 2 + 3 + \cdots + (n-1)\bigr) + c_3(n-1) + c_0 \\
     & = & c_2(n-1) + c_1\,\frac{n(n-1)}{2} + c_3(n-1) + c_0 \\
     & = & \frac{c_1}{2}n^2 + \bigl(c_2 + c_3 - \frac{c_1}{2}\bigr)n + c_0 - c_2 - c_3
\end{array}
\]

Therefore, Algorithm ForwardSubstitution 1 runs in $O(n^2)$.

6.2 (optional, 5% bonus) Can Algorithm ForwardSubstitution 1 be improved to achieve $O(n)$ time complexity? Justify your answer.

Answer: In general, no. If none of the coefficients on or below the diagonal of the triangular matrix $L$ is zero, then the cost of computing $x_1, x_2, \ldots, x_n$ lies in $\Theta(n^2)$. Indeed, under this assumption, the count (performed in the solution of the previous question) of the number of additions, subtractions, multiplications and divisions is not an over-estimate; it is a sharp estimate. Therefore, under the hypothesis that $L[i][j] \neq 0$ for $1 \le i \le n$, $1 \le j \le i$, Algorithm ForwardSubstitution 1 performs $\Theta(n^2)$ arithmetic operations.

Another argument is to observe that the number of coefficients on or below the diagonal of the triangular matrix $L$ is $\frac{n(n+1)}{2}$. Therefore, we need $\Theta(n^2)$ read operations to access all the entries of the input data set. Considering the amount of data that a given algorithm needs to read and write is often used as an argument for proving that this algorithm requires at least a given number of elementary operations.
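The forward substitution procedure analyzed in 6.1 and 6.2 can be sketched in Python as follows. This is a minimal illustration using 0-based lists rather than the 1-based arrays of the pseudocode; the function name and the inner-loop counter are assumptions added here so that the quadratic growth can be observed directly.

```python
def forward_substitution(L: list[list[float]], b: list[float]) -> tuple[list[float], int]:
    """Solve L x = b for a lower triangular L with non-zero diagonal entries.

    Returns the solution x and the number of inner-loop iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    inner_iterations = 0
    for i in range(n):
        v = 0.0
        # Accumulate the contribution of the already solved unknowns x[0..i-1].
        for j in range(i):
            v += L[i][j] * x[j]
            inner_iterations += 1
        x[i] = (b[i] - v) / L[i][i]
    return x, inner_iterations


if __name__ == "__main__":
    # Small lower triangular system chosen for illustration; b is built so that x = [1, 2, 3].
    L = [[1.0, 0.0, 0.0],
         [2.0, 8.0, 0.0],
         [4.0, 9.0, 7.0]]
    b = [1.0, 18.0, 43.0]
    print(forward_substitution(L, b))
```

For an $n \times n$ system the inner loop body runs $0 + 1 + \cdots + (n-1) = n(n-1)/2$ times, which is 3 for the example above, in line with the quadratic running time derived in 6.1.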