CS302 Topic: Algorithm Analysis Thursday, Sept. 22, 2005
Announcements Lab 3 (Stock Charts with graphical objects) is due this Friday, Sept. 23!! Lab 4 now available (Stock Reports); due Friday, Oct. 7 (2 weeks). Don't procrastinate!! "Procrastination is like a credit card: it's a lot of fun until you get the bill." -- Christopher Parker
Check your knowledge of C++ & libs: Assume: a. Stack(); b. Stack(int size); c. Stack(const Stack &); d. Stack &operator=(const Stack &); Which type of constructor gets called? Stack *stack1 = new Stack(10); Answer: b
Check your knowledge of C++ & libs: Assume: a. Stack(); b. Stack(int size); c. Stack(const Stack &); d. Stack &operator=(const Stack &); Which type of constructor gets called? Stack stack2; Answer: a
Check your knowledge of C++ & libs: Assume: a. Stack(); b. Stack(int size); c. Stack(const Stack &); d. Stack &operator=(const Stack &); Which type of constructor gets called? Stack stack3 = stack2; Answer: c
Check your knowledge of C++ & libs: Assume: a. Stack(); b. Stack(int size); c. Stack(const Stack &); d. Stack &operator=(const Stack &); Which type of constructor gets called? Stack stack4(stack2); Answer: c
Check your knowledge of C++ & libs: Assume: a. Stack(); b. Stack(int size); c. Stack(const Stack &); d. Stack &operator=(const Stack &); Which of the four gets called? stack4 = stack2; Answer: d (the assignment operator; stack4 already exists, so no constructor runs)
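To make the five answers concrete, here is a minimal Stack sketch with all four members from the quiz; the fields and sizes are illustrative assumptions, not the lab's actual class. The comments in main mark which member each statement invokes.

class Stack {
public:
    Stack() : data(new int[10]), capacity(10), top(0) {}              // (a) default constructor
    Stack(int size) : data(new int[size]), capacity(size), top(0) {}  // (b) int constructor
    Stack(const Stack &other)                                         // (c) copy constructor
        : data(new int[other.capacity]), capacity(other.capacity), top(other.top) {
        for (int k = 0; k < top; ++k) data[k] = other.data[k];
    }
    Stack &operator=(const Stack &other) {                            // (d) assignment operator
        if (this != &other) {
            delete [] data;
            data = new int[other.capacity];
            capacity = other.capacity;
            top = other.top;
            for (int k = 0; k < top; ++k) data[k] = other.data[k];
        }
        return *this;
    }
    ~Stack() { delete [] data; }
private:
    int *data;
    int capacity;
    int top;
};

int main() {
    Stack *stack1 = new Stack(10); // (b): 10 matches the int constructor
    Stack stack2;                  // (a): no arguments, default constructor
    Stack stack3 = stack2;         // (c): initialization, so '=' means copy construction
    Stack stack4(stack2);          // (c): copy constructor again, different syntax
    stack4 = stack2;               // (d): stack4 already exists, so operator= runs
    delete stack1;
    return 0;
}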
Analysis of Algorithms The theoretical study of computer program performance and resource usage What's also important (besides performance/resource usage)? Modularity Correctness Maintainability Functionality Robustness User-friendliness Programmer time Simplicity Reliability
Why study algorithms and performance? Analysis helps us understand scalability Performance often draws the line between what is feasible and what is impossible Algorithmic mathematics provides a language for talking about program behavior The lessons of program performance generalize to other computing resources Speed is fun!
The problem of sorting Input: a sequence ⟨a₁, a₂, ..., aₙ⟩ of numbers Output: a permutation ⟨a′₁, a′₂, ..., a′ₙ⟩ of the input such that a′₁ ≤ a′₂ ≤ ... ≤ a′ₙ Example: Input: 8 2 4 9 3 6 Output: 2 3 4 6 8 9
Insertion sort pseudocode
INSERTION-SORT(A, n)    ▷ sorts A[1..n]
  for j ← 2 to n
    do key ← A[j]
       i ← j − 1
       while i > 0 and A[i] > key
         do A[i+1] ← A[i]
            i ← i − 1
       A[i+1] ← key
[Diagram: array A[1..n], with A[1..j−1] already sorted and key = A[j] being inserted]
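A direct C++ translation of the pseudocode, as a sketch (0-based arrays, so every index shifts down by one relative to the slide):

#include <iostream>

// Insertion sort on a 0-based array A of length n.
void insertionSort(int A[], int n) {
    for (int j = 1; j < n; ++j) {      // slide's "for j <- 2 to n"
        int key = A[j];
        int i = j - 1;
        while (i >= 0 && A[i] > key) { // shift larger elements right
            A[i + 1] = A[i];
            --i;
        }
        A[i + 1] = key;                // drop key into its sorted slot
    }
}

int main() {
    int A[] = {8, 2, 4, 9, 3, 6};
    insertionSort(A, 6);
    for (int k = 0; k < 6; ++k) std::cout << A[k] << ' '; // 2 3 4 6 8 9
    std::cout << std::endl;
    return 0;
}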
Example of insertion sort
8 2 4 9 3 6
2 8 4 9 3 6
2 4 8 9 3 6
2 4 8 9 3 6
2 3 4 8 9 6
2 3 4 6 8 9  done
Running time Running time depends on the input; an already-sorted sequence is easier to sort. Parameterize the running time by the size of the input, since short sequences are easier to sort than long ones. Generally, we seek an upper bound on the running time, since everybody likes a guarantee.
Kinds of analyses Worst-case: (usually) T(n) = maximum time of algorithm on any input of size n Average case: (sometimes) T(n) = expected time of algorithm over all inputs of size n Need assumption of statistical distribution of inputs Best case: (bogus) Cheat with a slow algorithm that works fast on some input
Machine-independent time What is insertion sort's worst-case time? It depends on the speed of our computer: relative speed (on the same machine), absolute speed (on different machines). BIG IDEA: Ignore machine-dependent constants. Look at growth of T(n) as n → ∞. "Asymptotic Analysis"
Θ-notation (asymptotically tight bound) Θ(n²): Pronounce it: "Theta of n squared" Math: Θ(g(n)) = {f(n): there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀} Engineering: Drop low-order terms; ignore leading constants Example: 3n³ + 90n² − 4n + 6046 = Θ(n³)
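As a worked check of the definition (the constants below are my choices for illustration, not from the slide):

\[
f(n) = 3n^3 + 90n^2 - 4n + 6046
\]
\[
n \ge 1:\quad f(n) \le 3n^3 + 90n^3 + 6046n^3 = 6139\,n^3
\]
\[
n \ge 2:\quad 4n \le n^3 \implies f(n) \ge 3n^3 - 4n \ge 2n^3
\]

So c₁ = 2, c₂ = 6139, n₀ = 2 witness 0 ≤ c₁n³ ≤ f(n) ≤ c₂n³ for all n ≥ n₀, i.e., f(n) = Θ(n³).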
Asymptotic performance When n gets large enough, a Θ(n²) algorithm always beats a Θ(n³) algorithm. We shouldn't ignore asymptotically slower algorithms, however: real-world design situations often call for a careful balancing of engineering objectives. Asymptotic analysis is a useful tool to help structure our thinking.
Closer look at the mathematical definition of Θ(g(n)): Θ(g(n)) = {f(n): there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀} [Graph: f(n) sandwiched between c₁g(n) below and c₂g(n) above for all n ≥ n₀; caption: f(n) = Θ(g(n))]
O-notation (an upper bound, not necessarily tight) O(n²): Pronounce it: "Big-Oh of n squared" or "order n squared" Math: O(g(n)) = {f(n): there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n₀}
Closer look at the mathematical definition of O(g(n)): O(g(n)) = {f(n): there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n₀} [Graph: f(n) lies below cg(n) for all n ≥ n₀; caption: f(n) = O(g(n))]
Insertion sort analysis
Worst case: input reverse sorted. T(n) = Σ_{j=2}^{n} Θ(j) = Θ(n²) [Recall the arithmetic series: Σ_{k=1}^{n} k = n(n+1)/2 = Θ(n²)]
Average case: all permutations equally likely. T(n) = Σ_{j=2}^{n} Θ(j/2) = Θ(n²)
Is insertion sort a fast sorting algorithm? Moderately so, for small n. Not at all, for large n.
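Filling in the arithmetic between the sum and the bound (a sketch; this step is implicit on the slide):

\[
T(n) = \sum_{j=2}^{n} \Theta(j)
     = \Theta\Big(\sum_{j=2}^{n} j\Big)
     = \Theta\Big(\frac{n(n+1)}{2} - 1\Big)
     = \Theta(n^2)
\]

In the worst case the inner while loop shifts j − 1 elements for each j; on average it shifts about half that many, which changes only the constant factor, not the Θ(n²) bound.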
Merge sort
MERGE-SORT A[1..n]
1. If n = 1, we're done.
2. Recursively sort A[1..⌈n/2⌉] and A[⌈n/2⌉+1..n]
3. Merge the 2 sorted arrays
Subroutine MERGE: from the head of each array, output the smaller value and advance that array's pointer; continue until both arrays are exhausted.
Merging two sorted arrays
Example: merge ⟨2, 7, 13, 20⟩ with ⟨1, 9, 11, 12⟩. At each step, compare the two front elements and output the smaller one: output 1, then 2, then 7, then 9, then 11, etc.
Time = Θ(n) to merge a total of n elements (i.e., linear time)
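Here is a compact C++ sketch of the whole algorithm (my translation; the course code may differ), with MERGE written exactly as described, buffering into a temporary vector:

#include <iostream>
#include <vector>

// Merge the sorted halves A[lo..mid] and A[mid+1..hi] (0-based, inclusive).
void merge(std::vector<int> &A, int lo, int mid, int hi) {
    std::vector<int> out;
    int i = lo, j = mid + 1;
    while (i <= mid && j <= hi)                 // output the smaller front element
        out.push_back(A[i] <= A[j] ? A[i++] : A[j++]);
    while (i <= mid) out.push_back(A[i++]);     // drain whichever half remains
    while (j <= hi)  out.push_back(A[j++]);
    for (int k = 0; k < (int)out.size(); ++k)   // copy back: Θ(n) total work
        A[lo + k] = out[k];
}

void mergeSort(std::vector<int> &A, int lo, int hi) {
    if (lo >= hi) return;                       // n = 1: we're done
    int mid = lo + (hi - lo) / 2;
    mergeSort(A, lo, mid);                      // recursively sort each half
    mergeSort(A, mid + 1, hi);
    merge(A, lo, mid, hi);                      // Θ(n) merge
}

int main() {
    int input[] = {8, 2, 4, 9, 3, 6};
    std::vector<int> A(input, input + 6);
    mergeSort(A, 0, (int)A.size() - 1);
    for (int k = 0; k < (int)A.size(); ++k) std::cout << A[k] << ' '; // 2 3 4 6 8 9
    std::cout << std::endl;
    return 0;
}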
Analyzing merge sort
MERGE-SORT A[1..n]                                   cost
1. If n = 1, we're done.                             Θ(1)   (means constant time)
2. Recursively sort A[1..⌈n/2⌉] and A[⌈n/2⌉+1..n]    2T(n/2)
3. Merge the 2 sorted arrays                         Θ(n)
Sloppiness: step 2 should be T(⌈n/2⌉) + T(⌊n/2⌋), but it turns out not to matter asymptotically.
Recurrence for merge sort Recurrence: an equation defined in terms of itself. For merge sort:
T(n) = Θ(1)             if n = 1
T(n) = 2T(n/2) + Θ(n)   if n > 1
We usually omit stating the base case when T(n) = Θ(1) for sufficiently small n, but only when it has no effect on the asymptotic solution to the recurrence.
Solving recurrences: The recursion tree
Solve T(n) = 2T(n/2) + cn, for constant c > 0.
Expand the recurrence into a tree: the root does cn work and spawns two subproblems T(n/2); each of those does cn/2 work and spawns two T(n/4) subproblems; and so on down to Θ(1) leaves.
[Recursion tree:
level 0: cn
level 1: cn/2   cn/2
level 2: cn/4  cn/4  cn/4  cn/4
  ...
leaves:  Θ(1) each]
Height h = lg n. #leaves = n, contributing Θ(n) in total.
Every level sums to cn, so Total = cn lg n + Θ(n) = Θ(n lg n).
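The same total, written as a sum over the levels of the tree (the arithmetic the picture implies):

\[
T(n) = \sum_{i=0}^{\lg n - 1} 2^i \cdot \frac{cn}{2^i} + \Theta(n)
     = \sum_{i=0}^{\lg n - 1} cn + \Theta(n)
     = cn \lg n + \Theta(n)
     = \Theta(n \lg n)
\]

Level i has 2^i nodes, each costing cn/2^i, so every level contributes exactly cn; there are lg n such levels above the n leaves.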
Differences in asymptotic complexity Θ(n lg n) grows more slowly than Θ(n²). Therefore, merge sort asymptotically beats insertion sort in the worst case. In practice, merge sort beats insertion sort for n > 30 (or so).
To get a feel for differences in run time, watch this animation. Compare a Θ(n²) algorithm with a Θ(n log n) one (and algorithms run on sequential versus parallel machines): http://www.cs.rit.edu/~atk/java/sorting/sorting.html
Options for solving recurrences
Recursion tree
Proof by induction (beyond scope of this class)
Master method (beyond scope of this class); solves recurrences of the form T(n) = aT(n/b) + f(n)
Learn common solutions, such as:
Recurrence               Solution
T(n) = T(n/2) + d        T(n) = O(log n)
T(n) = T(n/2) + n        T(n) = O(n)
T(n) = 2T(n/2) + d       T(n) = O(n)
T(n) = 2T(n/2) + n       T(n) = O(n log n)
T(n) = T(n−1) + d        T(n) = O(n)
Typical running times of algorithms Θ(1): The running time of the program is constant (i.e., independent of the size of the input). Example: Find the 5th element in an array. Θ(log n): Typically achieved by dividing the problem into smaller segments and only looking at one input element in each segment. Example: Binary search (see the sketch below). Θ(n): Typically achieved by examining each input element once. Example: Find the minimum element.
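A minimal binary search sketch in C++ (illustrative, not from the slides), showing why each step does constant work on half the range, i.e., T(n) = T(n/2) + d = O(log n) from the table above:

#include <iostream>

// Search the sorted 0-based array A[lo..hi] for key; return its index or -1.
int binarySearch(const int A[], int lo, int hi, int key) {
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;    // avoids overflow of (lo + hi)
        if (A[mid] == key) return mid;
        if (A[mid] < key)  lo = mid + 1; // discard the lower half
        else               hi = mid - 1; // discard the upper half
    }
    return -1;
}

int main() {
    int A[] = {2, 3, 4, 6, 8, 9};
    std::cout << binarySearch(A, 0, 5, 6) << std::endl; // prints 3
    return 0;
}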
Typical running times of algorithms (cont.) Θ(n log n): Typically achieved by dividing the problem into subproblems, solving the subproblems independently, and then combining the results. Here, each element in a subproblem must be examined. Example: Merge sort. Θ(n²): Typically achieved by examining all pairs of data elements. Example: Insertion sort. Θ(n³): Often achieved by combining algorithms with a mixture of the previous running times. Example: Matrix multiplication (see the sketch below).
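For the Θ(n³) entry, a naive matrix-multiplication sketch (my example; the slides name the problem but give no code): three nested loops of n iterations each.

#include <iostream>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Naive n x n matrix product C = A * B: Θ(n³) scalar multiplications.
Matrix multiply(const Matrix &A, const Matrix &B) {
    int n = (int)A.size();
    Matrix C(n, std::vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

int main() {
    Matrix A(2, std::vector<double>(2, 1.0)); // 2x2 of ones
    Matrix B(2, std::vector<double>(2, 2.0)); // 2x2 of twos
    Matrix C = multiply(A, B);
    std::cout << C[0][0] << std::endl;        // 1*2 + 1*2 = 4
    return 0;
}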
That's all, folks. Questions?