Berner Fachhochschule - Technik und Informatik Data Structures and Algorithms Topic 1: Algorithm Analysis Philipp Locher FS 2018
Outline Course and Textbook Overview Analysis of Algorithms Pseudo-Code and Primitive Operations Growth Rate and Big-Oh Notation Topic 1: Algorithm Analysis Page 2
Evaluation Project (50%) In groups of two Java implementation of selected topics Implementation and short presentation Written exam, 1.5h (50%) Last week of the semester Open book E: > 50% and project, exam > 20% 4 ECTS https://prof.ti.bfh.ch/lhp2/algo Topic 1: Algorithm Analysis Page 4
Algorithm Design I Part I. Fundamental Tools 1. Algorithm Analysis 1.1 Methodologies for Analyzing Algorithms 1.2 Asymptotic Notation 1.3 A Quick Mathematical Review 1.4 Case Studies in Algorithm Analysis 1.5 Amortization 1.6 Experimentation 1.7 Exercises 2. Basic Data Structures 2.1 Stacks and Queues 2.2 Vectors, Lists, and Sequences 2.3 Trees 2.4 Priority Queues and Heaps 2.5 Dictionaries and Hash Tables 2.6 Java Example: Heap 2.7 Exercises 3. Search Trees and Skip Lists Topic 1: Algorithm Analysis Page 6
Algorithm Design II 3.1 Ordered Dictionaries and Binary Search Trees 3.2 AVL Trees 3.3 Bounded-Depth Search Trees 3.4 Splay Trees 3.5 Skip Lists 3.6 Java Example: AVL and Red-Black Trees 3.7 Exercises 4. Sorting, Sets, and Selection 4.1 Merge-Sort 4.2 The Set Abstract Data Type 4.3 Quick-Sort 4.4 A Lower Bound on Comparison-Based Sorting 4.5 Bucket-Sort and Radix-Sort 4.6 Comparison of Sorting Algorithms 4.7 Selection 4.8 Java Example: In-Place Quick-Sort 4.9 Exercises 5. Fundamental Techniques 5.1 The Greedy Method Topic 1: Algorithm Analysis Page 7
Algorithm Design III 5.2 Divide-and-Conquer 5.3 Dynamic Programming 5.4 Exercises Part II. Graph Algorithms 1. Graphs 1.1 The Graph Abstract Data Type 1.2 Data Structures for Graphs 1.3 Graph Traversal 1.4 Directed Graphs 1.5 Java Example: Depth-First Search 1.6 Exercises 2. Weighted Graphs 2.1 Single-Source Shortest Paths 2.2 All-Pairs Shortest Paths 2.3 Minimum Spanning Trees 2.4 Java Example: Dijkstra's Algorithm 2.5 Exercises 3. Network Flow and Matching Topic 1: Algorithm Analysis Page 8
Algorithm Design IV 3.1 Flows and Cuts 3.2 Maximum Flow 3.3 Maximum Bipartite Matching 3.4 Minimum-Cost Flow 3.5 Java Example: Minimum-Cost Flow 3.6 Exercises Part III. Internet Algorithmics 1. Text Processing 1.1 Strings and Pattern Matching Algorithms 1.2 Tries 1.3 Text Compression 1.4 Text Similarity Testing 1.5 Exercises 2. Number Theory and Cryptography 2.1 Fundamental Algorithms Involving Numbers 2.2 Cryptographic Computations 2.3 Information Security Algorithms and Protocols 2.4 The Fast Fourier Transform Topic 1: Algorithm Analysis Page 9
Algorithm Design V 2.5 Java Example: FFT 2.6 Exercises 3. Network Algorithms 3.1 Complexity Measures and Models 3.2 Fundamental Distributed Algorithms 3.3 Broadcast and Unicast Routing 3.4 Multicast Routing 3.5 Exercises Part IV. Additional Topics 1. Computational Geometry 1.1 Range Trees 1.2 Priority Search Trees 1.3 Quadtrees and k-d Trees 1.4 The Plane Sweep Technique 1.5 Convex Hulls 1.6 Java Example: Convex Hull 1.7 Exercises 2. NP-Completeness Topic 1: Algorithm Analysis Page 10
Algorithm Design VI 2.1 P and NP 2.2 NP-Completeness 2.3 Important NP-Complete Problems 2.4 Approximation Algorithms 2.5 Backtracking and Branch-and-Bound 2.6 Exercises 3. Algorithmic Frameworks 3.1 External-Memory Algorithms 3.2 Parallel Algorithms 3.3 Online Algorithms 3.4 Exercises Topic 1: Algorithm Analysis Page 11
Algorithms An algorithm is a step-by-step procedure for solving a problem in a finite amount of time Most algorithms transform input objects into output objects The running time (and the memory consumption) of an algorithm typically grows with the input size The average case running time is often difficult to determine We focus on the worst case running time Easier to analyse Crucial to applications such as games, finance, robotics Sometimes, it is also worth studying the best case running time Topic 1: Algorithm Analysis Page 13
Running Time: Example [Chart: running time against input size (1000-5000), with best-case, average-case, and worst-case curves] Topic 1: Algorithm Analysis Page 14
Problem Classes I There are feasible problems, which can be solved by an algorithm efficiently Examples: sorting a list, drawing a line between two points, deciding whether n is prime, etc. There are computable problems, which can be solved by an algorithm, but not efficiently Example: finding the shortest solution for large sliding puzzles There are undecidable problems, which cannot be solved by an algorithm at all Example: given a description of a program and an input, decide whether the program eventually halts or runs forever (halting problem) Topic 1: Algorithm Analysis Page 15
Problem Classes II [Venn diagram: Feasible Problems ⊂ Computable Problems ⊂ All Problems] Topic 1: Algorithm Analysis Page 16
Experimental Studies There are two ways to analyse the running time (complexity) of an algorithm: Experimental studies Theoretical analysis Experiments usually involve the following major steps: Write a program implementing the algorithm Run the program with inputs of varying size n and composition Use a method like System.currentTimeMillis() to get an accurate measure of the actual running time t(n) Plot the results Topic 1: Algorithm Analysis Page 17
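The experimental steps above can be sketched in Java. This is a minimal, illustrative harness (class and method names are my own choice, not from the course); the measured algorithm — summing an array — is just a stand-in for the algorithm under study.

```java
// Minimal sketch of an experimental study: measure the running time t(n)
// of a stand-in algorithm for inputs of varying size n using
// System.currentTimeMillis(), then print n and t(n) for plotting.
public class TimingExperiment {

    // Stand-in algorithm whose running time we measure.
    static long sum(int[] a) {
        long s = 0;
        for (int x : a) s += x;
        return s;
    }

    // Returns the measured running time t(n) in milliseconds.
    static long measure(int n) {
        int[] input = new int[n];              // input of size n
        for (int i = 0; i < n; i++) input[i] = i;
        long start = System.currentTimeMillis();
        sum(input);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 5_000; n += 1_000) {
            System.out.println(n + "\t" + measure(n) + " ms");
        }
    }
}
```

In practice one would repeat each measurement several times and average, since a single run at millisecond resolution is noisy.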
Experimental Studies: Example 10000 9000 Running Time (ms) 8000 7000 6000 5000 4000 3000 2000 1000 0 1000 2000 3000 4000 5000 Input Size Topic 1: Algorithm Analysis Page 18
Experimental Setup I
To set up and conduct an experiment, perform each of the following steps with care
1. Choosing the question
 Estimate the asymptotic running time in the average case
 Test which of two algorithms is faster
 If an algorithm depends on parameters, find the best values
 For approximation algorithms, evaluate the quality of the approximation
2. Deciding what to measure
 Running time (system dependent)
 Number of memory references
 Number of primitive (e.g. arithmetic) operations
 Number of calls of a specific function
3. Generating test data
 Enough samples for statistically significant results
Topic 1: Algorithm Analysis Page 19
Experimental Setup II
 Samples of varying sizes
 Representative of the kind of data expected in practice
4. Coding the solution and performing the experiment
 Reproducible results
 Keep note of the details of the computational environment
5. Evaluating the test results statistically
Topic 1: Algorithm Analysis Page 20
Limits of Experiments It is necessary to implement the algorithm, which may be difficult Results may not be indicative of the running time on other inputs not included in the experiment The running times may depend strongly on the chosen setting (hardware, software, programming language) In order to compare two algorithms, the same hardware and software environments must be used Topic 1: Algorithm Analysis Page 21
Theoretical Analysis Uses a high-level description of the algorithm instead of an implementation Describes the running time (complexity) as a function of the input size n: T(n) = number of primitive operations Makes it possible to evaluate the speed of an algorithm independently of hardware, software, programming language, and actual implementation Takes into account all possible inputs Topic 1: Algorithm Analysis Page 22
Pseudo-Code Pseudo-code allows a high-level description of an algorithm More structured than English prose, but less detailed than a program Preferred notation for describing algorithms Hides program design issues and details of a particular language Topic 1: Algorithm Analysis Page 24
Pseudo-Code: Example
Find the maximum element of an array:
Algorithm arrayMax(A, n)
 Input: array A of n integers
 Output: maximum element of A
 currentMax ← A[0]
 for i ← 1 to n − 1 do
  if A[i] > currentMax then
   currentMax ← A[i]
 return currentMax
Topic 1: Algorithm Analysis Page 25
Pseudo-Code: Example (simplified)
Find the maximum element of an array:
Algorithm arrayMax(A, n)
 currentMax ← A[0]
 for i ← 1 to n − 1 do
  if A[i] > currentMax then
   currentMax ← A[i]
 return currentMax
Topic 1: Algorithm Analysis Page 26
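The pseudo-code above translates almost line by line into Java (class name is my own choice for this sketch):

```java
// Direct Java translation of the arrayMax pseudo-code.
public class ArrayMaxDemo {

    // Returns the maximum of the first n entries of A (n >= 1).
    static int arrayMax(int[] A, int n) {
        int currentMax = A[0];
        for (int i = 1; i <= n - 1; i++) {   // for i <- 1 to n - 1 do
            if (A[i] > currentMax) {
                currentMax = A[i];
            }
        }
        return currentMax;
    }

    public static void main(String[] args) {
        int[] A = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(arrayMax(A, A.length)); // prints 9
    }
}
```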
Details of Pseudo-Code I
The rules of using pseudo-code are very flexible, but following some guidelines is recommended
Declaration
 Algorithm algorithmName(arg1, arg2, ...)
 Input: ...
 Output: ...
Control flow
 if ... then ... [else ...]
 while ... do ...
 repeat ... until ...
 for ... do ...
Indentation replaces braces {} resp. begin ... end
Topic 1: Algorithm Analysis Page 27
Details of Pseudo-Code II
Expressions
 Assignments: ←
 Equality testing: =
 Superscripts like n² and other mathematical notation are allowed
 Array access: A[i]
Calling another algorithm
 algorithmName(arg1, arg2, ...)
 object.algorithmName(arg1, arg2, ...), which is equivalent to algorithmName(object, arg1, arg2, ...)
Return value
 return expression
Topic 1: Algorithm Analysis Page 28
The Random Access Machine Model To study algorithms in pseudo-code analytically, we need a theoretical model of a computing machine A random access machine consists of a CPU and a bank of memory cells, each of which can hold an arbitrary number (of any type and size) or character The size of the memory is unlimited Memory cells are numbered, and accessing any memory cell takes 1 unit of time (= random access) All primitive operations of the CPU take 1 unit of time Topic 1: Algorithm Analysis Page 29
Primitive Operations The primitive operations of a random access machine are the ones we allow in a pseudo-code algorithm Reading the value of a variable Assigning a value to a variable Evaluating an expression (arithmetic operations, comparisons) Indexing into an array Calling a method/function Returning from a method/function By inspecting the pseudo-code, we can count the number of primitive operations as a function of the input size T(n) = number of primitive operations for input of size n Largely independent of the programming language Topic 1: Algorithm Analysis Page 30
Primitive Operations: Example
What is the running time of arrayMax?
Algorithm arrayMax(A, n)
 currentMax ← A[0]
 for i ← 1 to n − 1 do
  if A[i] > currentMax then
   currentMax ← A[i]
 return currentMax
Topic 1: Algorithm Analysis Page 31
Primitive Operations: Example
Algorithm arrayMax(A, n)            # Operations
 currentMax ← A[0]                  2
 for i ← 1 to n − 1 do              1
  if A[i] > currentMax then         4 (n − 1)
   currentMax ← A[i]                3 (n − 1)
 return currentMax                  2
Hence, the running time of arrayMax is T(n) = 7n − 2. Or did we forget something?
Topic 1: Algorithm Analysis Page 32
Primitive Operations: Example (completed)
Algorithm arrayMax(A, n)            # Operations
 currentMax ← A[0]                  2
 for i ← 1 to n − 1 do              1
  {test i ≤ n − 1}                  4 n
  if A[i] > currentMax then         4 (n − 1)
   currentMax ← A[i]                3 (n − 1)
  {increment i ← i + 1}             3 (n − 1)
 return currentMax                  2
Hence, the running time of arrayMax is T(n) = 14n − 5.
Topic 1: Algorithm Analysis Page 33
Worst Case vs. Best Case
For a given input size n, the running time of an algorithm may depend on other aspects of the input
 Sorting a list of size n (the list may already be sorted)
 Finding an element in a list of size n (the element may be at the beginning of the list)
Usually, if an algorithm contains if-then-else statements, it can have different running times
 Worst case: maximal number of primitive operations
 Best case: minimal number of primitive operations
Example: arrayMax
 Worst case: T(n) = 14n − 5
 Best case: T(n) = 11n − 2
Topic 1: Algorithm Analysis Page 34
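The best/worst-case gap for arrayMax can be observed directly: the assignment inside the if-statement runs a different number of times depending on the input order. The sketch below counts only that assignment, not all primitive operations (class and method names are illustrative):

```java
// Shows why arrayMax has different best- and worst-case running times:
// the inner assignment "currentMax <- A[i]" executes once per iteration
// on an ascending input, and never when the maximum comes first.
public class BestWorstCase {

    // Returns how often the inner assignment executes.
    static int countUpdates(int[] A) {
        int currentMax = A[0];
        int updates = 0;
        for (int i = 1; i < A.length; i++) {
            if (A[i] > currentMax) {
                currentMax = A[i];
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        int[] worst = {1, 2, 3, 4, 5}; // ascending: update in every iteration
        int[] best  = {5, 1, 2, 3, 4}; // maximum first: no update at all
        System.out.println(countUpdates(worst)); // prints 4  (= n - 1)
        System.out.println(countUpdates(best));  // prints 0
    }
}
```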
Math You Need to Review
Logarithms/exponentials: a = b^c ⇔ log_b(a) = c ⇔ b^(log_b a) = a
Properties of logarithms
 log_b(ac) = log_b(a) + log_b(c)
 log_b(a/c) = log_b(a) − log_b(c)
 log_b(a^c) = c · log_b(a)
Properties of exponentials
 b^(a+c) = b^a · b^c
 b^(a−c) = b^a / b^c
 b^(ac) = (b^a)^c = (b^c)^a ≠ b^(a^c)
Base change: log_b(a) = log_c(a) / log_c(b) and b^c = a^(c · log_a(b))
Computer science: base b = 2, i.e. log a = log_2(a)
Topic 1: Algorithm Analysis Page 36
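The base-change rule is exactly what is needed in Java, since Math.log is the natural logarithm (base e) and there is no built-in base-2 logarithm. A small sketch (class name is my own):

```java
// Base change in practice: log_2(a) = log_e(a) / log_e(2),
// using Java's natural-log Math.log as log_c with c = e.
public class LogDemo {

    static double log2(double a) {
        return Math.log(a) / Math.log(2); // base-change rule
    }

    public static void main(String[] args) {
        System.out.println(log2(8));              // ≈ 3
        System.out.println(log2(1024));           // ≈ 10
        // Check the identity b^(log_b a) = a for b = 2, a = 8:
        System.out.println(Math.pow(2, log2(8))); // ≈ 8
    }
}
```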
Estimating Actual Running Times In reality, the time to execute a primitive operation differs Let a be the time taken by the fastest primitive operation Let b be the time taken by the slowest primitive operation If t(n) denotes the actual running time of the algorithm on a concrete machine for a problem of size n, we get a · T(n) ≤ t(n) ≤ b · T(n) Changing the hardware/software environment Affects the constants a and b Therefore, affects t(n) only by a constant factor Topic 1: Algorithm Analysis Page 37
Asymptotic Growth Rates of Running Times The constants a and b do not change the asymptotic growth rate of t(n) Therefore, we can consider the asymptotic growth rate of T(n) as being the characteristic measure for evaluating an algorithm's running time We will use the terms "number of primitive operations" and "running time" interchangeably and denote them by T(n) The asymptotic growth rate of T(n) is an intrinsic property of the algorithm which is independent of its implementation Topic 1: Algorithm Analysis Page 38
Running Time Examples [Chart: growth of 2^n, 2n², 10n log(n), 20n, 10n, and log(n) for input sizes up to 50] Topic 1: Algorithm Analysis Page 39
Polynomial Running Times
Feasible algorithms are those with polynomial (or better) running times: T(n) ∼ n^d, d > 0
 Linear: T(n) ∼ n
 Quadratic: T(n) ∼ n²
 Cubic: T(n) ∼ n³
The asymptotic growth rate is not affected by constants, factors, or lower-order terms
 T(n) = 15n + 8 is linear
 T(n) = 100n² + 100000 is quadratic
 T(n) = 5n³ + 5n² − 2n + 35 is cubic
In a log-log chart, a polynomial running time turns out to be a straight line
Topic 1: Algorithm Analysis Page 40
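The straight-line claim can be checked numerically: for T(n) = n^d, the slope of log T(n) plotted against log n is exactly the degree d. A small sketch (names are my own):

```java
// In a log-log chart, T(n) = n^d is a straight line whose slope is d:
// slope = (log T(2n) - log T(n)) / (log 2n - log n) = d.
public class LogLogSlope {

    static double slope(double d, double n) {
        double t1 = Math.pow(n, d);
        double t2 = Math.pow(2 * n, d);
        // slope between (log n, log T(n)) and (log 2n, log T(2n))
        return (Math.log(t2) - Math.log(t1)) / (Math.log(2 * n) - Math.log(n));
    }

    public static void main(String[] args) {
        System.out.println(slope(1, 1000)); // ≈ 1 (linear)
        System.out.println(slope(2, 1000)); // ≈ 2 (quadratic)
        System.out.println(slope(3, 1000)); // ≈ 3 (cubic)
    }
}
```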
Log-Log Chart [Log-log chart of running time T(n) against input size n: polynomial running times (cubic, quadratic, linear) appear as straight lines, between the superpolynomial and sublinear regions] Topic 1: Algorithm Analysis Page 41
Constants and Factors [Log-log chart: curves such as n, n + 10^4, and 100n, which differ only by constant factors or lower-order terms, have the same slope, i.e. the same asymptotic growth rate] Topic 1: Algorithm Analysis Page 42
Big-Oh Notation
The so-called big-oh notation is useful to better compare and classify the asymptotic growth rates of different functions
Given functions f(n) and g(n), we say that f(n) is O(g(n)), if there is a constant c > 0 and an integer n₀ ≥ 1 such that f(n) ≤ c · g(n) for all n ≥ n₀
Example 1: 2n + 10 is O(n)
 2n + 10 ≤ c · n ⇔ n ≥ 10/(c − 2), e.g. pick c = 3 and n₀ = 10
Example 2: n² is not O(n)
 n² ≤ c · n ⇔ n ≤ c, which cannot be satisfied since c is a constant
Topic 1: Algorithm Analysis Page 43
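The witnesses from Example 1 can be spot-checked by machine: with c = 3 and n₀ = 10, the inequality 2n + 10 ≤ 3n holds for every n ≥ 10 and fails for n = 9 (class name is my own; a finite check is of course only an illustration, not a proof):

```java
// Spot-checks the Big-Oh witness c = 3, n0 = 10 for f(n) = 2n + 10:
// f(n) <= c*g(n) with g(n) = n must hold for all n >= n0.
public class BigOhCheck {

    static boolean holds(int n) {
        return 2 * n + 10 <= 3 * n; // f(n) <= c * g(n) with c = 3
    }

    public static void main(String[] args) {
        boolean ok = true;
        for (int n = 10; n <= 1_000_000; n++) { // spot-check n >= n0 = 10
            ok &= holds(n);
        }
        System.out.println(ok);       // true for every n checked
        System.out.println(holds(9)); // false: n0 = 10 is tight
    }
}
```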
Big-Oh Example 1 [Log-log chart with curves n, 2n + 10, and 3n: the curve 2n + 10 lies below 3n for all n ≥ 10, illustrating that 2n + 10 is O(n)] Topic 1: Algorithm Analysis Page 44
Big-Oh Example 2 [Log-log chart with curves n, 10n, 100n, and n²: the curve n² eventually exceeds c · n for every constant c, illustrating that n² is not O(n)] Topic 1: Algorithm Analysis Page 45
Typical Running Times
 Constant: O(1)
 Logarithmic: O(log n)
 Linear: O(n)
 Quadratic: O(n²)
 Cubic: O(n³)
 Polynomial: O(n^d), d > 0
 Exponential: O(b^n), b > 1
 Others: O(√n), O(n log n)
Growth rate order: 1 ≺ log n ≺ √n ≺ n ≺ n log n ≺ n² ≺ n³ ≺ 2^n ≺ 3^n
Topic 1: Algorithm Analysis Page 46
Simplification Rules
Constants, factors, and lower-order terms can be dropped
1. If f is a function and c > 0 a constant, then O(c · f) = O(f)
2. If f and g are functions, then O(f + g) = max(O(f), O(g))
3. If f(n) = a_d · n^d + ... + a_2 · n² + a_1 · n + a_0 is a polynomial of degree d, then f(n) is O(n^d)
 Example: f(n) = 5n⁴ + 2n² + 5 log n − 3 is O(n⁴)
Indicate the smallest possible function, e.g.,
 2n + 3 is O(n) instead of 2n + 3 is O(2n)
 4n³ + 3n is O(n³) instead of 4n³ + 3n is O(n³ + n)
Topic 1: Algorithm Analysis Page 47
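Why lower-order terms may be dropped can be seen numerically: for a polynomial, the ratio f(n) / n^d approaches the leading coefficient as n grows, so everything below the leading term becomes negligible. A sketch using the example polynomial from this slide (class name is my own):

```java
// Lower-order terms vanish relative to the leading term:
// for f(n) = 5n^4 + 2n^2 + 5*log2(n) - 3, the ratio f(n)/n^4
// tends to the leading coefficient 5 as n grows, so f(n) is O(n^4).
public class DropLowerOrder {

    static double f(double n) {
        return 5 * Math.pow(n, 4)
             + 2 * n * n
             + 5 * (Math.log(n) / Math.log(2))
             - 3;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 10_000; n *= 10) {
            // ratio approaches 5.0 as n grows
            System.out.println(n + "\t" + f(n) / Math.pow(n, 4));
        }
    }
}
```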
Growth Rate Comparison

    n   | log n |  √n | n log n |    n²     |     n³     |    2^n
    2   |   1   | 1.4 |    2    |     4     |      8     |     4
    4   |   2   |  2  |    8    |    16     |     64     |     16
    8   |   3   | 2.8 |   24    |    64     |    512     |    256
   16   |   4   |  4  |   64    |   256     |   4,096    |   65,536
   32   |   5   | 5.7 |  160    |  1,024    |  32,768    | 4.29·10⁹
   64   |   6   |  8  |  384    |  4,096    |  262,144   | 1.84·10¹⁹
  128   |   7   | 11  |  896    |  16,384   | 2,097,152  | 3.40·10³⁸
  256   |   8   | 16  |  2,048  |  65,536   | 16,777,216 | 1.15·10⁷⁷
  512   |   9   | 23  |  4,608  |  262,144  |  1.34·10⁸  | 1.34·10¹⁵⁴
 1,024  |  10   | 32  | 10,240  | 1,048,576 |  1.07·10⁹  | 1.79·10³⁰⁸

Atoms in the universe: ca. 10⁸⁰
Topic 1: Algorithm Analysis Page 48
The Importance of Asymptotics
Maximum problem size n solvable at 1 operation/µs, and the new maximum size at 256 operations/µs:

 Running Time | 1 second | 1 minute |  1 hour   | New Size
    400n      |  2,500   | 150,000  | 9,000,000 | 256n
  20n log n   |  4,096   | 166,666  | 7,826,087 | ≈ 256n log n / (7 + log n)
    2n²       |   707    |  5,477   |  42,426   | 16n
     n⁴       |    31    |    88    |    244    | 4n
    2^n       |    19    |    25    |    31     | n + 8

Topic 1: Algorithm Analysis Page 49
Summary
The asymptotic analysis of an algorithm determines the running time (complexity) in big-oh notation
To perform the asymptotic analysis
 Find the worst-case number of primitive operations T(n)
 Express this function with big-oh notation
Example: arrayMax
 We determine that algorithm arrayMax executes at most T(n) = 14n − 5 primitive operations
 We say that algorithm arrayMax runs in O(n) time
Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations
Topic 1: Algorithm Analysis Page 50