CMPT 125, John Edgar

Outline:
- Checking for duplicates
- Maximum density
- Battling computers and algorithms
- Barometer instructions
- Big O expressions

Write a function to determine if an array contains duplicates:

    int duplicate_check(int a[], int n) {
        int i = n;
        while (i > 0) {
            i--;
            int j = i - 1;
            while (j >= 0) {
                if (a[i] == a[j]) {
                    return 1;    // duplicate found
                }
                j--;
            }
        }
        return 0;    // no duplicates
    }

Which statements run most frequently? The answer depends on the array elements. What is the worst case?

We often consider the worst case behaviour of an algorithm as a benchmark:
- It guarantees performance under all circumstances
- It is often, but not always, similar to the average case behaviour
Instead of timing an algorithm, we can predict performance by counting the number of operations performed by the algorithm in the worst case:
- Derive the total number of operations as a function of the input size, n

Annotate each statement with the number of times it executes in the worst case:

    int duplicate_check(int a[], int n) {
        int i = n;                  // 1
        while (i > 0) {             // n + 1
            i--;                    // n
            int j = i - 1;          // n
            while (j >= 0) {        // i + 1, for each value of i
                if (a[i] == a[j]) { // i
                    return 1;
                }
                j--;                // i
            }
        }
        return 0;                   // 1
    }

Totals:
- outside any loop: 2
- outer loop: 3n + 1
- inner loop: 3i + 1, summed for i from n-1 down to 0: 3/2 n^2 - 1/2 n
- total: 3/2 n^2 + 5/2 n + 3

Graph the duplicate check algorithm's running times:

    n         time
    10,000       108
    25,000       640
    50,000     2,427
    100,000    8,841

[plot: running time vs. n for n up to 100,000, a quadratic curve]
Doubling the input size quadruples the running time: a quadratic function.

Maximum density: given a two-dimensional, n x n, array of integers, find the 10x10 swatch with the largest sum. Applications:
- Resource management
- Finding the brightest areas of photographs, e.g. the bright spots on Ceres

Simple approach:
- Try every possible position for the top-left corner of the 10x10 swatch; there are (n-10) x (n-10) of them
- Use a nested loop
- Sum the values in each 10x10 swatch, retaining the largest sum
A brute-force approach.

    int max10by10(int arr[N][N]) {
        int best = 0;
        for (int u_row = 0; u_row < N-10; u_row++) {            // N-10
            for (int u_col = 0; u_col < N-10; u_col++) {        // N-10
                int total = 0;
                for (int row = u_row; row < u_row+10; row++) {      // 10
                    for (int col = u_col; col < u_col+10; col++) {  // 10
                        total += arr[row][col];
                    }
                }
                best = max(best, total);
            }
        }
        return best;
    }

Exact count: 348N^2 - 6956N + 34762
Approximate method: count barometer instructions, the most frequently occurring instructions:
total = 31 x 10 x (N-10) x (N-10), approximately 310N^2

Empirical timing:
- Run the program on a real machine with various input sizes
- Plot a graph to determine the relationship
Operation counting:
- Assumes all elementary instructions execute in approximately the same amount of time
Actual performance can depend on much more than just the choice of algorithm.
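A minimal sketch of empirical timing, using the standard clock() function (the loop bounds and test data here are illustrative, not from the slides):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    // The duplicate check from the earlier slide, in for-loop form
    int duplicate_check(int a[], int n) {
        for (int i = n - 1; i > 0; i--) {
            for (int j = i - 1; j >= 0; j--) {
                if (a[i] == a[j]) return 1;
            }
        }
        return 0;
    }

    int main(void) {
        // Double the input size each trial to expose the growth rate
        for (int n = 10000; n <= 80000; n *= 2) {
            int *a = malloc(n * sizeof(int));
            for (int i = 0; i < n; i++) a[i] = i;  // distinct values: worst case
            clock_t start = clock();
            duplicate_check(a, n);
            printf("n = %6d: %.3f s\n", n,
                   (double)(clock() - start) / CLOCKS_PER_SEC);
            free(a);
        }
        return 0;
    }

If the time roughly quadruples each time n doubles, the growth is quadratic.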

Actual running time is affected by:
- CPU speed
- Amount of RAM
- Specialized hardware (e.g., graphics card)
- Operating system
- System configuration (e.g., virtual memory)
- Programming language
- Algorithm implementation
- Other programs running at the same time

There can be many ways to solve a problem, i.e. different algorithms that produce the same result; there are numerous sorting algorithms, for example. Compare algorithms by their behaviour for large input sizes, i.e., as n gets large:
- On today's hardware, most algorithms perform quickly for small n
- We are interested in the growth rate as a function of n
- Summing an array: linear growth, O(n) (see the sketch below)
- Checking for duplicates: quadratic growth, O(n^2)
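The slides name the array sum but don't show it; a minimal sketch of the linear-time version:

    // Summing an array visits each element exactly once: O(n)
    long sum_array(const int a[], int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += a[i];  // barometer instruction: executes n times
        }
        return total;
    }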

Express the number of operations in an algorithm as a function of n, the problem size. Briefly:
- Take the dominant term
- Remove the leading constant
- Put O( ) around it
For example, f(n) = 348n^2 - 6956n + 34762 is O(n^2).

Given a function T(n), say that T(n) = O(f(n)) if T(n) is at most a constant times f(n), except perhaps for some small values of n.
Properties:
- Constant factors don't matter
- Low-order terms don't matter
Rules:
- For any constant k and any function f(n), k*f(n) = O(f(n)), e.g., 5n = O(n)
- For any bases a and b, log_a n = O(log_b n)
Do leading constants really not matter?
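Stated precisely (the standard textbook definition, added here for reference): T(n) = O(f(n)) if and only if there exist constants c > 0 and n_0 such that T(n) <= c*f(n) for all n >= n_0. For example, 5n = O(n) with c = 5 and n_0 = 1.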

Of course leading constants matter. Consider two algorithms:
- f_1(n) = 20n^2
- f_2(n) = 2n^2
Algorithm 2 runs ten times faster. Machine speed behaves the same way: if machine 1 is ten times faster than machine 2, it will run the same algorithm ten times faster. Big O notation ignores leading constants because it is a hardware-independent analysis.

Let's compare two algorithms on two computers.
Computer 1: Sunway TaihuLight
- Speed: 93 petaflops = 93 * 10^15 floating point operations per second = 93,000,000 gigaflops
- The fastest supercomputer in 2016
Computer 2: Alienware Area 51
- CPU: Intel Core i7 6950X, 98 gigaflops = 98 * 10^9 floating point operations per second
- A 2016 gaming PC
The TaihuLight is approximately 1,000,000 times faster.

Let's compare two algorithms on the two computers.
Algorithm 1: the nested-loop duplicate check, f_1(n) = 3/2 n^2 + 5/2 n + 3, run on the TaihuLight
Algorithm 2: a different duplicate check algorithm, f_2(n) = 30n log_2(n) + 5n + 4, run on the Area 51
Not only is the TaihuLight much faster, but its algorithm has a much smaller leading constant: 3/2 versus 30.
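The slides don't show algorithm 2's code; one plausible n log n duplicate check is sort-then-scan, sketched here with the standard library's qsort supplying the O(n log n) sort:

    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *p, const void *q) {
        int a = *(const int *)p, b = *(const int *)q;
        return (a > b) - (a < b);
    }

    // Sort a copy of the array, then scan adjacent pairs:
    // O(n log n) sort + O(n) scan = O(n log n) overall
    int duplicate_check_fast(const int a[], int n) {
        int *copy = malloc(n * sizeof(int));
        memcpy(copy, a, n * sizeof(int));
        qsort(copy, n, sizeof(int), cmp_int);
        int found = 0;
        for (int i = 1; i < n && !found; i++) {
            if (copy[i] == copy[i - 1]) found = 1;  // equal neighbours sort together
        }
        free(copy);
        return found;
    }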

    n         TaihuLight (n^2 algorithm)   Area 51 (n log n algorithm)
    100,000   161 ns                       514 µs
    10^6      16 µs                        6 ms
    10^7      1.61 ms                      72 ms
    10^8      161 ms                       819 ms
    10^9      16 s                         9 s
    10^10     27 mins                      102 s
    10^11     45 hrs                       19 mins
    10^12     187 days                     3.4 hrs
    10^13     51 years                     1.5 days

Conclusions:
- For large values of n, the Area 51 running the algorithm with the smaller dominant term is faster: a slower computer running a smaller-O algorithm wins
- The n log(n) algorithm does not grow as fast as the n^2 algorithm; the slower the function grows, the faster the algorithm
- With n^2, a 10x increase in input size increases the running time by 100x
- With n log(n), a 10x increase in input size increases the running time by just over 10x

We often use the worst-case behaviour as a benchmark:
- Derive the number of instructions as a function of the input size, n
- Either measure with the time command for various values of n (see the example below), or count the number of operations
Use Big O to express the growth rate:
- Compare algorithms' behaviour as n gets large
- Remove leading constants and lower-order terms
- A hardware-independent analysis
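For example, with the shell's time command (the program name and timings are illustrative only):

    $ time ./duplicate_check 100000

    real    0m8.841s
    user    0m8.820s
    sys     0m0.015s

The "real" line is the elapsed wall-clock time; repeating the run for several values of n reveals the growth rate.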

Leading constants are affected by:
- CPU speed
- Other tasks performed by the system
- Memory characteristics
- Program optimization
Regardless of leading constants, an O(n log(n)) algorithm will outperform an O(n^2) algorithm as n gets large.

A well designed and written algorithm can make the difference between software being usable or not:
- n may be very large (Google)
- Or the number of times a task has to be performed may be very large (Google again)
- Or a near instantaneous response may be necessary (real-time systems)

Algorithms can be optimized, i.e. implemented so as to improve performance, by:
- Reducing the number of operations
- Replacing slower instructions with faster instructions
- Making more efficient use of memory
Instead of optimizing an inherently slow algorithm, consider whether there is a better algorithm with a smaller Big O running time. Of course, no such algorithm might exist.

Given an algorithm, how do you determine its Big O growth rate? The frequency of the algorithm's barometer instructions, its most frequently executed instructions, will be proportional to its Big O running time. Identify the most frequent instruction and count it. Consider:
- Loops
- Decisions
- Function calls

    int max10by10(int arr[N][N]) {
        int best = 0;
        for (int u_row = 0; u_row < N-10; u_row++) {            // N-10
            for (int u_col = 0; u_col < N-10; u_col++) {        // N-10
                int total = 0;
                for (int row = u_row; row < u_row+10; row++) {      // 10
                    for (int col = u_col; col < u_col+10; col++) {  // 10
                        total += arr[row][col];  // barometer instructions
                    }
                }
                best = max(best, total);
            }
        }
        return best;
    }

f(n) = 3 x 10 x 10 x (N-10) x (N-10) = O(N^2)

Function calls are not elementary instructions; they must be replaced by their Big O running times (the min and max helpers assumed here are sketched below):

    int range(int arr[], int n) {
        int lo = min(arr, n);   // O(n)
        int hi = max(arr, n);   // O(n)
        return hi - lo;         // O(1)
    }

T(n) = O(n) + O(n) + O(1) = O(n)
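min and max over an array are not standard C library functions; linear scans like these (a sketch, following the signatures used in range above) would account for their O(n) cost:

    // Smallest element of arr[0..n-1]: one pass, O(n)
    int min(const int arr[], int n) {
        int lo = arr[0];
        for (int i = 1; i < n; i++)
            if (arr[i] < lo) lo = arr[i];
        return lo;
    }

    // Largest element of arr[0..n-1]: one pass, O(n)
    int max(const int arr[], int n) {
        int hi = arr[0];
        for (int i = 1; i < n; i++)
            if (arr[i] > hi) hi = arr[i];
        return hi;
    }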

An if-else is not an elementary operation: take the larger of the two branches' running times. Remember: worst case analysis.

    int search(int arr[], int n, int key) {
        if (!sorted(arr, n)) {            // O(n)
            return lsearch(arr, n, key);  // O(n)
        } else {
            return bsearch(arr, n, key);  // O(log n)
        }
    }

T(n) = O(n) + max(O(n), O(log n)) = O(n) + O(n) = O(n)

Take the dominant term, remove the leading constant, and put O( ) around it:
- Lower-order terms do not matter
- Constants do not matter

Powers of n are ordered according to their exponents: n^a = O(n^b) if and only if a <= b. E.g., n^2 = O(n^3), but n^3 is not O(n^2). A logarithm grows more slowly than any positive power of n, e.g. log_2 n = O(n^(1/2)). The base of the logarithm does not matter, since log_a n = O(log_b n) for any bases a and b; if log n is written with no base, the base is assumed to be 2.

Transitivity: if f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n))
Addition: f(n) + g(n) = O(max(f(n), g(n)))
Multiplication: if f_1(n) = O(g_1(n)) and f_2(n) = O(g_2(n)), then f_1(n) * f_2(n) = O(g_1(n) * g_2(n))
An example (worked below):
(10 + 5n^2)(10 log_2 n + 1) + (5n + log_2 n)(10n + 2n log_2 n)
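Applying the rules to the example, replacing each factor by its dominant term first:
- (10 + 5n^2) = O(n^2) and (10 log_2 n + 1) = O(log n), so the first product is O(n^2 log n)
- (5n + log_2 n) = O(n) and (10n + 2n log_2 n) = O(n log n), so the second product is O(n^2 log n)
- By the addition rule, the whole expression is O(n^2 log n)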

- O(1): constant time; the time is independent of n, e.g. an array look-up
- O(log n): logarithmic time; usually the log is to the base 2, e.g. binary search (sketched below)
- O(n): linear time, e.g. linear search
- O(n log n): e.g. quicksort, mergesort
- O(n^2): quadratic time, e.g. selection sort
- O(n^k), where k is a constant: polynomial time
- O(2^n): exponential time, very slow!
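An iterative binary search (a sketch; presumably what the bsearch call on the earlier slide stands for) halves the search range on every step, so at most about log_2(n) iterations are needed:

    // Search a sorted array for key; returns its index or -1
    int bsearch_int(const int arr[], int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;      // midpoint, avoids overflow
            if (arr[mid] == key) return mid;
            if (arr[mid] < key) lo = mid + 1;  // discard the lower half
            else hi = mid - 1;                 // discard the upper half
        }
        return -1;  // not found
    }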

[plot: number of operations vs. n, for n from 10 to 20, for the functions log_2 n, 5 log_2 n, 3(n+1)/4, n, n log_2 n, n(n-1)/2, and n^2]
Hard to see what is happening with n so small.

[plot: the same functions for n from 10 to 100]
n^2 and n(n-1)/2 are growing much faster than any of the others.

[plot: the same functions for n from 10 to 1,000,000, on a linear scale]
Hmm! Let's try a logarithmic scale.

[plot: the same functions for n from 10 to 1,000,000, on a logarithmic scale]
Notice how clusters of growth rates start to emerge.