4.2 and Searching pentrust.org Introduction to Programming in Java: An Interdisciplinary Approach Robert Sedgewick and Kevin Wayne Copyright 2002 2010 23/2/2012 15:04:54 pentrust.org pentrust.org shanghaiscrap.org shanghaiscrap.org De Beers problem. Rearrange items in ascending order. Applications. Statistics, databases, data compression, bioinformatics, computer graphics, scientific computing, (too numerous to list),... Insertion Sort Hauser Hong Hsu Hayes Haskell Hanley Hornet Hanley Haskell Hauser Hayes Hong Hornet Hsu The Bridge Table 14 15 1
Comparsions (millions) Insertion Sort Insertion Sort Insertion sort. Brute-force sorting solution. Move left-to-right through array. Exchange next element with larger elements to its left, one-by-one. Insertion sort. Brute-force sorting solution. Move left-to-right through array. Exchange next element with larger elements to its left, one-by-one. 16 17 Insertion Sort: Java Implementation Insertion Sort: Empirical Analysis public class Insertion { public static void sort(string[] a) { int = a.length; for (int i = 1; i < ; i++) for (int j = i; j > 0; j--) if (a[j-1].compareto(a[j]) > 0) exch(a, j-1, j); else break; private static void exch(string[] a, int i, int j) { String swap = a[i]; a[i] = a[j]; a[j] = swap; Observation. umber of compares depends on input family. Descending: ~ 2 / 2. Random: ~ 2 / 4. Ascending: ~. 1 0000000 Descendng Random 1 00000 Ascendi ng 1 000 10 0.1 0.001 1 000 1 0000 1 00000 1 000000 Input Size 18 19 Insertion Sort: Mathematical Analysis Challenge 1 Worst case. [descending] Iteration i requires i comparisons. Total = (0 + 1 + 2 +... + -1) ~ 2 / 2 compares. E F G H I J D C B A i Average case. [random] Iteration i requires i / 2 comparisons on average. Total = (0 + 1 + 2 +... + -1) / 2 ~ 2 / 4 compares Q. A credit card company sorts 10 million customer account numbers, for use with binary search. Using insertion sort, what kind of computer is needed? A. Toaster B. Cell phone C. Your laptop D. Supercomputer E. Google server farm A C D F H J E B I G i 20 21 2
Insertion Sort: Lesson Moore's Law Lesson. Supercomputer can't rescue a bad algorithm. Moore's law. Transistor density on a chip doubles every 2 years. Variants. Memory, disk space, bandwidth, computing power per $. Computer Per Second Thousand Million Billion laptop 10 7 instant 1 day 3 centuries super 10 12 instant 1 second 2 weeks http://en.wikipedia.org/wiki/moore's_law 22 23 Moore's Law and Algorithms Quadratic algorithms do not scale with technology. ew computer may be 10x as fast. But, has 10x as much memory so problem may be 10x bigger. With quadratic algorithm, takes 10x as long! Software inefficiency can always outpace Moore's Law. Moore's Law isn't a match for our bad coding. Jaron Lanier Lesson. eed linear (or linearithmic) algorithm to keep pace with Moore's law. 24 25 : Example algorithm. Divide array into two halves. Recursively sort each half. Merge two halves to make sorted whole. 26 27 3
Merging Merging Merging. Combine two pre-sorted lists into a sorted whole. How to merge efficiently? Use an auxiliary array. Merging. Combine two pre-sorted lists into a sorted whole. How to merge efficiently? Use an auxiliary array. String[] aux = new String[]; // merge into auxiliary array int i = lo, j = mid; for (int k = 0; k < ; k++) { if (i == mid) aux[k] = a[j++]; else if (j == hi) aux[k] = a[i++]; else if (a[j].compareto(a[i]) < 0) aux[k] = a[j++]; else aux[k] = a[i++]; // copy back for (int k = 0; k < ; k++) { a[lo + k] = aux[k]; 28 29 : Java Implementation : Mathematical Analysis public class Merge { public static void sort(string[] a) { sort(a, 0, a.length); Analysis. To mergesort array of size, mergesort two subarrays of size / 2, and merge them together using compares. we assume is a power of 2 // Sort a[lo, hi). public static void sort(string[] a, int lo, int hi) { int = hi - lo; if ( <= 1) return; // recursively sort left and right halves int mid = lo + /2; sort(a, lo, mid); sort(a, mid, hi); T( / 2) T( / 4) T( / 4) T() T( / 2) T( / 4) T( / 4) 2 ( / 2) 4 ( / 4) // merge sorted halves (see previous slide) T( / 2 k ) log 2... lo mid hi T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) / 2 (2) 10 11 12 13 14 15 16 17 18 19 log 2 30 31 : Mathematical Analysis Challenge 2 Mathematical analysis. analysis worst comparisons log 2 Q. A credit card company sorts 10 million customer account numbers, for use with binary search. Using mergesort, what kind of computer is needed? average log 2 best 1/2 log 2 Validation. Theory agrees with observations. A. Toaster B. Cell phone C. Your laptop D. Supercomputer E. Google server farm 10,000 20 million 50 million actual 120 thousand 460 million 1,216 million predicted 133 thousand 485 million 1,279 million 32 33 4
Challenge 3 : Lesson Q. What's the fastest way to sort 1 million 32-bit integers? Lesson. Great algorithms can be more powerful than supercomputers. Computer Compares Per Second Insertion laptop 10 7 3 centuries 3 hours super 10 12 2 weeks instant = 1 billion 34 35 Insertion Sort: Empirical Analysis Insertion Sort: Observation Data analysis. Plot # comparisons vs. input size on log-log scale. 100000 Observe and tabulate running time for various values of. Data source: random numbers between 0 and 1. Machine: Apple G5 1.8GHz with 1.5GB memory running OS X. Timing: Skagen wristwatch. 10000 Actual 1000 Fitted 100 5,000 6.2 million 0.13 seconds 10,000 25 million 0.43 seconds 10 20,000 99 million 1.5 seconds 1 1000 10000 100000 1000000 400 million 5.6 seconds Input Size 80,000 1600 million 23 seconds slope Hypothesis. # comparisons grows quadratically with input size ~ 2 / 4. 49 50 Insertion Sort: Prediction and Verification Insertion Sort: Mathematical Analysis Experimental hypothesis. # comparisons ~ 2 /4. Mathematical analysis. Prediction. 400 million comparisons for =. Analysis Stddev Observations. 401.3 million 399.7 million 5.595 sec 5.573 sec Agrees. Worst Average Best 2 / 2 2 / 4-1/6 3/2-401.6 million 5.648 sec 400.0 million 5.632 sec Validation. Theory agrees with observations. Prediction. 10 billion comparisons for = 200,000. Actual Predicted 401.3 million 400 million Observation. 200,000 9.9997 billion 10.000 billion 200,000 9.997 billion 145 seconds Agrees. 51 52 5
: Preliminary Hypothesis : Prediction and Verification Experimental hypothesis. umber of comparisons ~ 20. Experimental hypothesis. umber of comparisons ~ 20. Prediction. 80 million comparisons for = 4 million. 1000 100 Insertion sort Observations. 4 million 4 million 82.7 million 82.7 million 3.13 sec 3.25 sec Agrees. 4 million 82.7 million 3.22 sec 10 1 Prediction. 400 million comparisons for = 20 million. 0.1 Observations. 1000 10000 100000 1000000 Input Size 20 million 50 million 460 million 1216 million 17.5 sec 45.9 sec ot quite. 53 54 6