Sorting and Algorithm Analysis


Unit 7: Sorting and Algorithm Analysis
Computer Science S-111, Harvard University
David G. Sullivan, Ph.D.

Sorting an Array of Integers
Example: an array arr with positions 0, 1, 2, ..., n-2, n-1 holding the values 15, 7, 36, ..., 40, 12.
Ground rules:
- sort the values in increasing order
- sort in place, using only a small amount of additional storage
Terminology:
- position: one of the memory locations in the array
- element: one of the data items stored in the array
- element i: the element at position i
Goal: minimize the number of comparisons C and the number of moves M needed to sort the array.
- move = copying an element from one position to another
- example: arr[3] = arr[5];

Defining a Class for our Sort Methods

public class Sort {
    public static void bubbleSort(int[] arr) {
        ...
    }
    public static void insertionSort(int[] arr) {
        ...
    }
    ...
}

Our Sort class is simply a collection of methods, like Java's built-in Math class. Because we never create Sort objects, all of the methods in the class must be static.
- outside the class, we invoke them using the class name: e.g., Sort.bubbleSort(arr)

Defining a Swap Method
It would be helpful to have a method that swaps two elements of the array. Why won't the following work?

public static void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}

An Incorrect Swap Method

public static void swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}

Trace through the following lines to see the problem:

int[] arr = {15, 7, ...};
swap(arr[0], arr[1]);

(memory diagram: the variable arr on the stack refers to the array {15, 7, ...} on the heap; swap receives copies of the values in arr[0] and arr[1], so the array itself is never changed)

A Correct Swap Method
This method works:

public static void swap(int[] arr, int a, int b) {
    int temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
}

Trace through the following with a memory diagram to convince yourself that it works:

int[] arr = {15, 7, ...};
swap(arr, 0, 1);

Selection Sort
Basic idea:
- consider the positions in the array from left to right
- for each position, find the element that belongs there and put it in place by swapping it with the element that's currently there

Example:
15  6  2 12  4
 2  6 15 12  4
 2  4 15 12  6
 2  4  6 12 15
Why don't we need to consider position 4?

Selecting an Element
When we consider position i, the elements in positions 0 through i-1 are already in their final positions.

example for i = 3:   2  4  7 | 21 25 10 17

To select an element for position i:
- consider elements i, i+1, i+2, ..., arr.length - 1, and keep track of indexMin, the index of the smallest element seen thus far
  - in the example above, indexMin takes the values 3 and then 5
- when we finish this pass, indexMin is the index of the element that belongs in position i
- swap arr[i] and arr[indexMin]:   2  4  7 | 10 25 21 17

Implementation of Selection Sort
Use a helper method to find the index of the smallest remaining element:

private static int indexSmallest(int[] arr, int start) {
    int indexMin = start;
    for (int i = start + 1; i < arr.length; i++) {
        if (arr[i] < arr[indexMin]) {
            indexMin = i;
        }
    }
    return indexMin;
}

The actual sort method is very simple:

public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length - 1; i++) {
        int j = indexSmallest(arr, i);
        swap(arr, i, j);
    }
}

Time Analysis
Some algorithms are much more efficient than others. The time efficiency or time complexity of an algorithm is some measure of the number of operations that it performs.
- for sorting algorithms, we'll focus on two types of operations: comparisons and moves
The number of operations that an algorithm performs typically depends on the size, n, of its input.
- for sorting algorithms, n is the number of elements in the array
- C(n) = number of comparisons
- M(n) = number of moves
To express the time complexity of an algorithm, we'll express the number of operations performed as a function of n.
- examples: C(n) = n^2 + 3n,  M(n) = 2n^2 - 1

Counting Comparisons by Selection Sort

private static int indexSmallest(int[] arr, int start) {
    int indexMin = start;
    for (int i = start + 1; i < arr.length; i++) {
        if (arr[i] < arr[indexMin]) {
            indexMin = i;
        }
    }
    return indexMin;
}

public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length - 1; i++) {
        int j = indexSmallest(arr, i);
        swap(arr, i, j);
    }
}

To sort n elements, selection sort performs n-1 passes:
- on the 1st pass, it performs n-1 comparisons to find indexSmallest
- on the 2nd pass, it performs n-2 comparisons
- ...
- on the (n-1)st pass, it performs 1 comparison
Adding them up: C(n) = 1 + 2 + ... + (n-2) + (n-1)

Counting Comparisons by Selection Sort (cont.)
The resulting formula for C(n) is the sum of an arithmetic sequence:
  C(n) = 1 + 2 + ... + (n-2) + (n-1)
Formula for the sum of this type of arithmetic sequence:
  1 + 2 + ... + m = m(m+1)/2
Thus, we can simplify our expression for C(n) as follows:
  C(n) = (n-1)((n-1)+1)/2 = (n-1)n/2 = n^2/2 - n/2
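As a sanity check on this formula, here is a small test harness (a sketch, not part of the course's Sort class; the class name, the comparison counter, and the use of java.util.Random are illustrative assumptions) that counts the comparisons made by selection sort and checks the count against n(n-1)/2:

public class CountingDemo {
    private static long comparisons = 0;    // running count of element comparisons

    private static int indexSmallest(int[] arr, int start) {
        int indexMin = start;
        for (int i = start + 1; i < arr.length; i++) {
            comparisons++;                   // one comparison of two array elements
            if (arr[i] < arr[indexMin]) {
                indexMin = i;
            }
        }
        return indexMin;
    }

    private static void swap(int[] arr, int a, int b) {
        int temp = arr[a];
        arr[a] = arr[b];
        arr[b] = temp;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] arr = new java.util.Random(42).ints(n, 0, 10000).toArray();
        for (int i = 0; i < arr.length - 1; i++) {       // selection sort
            swap(arr, i, indexSmallest(arr, i));
        }
        System.out.println("measured:  " + comparisons);             // 499500
        System.out.println("predicted: " + (long) n * (n - 1) / 2);  // 499500
    }
}

Regardless of the contents of the array, the measured count matches n(n-1)/2 exactly, because selection sort always examines the same elements.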

Focusing on the Largest Term
When n is large, mathematical expressions of n are dominated by their "largest" term, i.e., the term that grows fastest as a function of n.
example:

n         n^2/2         n/2      n^2/2 - n/2
10        50            5        45
100       5,000         50       4,950
10,000    50,000,000    5,000    49,995,000

In characterizing the time complexity of an algorithm, we'll focus on the largest term in its operation-count expression.
- for selection sort, C(n) = n^2/2 - n/2 ≈ n^2/2
In addition, we'll typically ignore the coefficient of the largest term (e.g., n^2/2 → n^2).

Big-O Notation
We specify the largest term using big-O notation.
- e.g., we say that C(n) = n^2/2 - n/2 is O(n^2)
Common classes of algorithms:

name               example expressions          big-O notation
constant time      1, 7, 10                     O(1)
logarithmic time   3log10 n, log2 n + 5         O(log n)
linear time        5n, 10n - 2log2 n            O(n)
n log n time       4n log2 n, n log2 n + n      O(n log n)
quadratic time     2n^2 + 3n, n^2 - 1           O(n^2)
exponential time   2^n, 5e^n + 2n^2             O(c^n)
(classes lower in the table are slower)

For large inputs, efficiency matters more than CPU speed.
- e.g., an O(log n) algorithm on a slow machine will outperform an O(n) algorithm on a fast machine

Ordering of Functions
We can see below that:
- n^2 grows faster than n log2 n
- n log2 n grows faster than n
- n grows faster than log2 n
(graph of n^2, n log2 n, n, and log2 n for n from 0 to 12)

Ordering of Functions (cont.)
Zooming in, we see that:
- n^2 >= n for all n >= 1
- n log2 n >= n for all n >= 2
- n > log2 n for all n >= 1
(the same functions graphed for n from 0 to 6)

Big-O Time Analysis of Selection Sort
Comparisons: we showed that C(n) = n^2/2 - n/2
- selection sort performs O(n^2) comparisons
Moves: after each of the n-1 passes, the algorithm does one swap.
- n-1 swaps, 3 moves per swap
- M(n) = 3(n-1) = 3n - 3
- selection sort performs O(n) moves
Running time (i.e., total operations): ?

Mathematical Definition of Big-O Notation
f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0
Example: f(n) = n^2/2 - n/2 is O(n^2), because n^2/2 - n/2 <= n^2 for all n >= 0.
- c = 1, n0 = 0
(graph: c*g(n) = n^2 lies above f(n) = n^2/2 - n/2 for all n >= 0)
Big-O notation specifies an upper bound on a function f(n) as n grows large.

Big-O Notation and Tight Bounds
Big-O notation provides an upper bound, not a tight bound (upper and lower).
Example:
- 3n - 3 is O(n^2) because 3n - 3 <= n^2 for all n >= 1
- 3n - 3 is also O(2^n) because 3n - 3 <= 2^n for all n >= 1
However, we generally try to use big-O notation to characterize a function as closely as possible, i.e., as if we were using it to specify a tight bound.
- for our example, we would say that 3n - 3 is O(n)

Big-Theta Notation
In theoretical computer science, big-theta notation (Θ) is used to specify a tight bound.
f(n) = Θ(g(n)) if there exist constants c1, c2, and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n > n0
Example: f(n) = n^2/2 - n/2 is Θ(n^2), because (1/4)*n^2 <= n^2/2 - n/2 <= n^2 for all n >= 2
- c1 = 1/4, c2 = 1, n0 = 2
(graph: f(n) = n^2/2 - n/2 lies between (1/4)*g(n) = n^2/4 and g(n) = n^2 for all n >= 2)

Sorting by Insertion I: Insertion Sort
Basic idea: going from left to right, insert each element into its proper place with respect to the elements to its left, sliding over other elements to make room.
Example:
15  4  2 12  6
 4 15  2 12  6
 2  4 15 12  6
 2  4 12 15  6
 2  4  6 12 15

Comparing Selection and Insertion Strategies
In selection sort, we start with the positions in the array and select the correct elements to fill them.
In insertion sort, we start with the elements and determine where to insert them in the array.
Here's an example that illustrates the difference:
18 12 15  9 25  2 17
Sorting by selection:
- consider position 0: find the element (2) that belongs there
- consider position 1: find the element (9) that belongs there
- ...
Sorting by insertion:
- consider the 12: determine where to insert it
- consider the 15: determine where to insert it
- ...

Inserting an Element
When we consider element i, elements 0 through i-1 are already sorted with respect to each other.

example for i = 3:   6 14 19 | 9

To insert element i:
- make a copy of element i, storing it in the variable toInsert:   toInsert = 9
- consider elements i-1, i-2, ...
  - if an element > toInsert, slide it over to the right
  - stop at the first element <= toInsert
  (here, the 19 and then the 14 slide right, and we stop at the 6:   6 __ 14 19)
- copy toInsert into the resulting "hole":   6  9 14 19

Insertion Sort Example (done together)
Starting array: 12  5  2 13 18  4
(description of steps)

Implementation of Insertion Sort

public class Sort {
    ...
    public static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] < arr[i-1]) {
                int toInsert = arr[i];
                int j = i;
                do {
                    arr[j] = arr[j-1];
                    j = j - 1;
                } while (j > 0 && toInsert < arr[j-1]);
                arr[j] = toInsert;
            }
        }
    }
}

Time Analysis of Insertion Sort
The number of operations depends on the contents of the array.
best case: the array is sorted
- thus, we never execute the do-while loop
- each element is only compared to the element to its left
- C(n) = n - 1 = O(n), M(n) = 0, running time = O(n)
worst case: the array is in reverse order
- each element is compared to all of the elements to its left:
  arr[1] is compared to 1 element (arr[0])
  arr[2] is compared to 2 elements (arr[0] and arr[1])
  ...
  arr[n-1] is compared to n-1 elements
- C(n) = 1 + 2 + ... + (n-1) = O(n^2), as seen in selection sort
- similarly, M(n) = O(n^2), running time = O(n^2)
average case: ?

Sorting by Insertion II: Shell Sort
Developed by Donald Shell in 1959. Improves on insertion sort.
Takes advantage of the fact that insertion sort is fast when an array is almost sorted.
Seeks to eliminate a disadvantage of insertion sort: if an element is far from its final location, many small moves are required to put it where it belongs.
- Example: if the largest element starts out at the beginning of the array, it moves one place to the right on every insertion!
  1000 999 42 56 30 18 23 11
Shell sort uses larger moves that allow elements to quickly get close to where they belong.

Sorting Subarrays
Basic idea:
- use insertion sort on subarrays that contain elements separated by some increment
  - increments allow the data items to make larger jumps
- repeat using a decreasing sequence of increments
Example for an initial increment of 3:
positions:  0  1  2  3  4  5  6  7
values:    36 18 10 27  3 20  9  8
- three subarrays: 1) elements 0, 3, 6   2) elements 1, 4, 7   3) elements 2 and 5
Sort the subarrays using insertion sort to get the following:
            9  3 10 27  8 20 36 18
Next, we complete the process using an increment of 1.

Shell Sort: A Single Pass
We don't consider the subarrays one at a time. We consider elements arr[incr] through arr[arr.length-1], inserting each element into its proper place with respect to the elements from its subarray that are to the left of the element.
The same example (incr = 3):
36 18 10 27  3 20  9  8
27 18 10 36  3 20  9  8   (insert the 27 into its subarray)
27  3 10 36 18 20  9  8   (insert the 3 into its subarray)
27  3 10 36 18 20  9  8   (the 20 is already in place in its subarray)
 9  3 10 27 18 20 36  8   (insert the 9 into its subarray)
 9  3 10 27  8 20 36 18   (insert the 8 into its subarray)

Inserting an Element in a Subarray
When we consider element i, the other elements in its subarray are already sorted with respect to each other.
- example for i = 6 (incr = 3):   27  3 10 36 18 20  9  8
  the other elements in 9's subarray (the 27 and the 36) are already sorted with respect to each other
To insert element i:
- make a copy of element i, storing it in the variable toInsert:   toInsert = 9
- consider elements i-incr, i-(2*incr), i-(3*incr), ...
  - if an element > toInsert, slide it right within the subarray (here, the 36 and then the 27 slide right)
  - stop at the first element <= toInsert
- copy toInsert into the "hole":   9  3 10 27 18 20 36  8

The Sequence of Increments
Different sequences of decreasing increments can be used.
Our version uses values that are one less than a power of two:
- 2^k - 1 for some k: ..., 63, 31, 15, 7, 3, 1
- we can get to the next lower increment using integer division: incr = incr/2;
We should avoid sequences whose numbers are multiples of each other.
- otherwise, elements that are sorted with respect to each other in one pass are grouped together again in subsequent passes
  - we repeat comparisons unnecessarily
  - we get fewer of the large jumps that speed up later passes
- example of a bad sequence: 64, 32, 16, 8, 4, 2, 1
  - what happens if the largest values are all in odd positions?

Implementation of Shell Sort

public static void shellSort(int[] arr) {
    int incr = 1;
    while (2 * incr <= arr.length) {
        incr = 2 * incr;
    }
    incr = incr - 1;

    while (incr >= 1) {
        for (int i = incr; i < arr.length; i++) {
            if (arr[i] < arr[i-incr]) {
                int toInsert = arr[i];
                int j = i;
                do {
                    arr[j] = arr[j-incr];
                    j = j - incr;
                } while (j > incr-1 && toInsert < arr[j-incr]);
                arr[j] = toInsert;
            }
        }
        incr = incr/2;
    }
}

(If you replace incr with 1 in the for-loop, you get the code for insertion sort.)

Time Analysis of Shell Sort
Difficult to analyze precisely; we typically use experiments to measure its efficiency.
With a bad interval sequence, it is O(n^2) in the worst case.
With a good interval sequence, it is better than O(n^2):
- at least as good as O(n^1.5) in the average and worst case
- some experiments have shown average-case running times of O(n^1.25) or even O(n^(7/6))
Significantly better than insertion or selection sort for large n:

n          n^2              n^1.5         n^1.25
10         100              31.6          17.8
100        10,000           1,000         316
10,000     100,000,000      1,000,000     100,000
10^6       10^12            10^9          3.16 x 10^7

We've wrapped insertion sort in another loop and increased its efficiency! The key is in the larger jumps that Shell sort allows.

Practicing Time Analysis
Consider the following static method:

public static int mystery(int n) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        x += i;                       // statement 1
        for (int j = 0; j < i; j++) {
            x += j;
        }
    }
    return x;
}

What is the big-O expression for the number of times that statement 1 is executed as a function of the input n?

What about now?
Consider the following static method:

public static int mystery(int n) {
    int x = 0;
    for (int i = 0; i < 3*n + 4; i++) {
        x += i;                       // statement 1
        for (int j = 0; j < i; j++) {
            x += j;
        }
    }
    return x;
}

What is the big-O expression for the number of times that statement 1 is executed as a function of the input n?

Practicing Time Analysis
Consider the following static method:

public static int mystery(int n) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        x += i;                       // statement 1
        for (int j = 0; j < i; j++) {
            x += j;                   // statement 2
        }
    }
    return x;
}

What is the big-O expression for the number of times that statement 2 is executed as a function of the input n?
(table to fill in: value of i vs. number of times statement 2 is executed)

Sorting by Exchange I: Bubble Sort
Perform a sequence of passes through the array.
On each pass: proceed from left to right, swapping adjacent elements if they are out of order.
Larger elements "bubble up" to the end of the array.
At the end of the kth pass, the k rightmost elements are in their final positions, so we don't need to consider them in subsequent passes.
Example:                  28 24 27 18
- after the first pass:   24 27 18 28
- after the second:       24 18 27 28
- after the third:        18 24 27 28

Implementation of Bubble Sort

public class Sort {
    ...
    public static void bubbleSort(int[] arr) {
        for (int i = arr.length - 1; i > 0; i--) {
            for (int j = 0; j < i; j++) {
                if (arr[j] > arr[j+1]) {
                    swap(arr, j, j+1);
                }
            }
        }
    }
}

One for-loop nested in another:
- the inner loop performs a single pass
- the outer loop governs the number of passes, and the ending point of each pass

Time Analysis of Bubble Sort
Comparisons: the kth pass performs ? comparisons, so we get C(n) = ?
Moves: depends on the contents of the array
- in the worst case: ?
- in the best case: ?
Running time: ?

Sorting by Exchange II: Quicksort
Like bubble sort, quicksort uses an approach based on exchanging out-of-order elements, but it's more efficient.
A recursive, divide-and-conquer algorithm:
- divide: rearrange the elements so that we end up with two subarrays that meet the following criterion: each element in the left subarray <= each element in the right subarray
  example:  12  8 14  4  6 13   →   6  8  4 | 14 12 13
- conquer: apply quicksort recursively to the subarrays, stopping when a subarray has a single element
- combine: nothing needs to be done, because of the criterion used in forming the subarrays

Partitioning an Array Using a Pivot
The process that quicksort uses to rearrange the elements is known as partitioning the array.
Partitioning is done using a value known as the pivot.
We rearrange the elements to produce two subarrays:
- left subarray: all values <= pivot
- right subarray: all values >= pivot

7 15  4  9  6 18  9 12
   partition using a pivot of 9
7  9  4  6  9 | 18 15 12
(all values <= 9 | all values >= 9)

This is equivalent to the criterion on the previous page.
Our approach to partitioning is one of several variants.
Partitioning is useful in its own right.
- ex: find all students with a GPA > 3.0

Possible Pivot Values
First element or last element:
- risky, can lead to terrible worst-case behavior
- especially poor if the array is almost sorted
  example:  4  8 14 12  6 18  with pivot = 18: all of the other values end up in one subarray, giving a very lopsided split
Middle element (what we will use)
Randomly chosen element
Median of three elements:
- the left, center, and right elements, or three randomly selected elements
- taking the median of three decreases the probability of getting a poor pivot
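The partition() method shown later in these notes uses the middle element as the pivot. Purely as an illustration of the median-of-three idea, here is a hypothetical helper (not part of the course code; the method name is an assumption) that returns the index of the median of the first, middle, and last elements:

// A sketch of median-of-three pivot selection.
private static int medianOfThreeIndex(int[] arr, int first, int last) {
    int mid = (first + last) / 2;
    int a = arr[first], b = arr[mid], c = arr[last];

    // return the index whose value is neither the smallest nor the largest of the three
    if ((a <= b && b <= c) || (c <= b && b <= a)) {
        return mid;
    } else if ((b <= a && a <= c) || (c <= a && a <= b)) {
        return first;
    } else {
        return last;
    }
}

A quicksort variant could swap the element at this index into the middle position before partitioning, so the existing partition() code could then be used unchanged.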

Partitioning an Array: An Example
Maintain indices i and j, starting them just outside the array: i = first - 1, j = last + 1.

first                        last
  7 15  4  9  6 18  9 12        pivot = 9

Find out-of-place elements:
- increment i until arr[i] >= pivot
- decrement j until arr[j] <= pivot
  7 15  4  9  6 18  9 12    (i stops at the 15, j stops at the second 9)
Swap arr[i] and arr[j]:
  7  9  4  9  6 18 15 12

Partitioning Example (cont.)
From the previous page:   7  9  4  9  6 18 15 12
Find:   (i stops at the 9 in position 3, j stops at the 6 in position 4)
Swap:    7  9  4  6  9 18 15 12
Find:   (i stops at the 9 in position 4, j stops at the 6 in position 3)
and now the indices have crossed, so we return j.
Subarrays: left = arr[first..j], right = arr[j+1..last]
  7  9  4  6 | 9 18 15 12

Partitioning Example 2
Start (pivot = 13):   24  5  2 13 18  4 20 19
Find:   (i stops at the 24, j stops at the 4)
Swap:    4  5  2 13 18 24 20 19
Find:   (i and j both stop at the 13)
and now the indices are equal, so we return j.
Subarrays:   4  5  2 13 | 18 24 20 19

Partitioning Example 3 (done together)
Start (pivot = 5):   4 14  7  5  2 19 26  6
Find: ?

Partitioning Example 4
Start (pivot = 15):   8 10  7 15 20  9  6 18
Find: ?

partition() Helper Method

private static int partition(int[] arr, int first, int last) {
    int pivot = arr[(first + last)/2];
    int i = first - 1;     // index going left to right
    int j = last + 1;      // index going right to left

    while (true) {
        do {
            i++;
        } while (arr[i] < pivot);
        do {
            j--;
        } while (arr[j] > pivot);

        if (i < j) {
            swap(arr, i, j);
        } else {
            return j;      // arr[j] = end of the left subarray
        }
    }
}

Example call:
first                        last
  7 15  4  9  6 18  9 12

Implementation of Quicksort

public static void quickSort(int[] arr) {      // "wrapper" method
    qSort(arr, 0, arr.length - 1);
}

private static void qSort(int[] arr, int first, int last) {
    int split = partition(arr, first, last);

    if (first < split) {                // if the left subarray has 2+ values,
        qSort(arr, first, split);       // sort it recursively!
    }
    if (last > split + 1) {             // if the right subarray has 2+ values,
        qSort(arr, split + 1, last);    // sort it!
    }
    // note: the base case is when neither call is made,
    // because both subarrays have only one element!
}

Example (the array from the partitioning example, after partition() returns):
  7  9  4  6  9 18 15 12, with split = j marking the last position of the left subarray

A Quick Review of Logarithms
log_b(n) = the exponent to which b must be raised to get n
- log_b(n) = p if b^p = n
- examples: log2(8) = 3 because 2^3 = 8;  log10(10000) = 4 because 10^4 = 10000
Another way of looking at logs:
- let's say that you repeatedly divide n by b (using integer division)
- log_b(n) is an upper bound on the number of divisions needed to reach 1
- example: log2(18) is approx. 4.17
  18/2 = 9,  9/2 = 4,  4/2 = 2,  2/2 = 1   (4 divisions)
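As a small illustration of the "repeated division" view of logarithms, here is a hypothetical helper (not from the course materials) that counts the integer divisions needed to reach 1 and compares the count with log_b(n):

public class LogDemo {
    // count how many integer divisions by b are needed to reduce n to 1
    public static int divisionsToOne(int n, int b) {
        int count = 0;
        while (n > 1) {
            n = n / b;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 18, b = 2;
        System.out.println(divisionsToOne(n, b));        // 4
        System.out.println(Math.log(n) / Math.log(b));   // approx. 4.17, an upper bound on the count
    }
}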

A Quick Review of Logs (cont.)
If the number of operations performed by an algorithm is proportional to log_b(n) for any base b, we say it is an O(log n) algorithm, dropping the base.
log_b(n) grows much more slowly than n:

n                  log2(n)
2                  1
1024 (1K)          10
1024*1024 (1M)     20

Thus, for large values of n:
- an O(log n) algorithm is much faster than an O(n) algorithm
- an O(n log n) algorithm is much faster than an O(n^2) algorithm
We can also show that an O(n log n) algorithm is faster than an O(n^1.5) algorithm like Shell sort.

Time Analysis of Quicksort
Partitioning an array requires n comparisons, because each element is compared with the pivot.
best case: partitioning always divides the array in half
- repeated recursive calls give the following tree of subproblem sizes, with the comparisons performed at each level:

  n                              n comparisons
  n/2  n/2                       2*(n/2) = n
  n/4  n/4  n/4  n/4             4*(n/4) = n
  ...                            ...
  1 1 1 ... 1 1 1

- at each "row" except the bottom, we perform n comparisons
- there are about log2(n) rows that include comparisons
- C(n) = ?
- similarly, M(n) and the running time are both O(n log n)

Time Analysis of Quicksort (cont.)
worst case: the pivot is always the smallest or largest element
- one subarray has 1 element, the other has n - 1
- repeated recursive calls give the following subproblem sizes and comparison counts:

  n               n comparisons
  1   n-1         n-1
  1   n-2         n-2
  1   n-3         n-3
  ...             ...
  1   2           2

- C(n) = 2 + 3 + ... + (n-1) + n = O(n^2)
- M(n) and the running time are also O(n^2)
average case: harder to analyze
- C(n) > n log2(n), but it's still O(n log n)

Mergesort
All of the comparison-based sorting algorithms that we've seen thus far have sorted the array in place.
- they used only a small amount of additional memory
Mergesort is a sorting algorithm that requires an additional temporary array of the same size as the original one.
- it needs O(n) additional space, where n is the array size
It is based on the process of merging two sorted arrays into a single sorted array.
- example:   2 8 14 24   merged with   5 7 9 11   gives   2 5 7 8 9 11 14 24

Merging Sorted Arrays
To merge sorted arrays A and B into an array C, we maintain three indices, which start out on the first elements of the arrays:

A: 2 8 14 24     (index i)
B: 5 7 9 11      (index j)
C: (empty)       (index k)

We repeatedly do the following:
- compare A[i] and B[j]
- copy the smaller of the two to C[k]
- increment the index of the array whose element was copied
- increment k

Merging Sorted Arrays (cont.)
Starting point:         A: 2 8 14 24,  B: 5 7 9 11,  C: (empty)
After the first copy:   C: 2          (i advances to the 8)
After the second copy:  C: 2 5        (j advances to the 7)

Merging Sorted Arrays (cont.)
After the third copy:   C: 2 5 7
After the fourth copy:  C: 2 5 7 8
After the fifth copy:   C: 2 5 7 8 9

Merging Sorted Arrays (cont.)
After the sixth copy:   C: 2 5 7 8 9 11
There's nothing left in B, so we simply copy the remaining elements from A:
                        C: 2 5 7 8 9 11 14 24

Divide and Conquer
Like quicksort, mergesort is a divide-and-conquer algorithm.
- divide: split the array in half, forming two subarrays
- conquer: apply mergesort recursively to the subarrays, stopping when a subarray has a single element
- combine: merge the sorted subarrays
Example (8 12 4 14 6 33 2 27):
- split:  8 12  4 14 | 6 33  2 27,  then  8 12 | 4 14 | 6 33 | 2 27,  then single elements
- merge:  back to pairs,  then  4  8 12 14 | 2  6 27 33,  then  2  4  6  8 12 14 27 33

Tracing the Calls to Mergesort
The initial call is made to sort the entire array:
  12  8 14  4  6 33  2 27
Split into two 4-element subarrays, and make a recursive call to sort the left subarray:
  12  8 14  4
Split into two 2-element subarrays, and make a recursive call to sort the left subarray:
  12  8

Tracing the Calls to Mergesort (cont.)
Split into two 1-element subarrays, and make a recursive call to sort the left subarray:
  12
Base case, so return to the call for the subarray {12, 8}.
Make a recursive call to sort its right subarray:
  8
Base case, so return to the call for the subarray {12, 8}.

Tracing the Calls to Mergesort (cont.)
Merge the sorted halves of {12, 8}:
  12  8   →   8 12
End of the method, so return to the call for the 4-element subarray, which now has a sorted left subarray:
  8 12 14  4
Make a recursive call to sort the right subarray of the 4-element subarray:
  14  4
Split it into two 1-element subarrays, and make a recursive call to sort the left subarray:
  14   (base case)

Tracing the Calls to Mergesort (cont.)
Return to the call for the subarray {14, 4}.
Make a recursive call to sort its right subarray:
  4   (base case)
Return to the call for the subarray {14, 4}.
Merge the sorted halves of {14, 4}:
  14  4   →   4 14

Tracing the Calls to Mergesort (cont.)
End of the method, so return to the call for the 4-element subarray, which now has two sorted 2-element subarrays:
  8 12  4 14
Merge the 2-element subarrays:
  8 12  4 14   →   4  8 12 14
End of the method, so return to the call for the original array, which now has a sorted left subarray:
  4  8 12 14  6 33  2 27
Perform a similar set of recursive calls to sort the right subarray. Here's the result:
  4  8 12 14  2  6 27 33
Finally, merge the sorted 4-element subarrays to get a fully sorted 8-element array:
  2  4  6  8 12 14 27 33

Implementing Mergesort
One approach is to create new arrays for each new set of subarrays, and to merge them back into the array that was split.
Instead, we'll create a temporary array of the same size as the original:
- pass it to each call of the recursive mergesort method
- use it when merging subarrays of the original array:
  arr:   8 12  4 14  6 33  2 27       temp:   4  8 12 14 ...
- after each merge, copy the result back into the original array:
  arr:   4  8 12 14  6 33  2 27       temp:   4  8 12 14 ...

A Method for Merging Subarrays

private static void merge(int[] arr, int[] temp, int leftStart, int leftEnd, int rightStart, int rightEnd) {
    int i = leftStart;     // index into the left subarray
    int j = rightStart;    // index into the right subarray
    int k = leftStart;     // index into temp

    while (i <= leftEnd && j <= rightEnd) {     // both subarrays still have values
        if (arr[i] < arr[j]) {
            temp[k] = arr[i];
            i++;
            k++;
        } else {
            temp[k] = arr[j];
            j++;
            k++;
        }
    }

    while (i <= leftEnd) {      // copy any remaining values from the left subarray
        temp[k] = arr[i];
        i++;
        k++;
    }
    while (j <= rightEnd) {     // copy any remaining values from the right subarray
        temp[k] = arr[j];
        j++;
        k++;
    }

    for (i = leftStart; i <= rightEnd; i++) {   // copy the merged result back into arr
        arr[i] = temp[i];
    }
}

A Method for Merging Subarrays (cont.)
The first while loop above runs as long as both subarrays still have values to be merged.
Example of the method's parameters when merging the two sorted halves of the full array:
  arr:   4  8 12 14 | 2  6 27 33   (leftStart = 0, leftEnd = 3, rightStart = 4, rightEnd = 7)
  temp:  (empty)

Methods for Mergesort
Here's the key recursive method:

private static void mSort(int[] arr, int[] temp, int start, int end) {
    if (start >= end) {     // base case: subarray of length 0 or 1
        return;
    } else {
        int middle = (start + end)/2;
        mSort(arr, temp, start, middle);        // sort the left half
        mSort(arr, temp, middle + 1, end);      // sort the right half
        merge(arr, temp, start, middle, middle + 1, end);
    }
}

Methods for Mergesort (cont.)
We use a "wrapper" method to create the temp array, and to make the initial call to the recursive method:

public static void mergeSort(int[] arr) {
    int[] temp = new int[arr.length];
    mSort(arr, temp, 0, arr.length - 1);
}

Time Analysis of Mergesort
Merging two halves of an array of size n requires 2n moves. Why?
Mergesort repeatedly divides the array in half, so we have the following call tree (showing the sizes of the arrays), with the moves performed at each level:

  n                              2n moves
  n/2  n/2                       2*2*(n/2) = 2n
  n/4  n/4  n/4  n/4             4*2*(n/4) = 2n
  ...                            ...
  1 1 1 ... 1 1 1

- at all but the last level of the call tree, there are 2n moves
- how many levels are there?
- M(n) = ?   C(n) = ?
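For reference, a sketch of the standard answers to the questions above (these are the usual textbook results, filled in here rather than taken from the slide itself):

\[
\text{levels} \approx \log_2 n + 1, \qquad
M(n) \approx 2n \log_2 n = O(n \log n), \qquad
C(n) \le n \log_2 n = O(n \log n)
\]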

Summary: Comparison-Based Sorting Algorithms

algorithm        best case     avg case      worst case    extra memory
selection sort   O(n^2)        O(n^2)        O(n^2)        O(1)
insertion sort   O(n)          O(n^2)        O(n^2)        O(1)
Shell sort       O(n log n)    O(n^1.5)      O(n^1.5)      O(1)
bubble sort      O(n^2)        O(n^2)        O(n^2)        O(1)
quicksort        O(n log n)    O(n log n)    O(n^2)        O(1)
mergesort        O(n log n)    O(n log n)    O(n log n)    O(n)

- Insertion sort is best for nearly sorted arrays.
- Mergesort has the best worst-case complexity, but requires extra memory and moves to and from the temp array.
- Quicksort is comparable to mergesort in the average case. With a reasonable pivot choice, its worst case is seldom seen.
- Use SortCount.java to experiment.

Comparison-Based vs. Distributive Sorting
Until now, all of the sorting algorithms we have considered have been comparison-based:
- they treat the keys as wholes (comparing them)
- they don't take them apart in any way
- all that matters is the relative order of the keys, not their actual values
No comparison-based sorting algorithm can do better than O(n log2 n) on an array of length n.
- O(n log2 n) is a lower bound for such algorithms.
Distributive sorting algorithms do more than compare keys; they perform calculations on the values of individual keys.
Moving beyond comparisons allows us to overcome the lower bound.
- tradeoff: they use more memory.

Distributive Sorting Example: Radix Sort
Relies on the representation of the data as a sequence of m quantities with k possible values.
Examples:
                                       m      k
integer in range 0...999               3      10
string of 15 upper-case letters        15     26
32-bit integer (in binary)             32     2
32-bit integer (as bytes)              4      256

Strategy: distribute according to the last element in the sequence, then concatenate the results:
  33 41 12 24 31 14 13 42 34
  get: 41 31 12 42 33 13 24 14 34
Repeat, moving back one digit each time:
  get: 12 13 14 24 31 33 34 41 42

Analysis of Radix Sort
Recall that we treat the values as a sequence of m quantities with k possible values.
The number of operations is O(n*m) for an array with n elements.
- better than O(n log n) when m < log n
Memory usage increases as k increases.
- k tends to increase as m decreases
- tradeoff: increased speed requires increased memory usage
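The slides describe the distribute-and-concatenate strategy in prose only. Here is a minimal Java sketch of that idea for base-10 keys with at most m digits (the class name, the use of ArrayList buckets, and the main-method demo are illustrative assumptions, not part of the course code):

import java.util.ArrayList;
import java.util.List;

public class RadixSortSketch {
    // Least-significant-digit radix sort for non-negative integers with at most m decimal digits.
    public static void radixSort(int[] arr, int m) {
        int divisor = 1;                                // isolates the current digit
        for (int pass = 0; pass < m; pass++) {
            // one bucket per possible digit value (k = 10)
            List<List<Integer>> buckets = new ArrayList<>();
            for (int d = 0; d < 10; d++) {
                buckets.add(new ArrayList<>());
            }

            // distribute according to the current digit
            for (int value : arr) {
                int digit = (value / divisor) % 10;
                buckets.get(digit).add(value);
            }

            // concatenate the buckets back into the array
            int i = 0;
            for (List<Integer> bucket : buckets) {
                for (int value : bucket) {
                    arr[i++] = value;
                }
            }
            divisor *= 10;                              // move back one digit
        }
    }

    public static void main(String[] args) {
        int[] arr = {33, 41, 12, 24, 31, 14, 13, 42, 34};
        radixSort(arr, 2);                              // two-digit keys, so two passes
        System.out.println(java.util.Arrays.toString(arr));
        // prints [12, 13, 14, 24, 31, 33, 34, 41, 42]
    }
}

Each pass makes one distribution pass over the n elements, so with m passes the total work is proportional to n*m, matching the O(n*m) analysis above.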

Big-O Notation Revisited
We've seen that we can group functions into classes by focusing on the fastest-growing term in the expression for the number of operations that they perform.
- e.g., an algorithm that performs n^2/2 - n/2 operations is an O(n^2)-time or quadratic-time algorithm
Common classes of algorithms:

name               example expressions          big-O notation
constant time      1, 7, 10                     O(1)
logarithmic time   3log10 n, log2 n + 5         O(log n)
linear time        5n, 10n - 2log2 n            O(n)
n log n time       4n log2 n, n log2 n + n      O(n log n)
quadratic time     2n^2 + 3n, n^2 - 1           O(n^2)
cubic time         n^2 + 3n^3, 5n^3 - 5         O(n^3)
exponential time   2^n, 5e^n + 2n^2             O(c^n)
factorial time     3n!, 5n + n!                 O(n!)
(classes lower in the table are slower)

How Does the Number of Operations Scale?
Let's say that we have a problem size of 1000, and we measure the number of operations performed by a given algorithm.
If we double the problem size to 2000, how would the number of operations performed by an algorithm increase if it is:
- O(n)-time
- O(n^2)-time
- O(n^3)-time
- O(log2 n)-time
- O(2^n)-time

How Does the Actual Running Time Scale?
How much time is required to solve a problem of size n?
- assume that each operation requires 1 µsec (1 x 10^-6 sec)

time function   problem size (n)
                10          20         30         40          50          60
n               .00001 s    .00002 s   .00003 s   .00004 s    .00005 s    .00006 s
n^2             .0001 s     .0004 s    .0009 s    .0016 s     .0025 s     .0036 s
n^5             .1 s        3.2 s      24.3 s     1.7 min     5.2 min     13.0 min
2^n             .001 s      1.0 s      17.9 min   12.7 days   35.7 yrs    36,600 yrs

sample computations:
- when n = 10, an n^2 algorithm performs 10^2 operations. 10^2 * (1 x 10^-6 sec) = .0001 sec
- when n = 30, a 2^n algorithm performs 2^30 operations. 2^30 * (1 x 10^-6 sec) = 1073 sec = 17.9 min

What's the Largest Problem That Can Be Solved?
What's the largest problem size n that can be solved in a given time T? (again assume 1 µsec per operation)

time function   time available (T)
                1 min         1 hour        1 week         1 year
n               60,000,000    3.6 x 10^9    6.0 x 10^11    3.1 x 10^13
n^2             7745          60,000        777,688        5,615,692
n^5             35            81            227            500
2^n             25            31            39             44

sample computations:
- 1 hour = 3600 sec; that's enough time for 3600/(1 x 10^-6) = 3.6 x 10^9 operations
- n^2 algorithm: n^2 = 3.6 x 10^9 → n = (3.6 x 10^9)^(1/2) = 60,000
- 2^n algorithm: 2^n = 3.6 x 10^9 → n = log2(3.6 x 10^9) ≈ 31